[mpiwg-tools] PMPI for a complete MPI wrapper

Thu Oct 12 16:01:05 CDT 2017

On Wed, Oct 11, 2017 at 7:57 AM, Jean-Baptiste BESNARD <
jbbesnard at paratools.fr> wrote:

> Using the request Fortran ID is quite original indeed, I did not think of
> that!
> I'm unsure why Open MPI yields '1'; maybe because it buffered the data
> directly, which allows it to free the request immediately? That
> optimization is not possible with MPI_Issend.
> You may try with larger messages; that could give you a distinct request
> handle.
>

I wanted to confirm this. The only meaningful information in the status of
an isend is whether the request completed or was cancelled. So if the user
buffer can already be safely modified, it is legal to return a predefined
status that corresponds to a completed request (too late for cancel at this
point). Thus, Open MPI returns an internal request that matches the empty
status. This explains why you consistently get the f2c index 1.
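
To make the consequence concrete, here is a minimal sketch (not taken from Open MPI or from Max's wrapper) of the kind of side table Max proposed, keyed by the integer that MPI_Request_c2f would return. The names slot_t, table_attach, table_detach and TABLE_SIZE are made up for illustration, and a plain int stands in for the Fortran handle so that the sketch compiles without mpi.h.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 64 /* assumed upper bound on handle indices, for this sketch only */

/* One slot per integer handle; the key stands in for MPI_Request_c2f(req). */
typedef struct {
    int   in_use;
    void *data;
} slot_t;

static slot_t table[TABLE_SIZE];

/* Attach data to a handle index.
 * Returns 0 on success, -1 if the index is out of range or already taken,
 * i.e. two live requests were mapped to the same index. */
int table_attach(int fidx, void *data)
{
    assert(data != NULL); /* NULL is reserved to mean "nothing attached" */
    if (fidx < 0 || fidx >= TABLE_SIZE || table[fidx].in_use)
        return -1;
    table[fidx].in_use = 1;
    table[fidx].data   = data;
    return 0;
}

/* Detach and return the data for a handle index, or NULL if none is attached. */
void *table_detach(int fidx)
{
    if (fidx < 0 || fidx >= TABLE_SIZE || !table[fidx].in_use)
        return NULL;
    void *data = table[fidx].data;
    table[fidx].in_use = 0;
    table[fidx].data   = NULL;
    return data;
}
```

If two MPI_Isend calls both hand back the shared completed request, they both map to index 1: the second table_attach(1, ...) returns -1, so a wrapper built this way can at least detect that it cannot tell the two requests apart, but it cannot recover the association.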

  George.

>
> You may also hash the request to get an ID ?
>
> To summarize, it appears that the only possibility is complete MPI
> wrapping (functions where requests are involved) with custom request but
> this requires you to recompile the target program with your own request
> type.
>
> We have AFAIK the following alternatives for request wrapping:
>
> - Extended Generalized Requests: not standard (only saves Wait*/Test*
> wrapping, avoids custom requests)
> - Generalized Requests: not practical on the polling side, since you need
> to spawn a thread or populate a list to progress MPI handles, which means
> THREAD_MULTIPLE (only saves Wait*/Test* wrapping, avoids custom requests)
> - C2F functions / request hash: however, you observed that MPI functions
> sometimes return (completed?) requests, which prevents matching, for
> example, an Isend with its Wait (needs complete wrapping, avoids custom
> requests)
> - MPI-T variables bound to MPI_Request objects: could you use these? I
> don't think runtimes propose such variables yet (needs complete wrapping,
> avoids custom requests)
> - Complete wrapping and a custom MPI_Request object (should work, needs
> recompilation)
>
> Others may have good solutions I missed but for me at this point only the
> last model seems reliable enough. Recompilation could be avoided with a
> reliable way for « tagging » a standard request.
>
> Cheers,
>
> Jean-Baptiste.
>
> On 10 Oct 2017, at 12:27, Max Sagebaum <max.sagebaum at scicomp.uni-kl.de>
> wrote:
>
> Hello Jean-Baptiste,
>
> sorry, I forgot to answer you yesterday. I had a look at the example
> project and it looks OK. My problem is that the wrapper will be a general
> one released as open source. Therefore I cannot use techniques that are
> not part of the MPI standard. It is even problematic if the basic technique
> that I use in the wrapper is not supported by older versions of the
> standard. So unfortunately I cannot use these extended generalized requests.
>
> While writing this message I got another idea for how to associate data
> with a request. The function MPI_Request_c2f converts a C handle into an
> integer, which needs to be unique in some way so that the implementation
> can convert it back. I could then use this index to store the additional
> information in an array or a hashmap.
>
> I tested this with the Open MPI implementation on my local machine and this
> looked promising. The only problem was that the handle from MPI_Isend
> was always 1 and therefore I could not associate the data. Since the
> standard does not specify how an implementation handles Send and Isend,
> I tried switching to Issend, and there it worked.
>
> Could this be a possible avenue to associate some data with a request
> object?
>
> Cheers
>
> Max
>
> On Fri, 2017-10-06 at 17:34 +0200, Jean-Baptiste BESNARD wrote:
>
> Hi Max,
>
> In my idea you would have used *Extended* Generalized Requests, which do
> provide a « poll_fn »; this function avoids the use of progress threads,
> which indeed have a lot of drawbacks.
> In practice the MPI runtime will call the « poll_fn » when progressing the
> request (only in Wait & Test). This makes the use of this request
> abstraction much simpler.
>
> I've taken a few minutes to make a (quick) example of their usage for
> request wrapping, as I think they are mostly used inside ROMIO for now:
> https://github.com/besnardjb/egreq_example
>
> As *extended* generalized requests are not standard, you cannot be sure of
> finding them in all MPI implementations; here are those I know of:
>
> - MPICH has them (I checked in 3.2 and probably OK in all of its
> derivatives)
> - MPC has them since 2.5.2 (as I implemented them :-))
> - I did not find them in OpenMPI
>
> Even between these two implementations you'll find differences: I based
> my implementation on the aforementioned paper whereas, for example, MPICH
> takes the *wait_fn* as an additional parameter of the _start call.
>
> Cheers,
>
> Jean-Baptiste.
>
> On 6 Oct 2017, at 12:06, Max Sagebaum <max.sagebaum at scicomp.uni-kl.de>
> wrote:
>
> Hello Jean-Baptiste,
>
> thanks for the pointer to the generalized requests. I had not yet had a
> look at them.
>
> I can carry my additional data in this data structure. But I am not quite
> sure about the overhead. In order to implement this, I need to tell mpi
> with grequest_complete, that the request is completed. Since I want to wrap
> the usual asynchronous requests this would yield the following layout:
>
> 1. Start the grequest
> 2. Start a thread that executes the grequest
> 3. In the thread:
>    I. Start the regular request
>    II. Wait until the regular request is finished
>    III. Signal MPI that the grequest is complete
> 4. User calls Wait/Test etc. on the grequest
> 5. The free call performs the postprocessing (this needs to be performed
> in the main thread, otherwise race conditions need to be handled; it might
> also be possible in the extra thread, but then I need to lock my main data
> structure, which introduces overhead for the whole application)
>
> With this implementation I would create a thread for each asynchronous MPI
> call. The threads would be idle. Can this have an impact on overall
> performance?
> I could imagine, that it is also possible to have a busy thread, that
> tests for all wrapped asynchronous requests. But the busy thread could slow
> down the performance of a cluster node quite strongly.
>
> What's your opinion on this matter?
>
> Cheers
>
> Max
>
> On Thu, 2017-10-05 at 15:46 +0200, Jean-Baptiste BESNARD wrote:
>
> Hi Max,
>
> Thank you very much for these details, I must admit that I need a bit more
> time to fully understand the scenario :)
>
> However, considering that you want to « wrap » requests, could the
> Generalized Request interface (Section 12.2 of the standard), or the
> extended one, which is also widespread because of its use by ROMIO, be of
> any use to create request objects which then point to your own requests
> through the extra_state parameter?
>
> For extended generalized requests see:
> http://www.mcs.anl.gov/uploads/cels/papers/P1417.pdf
>
> Thanks,
>
> Jean-Baptiste.
>
> On 5 Oct 2017, at 15:25, Max Sagebaum <max.sagebaum at scicomp.uni-kl.de>
> wrote:
>
> Hi @ all,
>
> thanks for all the input. From what I gather from the discussion, a
> 'classic' wrapper - as mentioned by Marc (wrap functions only, leave types
> intact) - is no problem to generate. I agree on that.
> For a complete Wrapper (wrap functions and redefine types) a new ABI needs
> to be defined.
>
> If I aim for the new ABI I will have a look at the wi4mpi project since they
> have already done this. I could link to their "interface" mode. I will post
> a message to Marc and Marc-Andre and the wi4mpi list if I have any problems
> here.
>
> But first I would like to tell you a little bit more about the why, as
> Marc-Andre has rightfully asked.
>
> We are doing Algorithmic Differentiation which has three consequences for
> MPI communication:
> - We need to store data for each MPI communication such as MPI_Send,
> MPI_Recv, etc.
> - Buffers need to be pre- and postprocessed
> - For each MPI communication there is a reverse communication.
>
> The pre- and postprocessing part is the problematic bit. We need to do it,
> since we are using new structures to represent the floating point types.
> This can be for example:
> struct AReal {
>   double value;
>   int index;
> };
>
> In this example the postprocessing requires the index to be adapted to
> the new machine, since the index is a kind of pointer for AD. So after the
> buffer is received, the index of every AReal needs to be renewed.
> If I do this for a MPI_Recv, there are no problems since I can do
> everything inside of the routine.
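
As a rough illustration of the index renewal described here, the sketch below walks a received buffer and gives every AReal a fresh local index. IndexPool and renew_indices are hypothetical names, and handing out consecutive integers is only an assumption about how an AD tool might allocate indices; the thread does not say how the real tool does it.

```c
#include <assert.h>
#include <stddef.h>

/* The AD value type from the message above. */
typedef struct {
    double value;
    int    index;
} AReal;

/* Hypothetical source of local indices: hands out consecutive integers. */
typedef struct {
    int next;
} IndexPool;

/* Post-process a received buffer: the sender's indices are meaningless on
 * this rank, so each entry gets a fresh local index. Values are untouched. */
void renew_indices(AReal *buf, size_t count, IndexPool *pool)
{
    assert(pool != NULL);
    assert(buf != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        buf[i].index = pool->next++;
    }
}
```

For a blocking receive this can run right after the receive returns; for a non-blocking one it may only run once the request has completed, which is why the post-processing has to be attached to the request.
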
> If a MPI_Irecv is called, I can only modify the buffer after the Request
> is finished (e.g. in the Wait call). My design is now to define a new
> request:
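> typedef void (*Func)(void* data); // assumed callback signature
> struct AMPI_Request {
>   void* data;          // data handed to the post-processing callback
>   Func func;           // post-processing callback
>   MPI_Request request; // the original MPI request
> };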
>
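> My implementation of wait would then be:
> int AMPI_Wait(AMPI_Request* request) {
>   // PMPI_Wait takes a pointer to the request plus a status argument
>   int r = PMPI_Wait(&request->request, MPI_STATUS_IGNORE);
>
>   request->func(request->data); // perform the post processing
>   return r;
> }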
>
> Because of the structure AMPI_Request I need to include mpi.h to have the
> original MPI_Request available and I need to modify all functions where
> MPI_Request is used.
> The same technique is used so far for MPI_Op and MPI_Datatype.
>
> I hope this explains, why I would like to have PMPI definitions for
> MPI_Request, MPI_Datatype, etc.
>
> I can still change the design of my implementation, so I am also open for
> pointers how to avoid the redefinition of MPI_Request.
>
> Cheers
>
> Max
>
> On Wed, 2017-10-04 at 16:22 +0000, Marc.PERACHE at CEA.FR wrote:
>
> Hi Max,
>
> As Marc-André said, wi4mpi was designed to avoid the recompilation of large applications that is required each time you need to change the underlying MPI implementation. Basically, in "preload" mode wi4mpi allows changing the internal representation of all MPI types declared in mpi.h without recompiling the application. Wi4mpi also provides its own MPI interface and translates types to the underlying MPI implementation in "interface" mode. In "interface" mode, you'll have to recompile your application. Currently, wi4mpi supports bi-directional ABI conversion for OpenMPI, MPICH, IntelMPI, MPI Spectrum, and the wi4mpi ABI. By the end of the year we will add the MPC ABI.
>
> If I understand correctly what you want to do, wi4mpi can provide the glue between your API (i.e. enriched MPI types) used by the application and the underlying MPI implementation. In this case, you'll have to recompile your application but it doesn't require code modification in the application. If you want to avoid application recompilation you'll need to modify wi4mpi internals. If you have questions on wi4mpi, we should take this off the list and keep everyone else in CC.
>
> Regards,
> Marc
>
> -----Original Message-----
> From: Marc-Andre Hermanns [mailto:hermanns at jara.rwth-aachen.de]
> Sent: Wednesday, 4 October 2017 15:51
> To: mpiwg-tools at lists.mpi-forum.org; Max Sagebaum
> Cc: PERACHE Marc 600952
> Subject: Re: [mpiwg-tools] PMPI for a complete MPI wrapper
>
> Hi Max,
>
>
>
> thanks for the fast answer. By pmpi.h I mean a file like mpi.h
> but only containing the PMPI_ interface. As you suggested I might try
> to create a full wrapper myself. I took a look at the wi4mpi project
> and their approach seems to be to create their own interface, i.e. mpi.h,
> and then wrap this to the Intel MPI or Open MPI implementation. Due to this
> approach, they know the data types and can generate the interface.
>
>
>
> For the wi4mpi project, Jean-Baptiste and people from CEA may be the
> right people to talk to.
>
> The wi4mpi is a bit of a special project, as it provides a software
> 'glue' to make MPI implementations interchangeable. It is a way to
> overcome the missing ABI (i.e., a specification of types, etc.).
>
> Usually, the users will have to recompile their application every time
> they choose a different MPI (potentially also when using a different
> version of the same MPI), as values and types in the mpi.h may have
> changed. For large simulation codes, this can take a long time. When
> you have a translating 'glue' like wi4mpi in between, you can swap MPI
> implementations via LD_PRELOAD at the start time of the application.
>
> I don't know enough about wi4mpi to really know what their goal is:
> Have a mixed MPI run (e.g., couple two codes compiled against different
> MPIs)? Use a library compiled for one MPI together with an application
> compiled against another? Just make it easier for users to link
> against the right MPI? All of the above?
>
> @Marc? Any comments on what the design goal of wi4mpi is? Does it
> support other MPI implementations beside Intel-MPI and Open-MPI?
>
> (If this discussion drifts more towards 'wi4mpi' specifics, we should
> take this off the list and keep everyone else in CC)
>
>
>
> In my library I wanted to use a lightweight wrapper. That is, I wanted
> to use the original data types. With this approach I currently have
> structures like:
>
> struct AMPI_Comm {
> // my own data;
> MPI_Comm comm; // the original object
> };
>
> I can then simply call the pmpi functions with the stored original
> object.
> If I have a wrapper such that there is a PMPI_Comm object available, I
> could do the following:
> struct MPI_Comm {
> // my own data;
> PMPI_Comm comm; // the original object
> };
>
> If the wrapper should use the same types from a general mpi.h, then I
> do not know the types and would need to declare something like:
>
> hidden_mpi.c
> #include <mpi.h>
> decltype(MPI_COMM_WORLD) PMPI_COMM_WORLD = MPI_COMM_WORLD;
>
> and then I need to use PMPI_COMM_WORLD in my library and I can
> generate a hmpi.h which contains lines like:
> #define MPI_COMM_WORLD PMPI_COMM_WORLD
>
> Which could be included by the user. But in order to use PMPI_* in my
> library, I need to specify the symbol in a header file for which I
> need the type. In order to get the type I need to include mpi.h which
> will define MPI_COMM_WORLD and I have a name clash.
>
>
>
> mpi.h only _declares_ the prototype. It does not define anything
> (apart from CPP macros, etc.).
>
> If you provide your own types, then you will need to declare your own
> prototypes, which I would generate (see below).
>
>
>
> So unfortunately I see no way in providing a wrapper without writing a
> complete MPI Interface, which I would like to avoid. I might be able
> to use the wi4mpi project and use their interface as a base for my
> implementation, which would add a dependency to my project.
>
>
>
> For just a 'classic' wrapper, you just need to provide the definition
> (implementation) of the function you want to replace, adhering to the
> declared (in mpi.h) function prototype.
>
>
>
> A third and very ugly option would be, that I define all my types as
> void* in the interface for the user. But this disables type checking
> and I still would need to wrap from void* to references of my types.
>
> So I might just stay in my AMPI namespace and provide a macro for the
> user to either call regular mpi functions or my wrapper functions.
>
>
>
> If you need 'classic' wrappers for your project, you might consider
> generating that code with a generator like 'wrap' [2].
>
> As I mentioned in my other mail, writing a 'classic' wrapper is
> straightforward.
>
> Cheers,
> Marc-Andre
>
>
> [2] https://github.com/LLNL/wrap
>
>
> On Wed, 2017-10-04 at 11:00 +0200, Jean-Baptiste BESNARD wrote:
>
>
> Dear Max,
>
> I’m not sure I completely understand what you mean by a « pmpi.h »
> however I may have some initial elements below.
>
> The PMPI interface is currently targeting MPI functions only and
> indeed some of the values you’ll find in your executable will be
> compile time constants.
> In fact, most MPI types/Constants are implementation dependent,
> there is no unified ABI.
>
> Nonetheless, you might be able to interpret them in your wrapper
> library in order to have them « rerouted » to your target
> implementation.
> I mean, knowing the value of MPI_COMM_WORLD you could rewrite it to
> be MPI_COMM_WORLD2.
> And for sure you won't find a PMPI_COMM_WORLD.
>
> I can help on writing a wrapper for the whole PMPI interface. See my
> repo here: https://github.com/besnardjb/mpi-snippets
> There is a simple python script generating VIM snippets for MPI from
> JSON specs, it can easily be converted to a script generating the
> whole MPI interface.
>
> Finally, an approach close to what you want to do might
> be https://github.com/cea-hpc/wi4mpi which performs this systematic
> handle conversion between MPI flavors, but this clearly involves
> some rewriting.
>
> Hope this helps.
>
> Regards,
>
> Jean-Baptiste.
>
>
>
> On 4 Oct 2017, at 10:35, Max Sagebaum
> <max.sagebaum at scicomp.uni-kl.de> wrote:
> Hello @ all,
>
> my question is concerning the PMPI specification. I hope the list
> is the correct place to ask.
>
> I want to write a complete wrapper for MPI. That is every define,
> typedef and function will be wrapped and might be completely
> changed. Currently I prefixed everything with AMPI_ such that no
> name clashes exist. But the user would need to rename every
> occurrence of MPI_ with AMPI_
>
> I would now like to use the PMPI definition of MPI to define my
> wrappers as the MPI version which then use the PMPI definitions.
> Unfortunately I could not find tutorials for a complete wrapper.
>
> As an example take MPI_COMM_WORLD. I made a grep on the openmpi
> installation on my linux machine for PMPI_COMM_WORLD but the result
> was empty. The definition of MPI_COMM_WORLD was
> #define MPI_COMM_WORLD OMPI_PREDEFINED_GLOBAL( MPI_Comm,
> ompi_mpi_comm_world)
> without any chance to switch to PMPI_COMM_WORLD as a predefined macro.
>
> I also checked the newest source tarball of openmpi and I could not
> find anything for PMPI_COMM_WORLD there.
>
> In the mpi 3.0 standard on page 555 in section 14.2.1 the
> requirements are just listed for functions. Was the definition of
> the PMPI_ supplements for defines, types etc. never discussed?
>
> I would have expected, that I can just include a pmpi.h and then I
> would have all the PMPI_ symbols without the MPI symbols available.
>
> Do you know of any way I could make my idea work?
>
> Cheers
>
> Max
>
> --
> Max Sagebaum
>
> Chair for Scientific Computing,
> TU Kaiserslautern,
> Bldg/Geb 34, Paul-Ehrlich-Strasse,
> 67663 Kaiserslautern, Germany
>
> Phone: +49 (0)631 205 5638
> Fax:   +49 (0)631 205 3056
> Email: max.sagebaum at scicomp.uni-kl.de
> URL:   www.scicomp.uni-kl.de
>
>
>
>
>
> _______________________________________________
> mpiwg-tools mailing list
> mpiwg-tools at lists.mpi-forum.org
> https://lists.mpi-forum.org/mailman/listinfo/mpiwg-tools
>