[Mpi3-rma] RMA communication with single network messages

Pavan Balaji balaji at mcs.anl.gov
Thu Jul 14 10:40:42 CDT 2011

MPI_Rsend still has the message-matching complexity that RMA operations avoid.

I think what Oliver wants can be achieved by passing the 
MPI_MODE_NOCHECK assertion to the post/start calls.

  -- Pavan

On 07/14/2011 08:55 AM, James Dinan wrote:
> Hi Oliver,
> If you are certain that the receive has already been posted, you could
> use MPI_Rsend, a "ready" send.  This should be able to give you both (a)
> and (b).
>   ~Jim.
> On 07/11/2011 10:59 AM, Oliver Mangold wrote:
>> What I would like to do with MPI is point-to-point communication with
>> minimal overhead/latency, which means the communication
>> (a) does not need storage of data in intermediate buffers at the
>> destination
>> (b) does not need more than 1 data transfer (network message) per
>> communication
>> Using MPI point-to-point has the problem that it cannot satisfy both (a)
>> and (b) at the same time, as the sent data might reach the destination
>> before MPI_Recv is called.
>> So I am wondering if this will be possible with the improved RMA
>> features of MPI-3. The draft paper (as the 2.2 standard also does) says:
>>> RMA communications fall in two categories:
>>> * active target communication, where data is moved from the memory of
>>> one process
>>> to the memory of another, and both are explicitly involved in the
>>> communication. This
>>> communication pattern is similar to message passing, except that all
>>> the data transfer
>>> arguments are provided by one process, and the second process only
>>> participates in
>>> the synchronization.
>> Actually this would be exactly what I wanted. The sender should provide
>> all the transfer arguments (including destination memory location) and
>> the receiver only waits for the message to arrive.
>> As I understand MPI-2.2 RMA, the sender has to do:
>> MPI_Win_start(group, flag, win);
>> MPI_Put(...,win);
>> MPI_Win_complete(win);
>> While the receiver does:
>> MPI_Win_post(group, flag, win);
>> MPI_Win_wait(win);
>> As I understand things, the semantics of MPI RMA require that
>> destination memory is not written before the call to MPI_Win_post().
>> This means either the receiver must signal the sender that it has
>> reached MPI_Win_post(), or the data must be buffered at the receiver,
>> resulting in the same problem as with MPI point-to-point. Please correct
>> me if I'm wrong.
>> The problem could be solved if there were windows that are always
>> 'open', meaning that no MPI_Win_start() and MPI_Win_post() are needed (only
>> MPI_Win_complete() and MPI_Win_wait() to inform the receiver that the
>> data has arrived). If the framework merges the data transfers needed for
>> MPI_Put and MPI_Win_complete, a single message would be sufficient. But
>> I couldn't find a feature in the draft standard that helps here. So the
>> question is, is there a way to do what I want with MPI-3?
>> Maybe I should note that windows which are always open would be useful
>> because, with pairwise communication (data always going both ways), double
>> buffering would avoid race conditions.
>> Example (assuming MPI_Win_start() and MPI_Win_post() are not needed):
>> Process 0:
>> while () {
>>    MPI_Put(...,win1a);
>>    MPI_Win_complete(win1a);
>>    MPI_Win_wait(win0a);
>>    ... do computation on data from win0a and write data for win1b ...
>>    MPI_Put(...,win1b);
>>    MPI_Win_complete(win1b);
>>    MPI_Win_wait(win0b);
>>    ... do computation on data from win0b and write data for win1a ...
>> }
>> Process 1:
>> while () {
>>    MPI_Put(...,win0a);
>>    MPI_Win_complete(win0a);
>>    MPI_Win_wait(win1a);
>>    ... do computation on data from win1a and write data for win0b ...
>>    MPI_Put(...,win0b);
>>    MPI_Win_complete(win0b);
>>    MPI_Win_wait(win1b);
>>    ... do computation on data from win1b and write data for win0a ...
>> }
>> _______________________________________________
>> mpi3-rma mailing list
>> mpi3-rma at lists.mpi-forum.org
>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-rma

Pavan Balaji
