[Mpi3-rma] RMA communication with single network messages

James Dinan dinan at mcs.anl.gov
Thu Jul 14 08:55:38 CDT 2011

Hi Oliver,

If you are certain that the receive has already been posted, you could
use MPI_Rsend, a "ready" send.  This should be able to give you both (a)
and (b).


On 07/11/2011 10:59 AM, Oliver Mangold wrote:
> What I would like to do with MPI is point-to-point communication with
> minimal overhead/latency, which means the communication
> (a) does not need storage of data in intermediate buffers at the
> destination
> (b) does not need more than 1 data transfer (network message) per
> communication
> Using MPI point-to-point has the problem that it cannot satisfy both (a)
> and (b) at the same time, as the sent data might reach the destination
> before MPI_Recv is called.
> So I am wondering if this will be possible with the improved RMA
> features of MPI-3. The draft paper (as the 2.2 standard also does) says:
>> RMA communications fall in two categories:
>> * active target communication, where data is moved from the memory of
>> one process
>> to the memory of another, and both are explicitly involved in the
>> communication. This
>> communication pattern is similar to message passing, except that all
>> the data transfer
>> arguments are provided by one process, and the second process only
>> participates in
>> the synchronization.
> Actually this would be exactly what I wanted. The sender should provide
> all the transfer arguments (including destination memory location) and
> the receiver only waits for the message to arrive.
> As I understand MPI-2.2 RMA, the sender has to do:
> MPI_Win_start(group, flag, win);
> MPI_Put(...,win);
> MPI_Win_complete(win);
> While the receiver does:
> MPI_Win_post(group, flag, win);
> MPI_Win_wait(win);
> As I understand things, the semantics of MPI RMA require that
> destination memory is not written before the call to MPI_Win_post().
> This means that either the receiver must signal the sender that it has
> reached MPI_Win_post(), or the data must be buffered at the receiver,
> resulting in the same problem as with MPI point-to-point. Please correct
> me if I'm wrong.
> The problem could be solved if there were windows that are always
> 'open', meaning that no MPI_Win_start() and MPI_Win_post() are necessary
> (only MPI_Win_complete() and MPI_Win_wait() to inform the receiver that
> the data has arrived). If the framework merged the data transfers needed
> for MPI_Put and MPI_Win_complete, a single message would be sufficient.
> But I couldn't find a feature in the draft standard that helps here. So
> the question is: is there a way to do what I want with MPI-3?
> I should note that always-open windows would be practical here: with
> pairwise communication (data always going both ways), double
> buffering would prevent race conditions.
> Example (assuming MPI_Win_start() and MPI_Win_post() are not needed):
> Process 0:
> while () {
>   MPI_Put(...,win1a);
>   MPI_Win_complete(win1a);
>   MPI_Win_wait(win0a);
>   ... do computation on data from win0a and write data for win1b ...
>   MPI_Put(...,win1b);
>   MPI_Win_complete(win1b);
>   MPI_Win_wait(win0b);
>   ... do computation on data from win0b and write data for win1a ...
> }
> Process 1:
> while () {
>   MPI_Put(...,win0a);
>   MPI_Win_complete(win0a);
>   MPI_Win_wait(win1a);
>   ... do computation on data from win1a and write data for win0b ...
>   MPI_Put(...,win0b);
>   MPI_Win_complete(win0b);
>   MPI_Win_wait(win1b);
>   ... do computation on data from win1b and write data for win0a ...
> }
