[Mpi3-rma] RMA communication with single network messages

Rajeev Thakur thakur at mcs.anl.gov
Wed Jul 13 16:00:06 CDT 2011

There isn't a way to do exactly what you want, but you could do either of the following.

1. Call MPI_Win_post immediately after MPI_Win_create and immediately after every call to MPI_Win_wait.

2. Use passive-target synchronization (lock-unlock), in which the target window is always open. However, the target doesn't know when the data transfer has completed, so you have to use some other means to find that out.


On Jul 11, 2011, at 10:59 AM, Oliver Mangold wrote:

> What I would like to do with MPI is point-to-point communication with minimal overhead/latency, which means the communication
> (a) does not need storage of data in intermediate buffers at the destination
> (b) does not need more than 1 data transfer (network message) per communication
> Using MPI point-to-point has the problem that it cannot satisfy both (a) and (b) at the same time, as the sent data might reach the destination before MPI_Recv is called.
> So I am wondering if this will be possible with the improved RMA features of MPI-3. The draft paper (as the 2.2 standard also does) says:
>> RMA communications fall in two categories:
>> * active target communication, where data is moved from the memory of one process
>> to the memory of another, and both are explicitly involved in the communication. This
>> communication pattern is similar to message passing, except that all the data transfer
>> arguments are provided by one process, and the second process only participates in
>> the synchronization.
> Actually this would be exactly what I wanted. The sender should provide all the transfer arguments (including destination memory location) and the receiver only waits for the message to arrive.
> As I understand MPI-2.2 RMA, the sender has to do:
> MPI_Win_start(group, flag, win);
> MPI_Put(...,win);
> MPI_Win_complete(win);
> While the receiver does:
> MPI_Win_post(group, flag, win);
> MPI_Win_wait(win);
> As I understand things, the semantics of MPI RMA require that destination memory is not written before the call to MPI_Win_post(). This means that either the receiver must signal the sender that it has reached MPI_Win_post(), or the data must be buffered at the receiver, resulting in the same problem as with MPI point-to-point. Please correct me if I'm wrong.
> The problem could be solved if there were windows that are always 'open', meaning that MPI_Win_start() and MPI_Win_post() are not necessary (only MPI_Win_complete() and MPI_Win_wait() to inform the receiver that the data has arrived). If the framework merges the data transfers needed for MPI_Put and MPI_Win_complete, a single message would be sufficient. But I couldn't find a feature in the draft standard that helps here. So the question is: is there a way to do what I want with MPI-3?
> Maybe I should note that windows which are always open would be practical, since with pairwise communication (with data always going both ways) double buffering would prevent race conditions.
> Example (assuming MPI_Win_start() and MPI_Win_post() are not needed):
> Process 0:
> while () {
>  MPI_Put(...,win1a);
>  MPI_Win_complete(win1a);
>  MPI_Win_wait(win0a);
>  ... do computation on data from win0a and write data for win1b ...
>  MPI_Put(...,win1b);
>  MPI_Win_complete(win1b);
>  MPI_Win_wait(win0b);
>  ... do computation on data from win0b and write data for win1a ...
> }
> Process 1:
> while () {
>  MPI_Put(...,win0a);
>  MPI_Win_complete(win0a);
>  MPI_Win_wait(win1a);
>  ... do computation on data from win1a and write data for win0b ...
>  MPI_Put(...,win0b);
>  MPI_Win_complete(win0b);
>  MPI_Win_wait(win1b);
>  ... do computation on data from win1b and write data for win0a ...
> }