[Mpi3-rma] RMA communication with single network messages
Oliver Mangold
mangold at hlrs.de
Mon Jul 11 10:59:57 CDT 2011
What I would like to do with MPI is point-to-point communication with
minimal overhead and latency, which means the communication
(a) does not require storing data in intermediate buffers at the destination, and
(b) does not require more than one data transfer (network message) per
communication.
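The problem with MPI point-to-point communication is that it cannot satisfy
(a) and (b) at the same time, because the sent data might reach the
destination before MPI_Recv is called. The implementation then has to either
buffer the data (violating (a)) or hold it back until the receiver signals
that it is ready, which costs extra messages (violating (b)).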
So I am wondering whether this will be possible with the improved RMA
features of MPI-3. The draft standard (like the 2.2 standard) says:
> RMA communications fall in two categories:
> * active target communication, where data is moved from the memory of
>   one process to the memory of another, and both are explicitly involved
>   in the communication. This communication pattern is similar to message
>   passing, except that all the data transfer arguments are provided by one
>   process, and the second process only participates in the synchronization.
This is exactly what I want: the sender provides all the transfer
arguments (including the destination memory location), and the receiver
only waits for the message to arrive.
As I understand MPI-2.2 RMA, the sender has to do:
MPI_Win_start(group, flag, win);
MPI_Put(...,win);
MPI_Win_complete(win);
The receiver, in turn, has to do:
MPI_Win_post(group, flag, win);
MPI_Win_wait(win);
As I understand it, the semantics of MPI RMA require that the
destination memory is not written before the call to MPI_Win_post().
This means that either the receiver must signal the sender that it has
reached MPI_Win_post(), or the data must be buffered at the receiver,
which leads to the same problem as with MPI point-to-point. Please correct
me if I'm wrong.
The problem could be solved if there were windows that are always
'open', i.e. no MPI_Win_start() and MPI_Win_post() would be necessary (only
MPI_Win_complete() and MPI_Win_wait(), to inform the receiver that the
data has arrived). If the implementation merged the data transfers needed
for MPI_Put and MPI_Win_complete, a single message would suffice. But
I couldn't find a feature in the draft standard that helps here. So the
question is: is there a way to do what I want with MPI-3?
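I should also note that always-open windows are practical: with pairwise
communication (data always flowing in both directions), double buffering
avoids the race of a sender overwriting a buffer the receiver is still
reading. A process writes a given buffer of its partner again only after it
has received the partner's next message, and the partner sends that message
only after it has finished computing on the buffer.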
Example (assuming MPI_Win_start() and MPI_Win_post() are not needed):
Process 0:
while (...) {
    MPI_Put(..., win1a);
    MPI_Win_complete(win1a);
    MPI_Win_wait(win0a);
    /* do computation on data from win0a and write data for win1b */
    MPI_Put(..., win1b);
    MPI_Win_complete(win1b);
    MPI_Win_wait(win0b);
    /* do computation on data from win0b and write data for win1a */
}
Process 1:
while (...) {
    MPI_Put(..., win0a);
    MPI_Win_complete(win0a);
    MPI_Win_wait(win1a);
    /* do computation on data from win1a and write data for win0b */
    MPI_Put(..., win0b);
    MPI_Win_complete(win0b);
    MPI_Win_wait(win1b);
    /* do computation on data from win1b and write data for win0a */
}
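The double-buffered loop can be checked with a shared-memory analogy in C11, again assuming threads in place of processes and plain stores in place of MPI_Put; open_window, rank_main and run_exchange are hypothetical names, not MPI API. Each buffer carries a completion counter that only the writer increments, so the reader never sends anything back. The check inside the loop counts how often a rank reads a value that does not belong to the step it expects, which is what an overwrite race would produce.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Analogy only, not MPI: two threads play process 0 and process 1,
   a plain store into the partner's buffer plays MPI_Put. */
#define STEPS 200

typedef struct {
    long value;             /* the "window" memory */
    atomic_uint completed;  /* transfers completed into this buffer */
} open_window;

static open_window win[2][2];  /* win[rank][0] is "a", win[rank][1] is "b" */
static int errors[2];

static void *rank_main(void *arg)
{
    int me = *(const int *)arg;
    int peer = 1 - me;
    long outgoing = 1000L * me;  /* data for step 0 */

    for (int s = 0; s < STEPS; s++) {
        int buf = s % 2;  /* alternate between buffers a and b */

        /* "MPI_Put" + "MPI_Win_complete" into the partner's buffer */
        open_window *dst = &win[peer][buf];
        dst->value = outgoing;
        atomic_fetch_add_explicit(&dst->completed, 1u, memory_order_release);

        /* "MPI_Win_wait": each buffer is written on every second step */
        open_window *src = &win[me][buf];
        unsigned expected = (unsigned)(s / 2 + 1);
        while (atomic_load_explicit(&src->completed, memory_order_acquire) < expected)
            sched_yield();

        /* "computation": check what arrived, then prepare the next message */
        if (src->value != 1000L * peer + s)
            errors[me]++;
        outgoing = 1000L * me + s + 1;
    }
    return NULL;
}

/* Runs the exchange; returns the number of mismatched reads. */
static int run_exchange(void)
{
    for (int r = 0; r < 2; r++)
        for (int b = 0; b < 2; b++) {
            win[r][b].value = 0;
            atomic_init(&win[r][b].completed, 0u);
        }
    errors[0] = errors[1] = 0;

    int ids[2] = {0, 1};
    pthread_t t[2];
    for (int r = 0; r < 2; r++)
        pthread_create(&t[r], NULL, rank_main, &ids[r]);
    for (int r = 0; r < 2; r++)
        pthread_join(t[r], NULL);
    return errors[0] + errors[1];
}
```

Here run_exchange() returns 0: a rank rewrites buffer a (or b) of its partner only after waiting for the partner's message into the other buffer, and the partner sends that message only after it has finished reading the first one.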