[MPI3 Fortran] MPI non-blocking transfers
Jeff Squyres
jsquyres at cisco.com
Wed Jan 21 07:51:22 CST 2009
On Jan 21, 2009, at 6:04 AM, N.M. Maclaren wrote:
> 1) Most people seem to agree that the semantics of the buffers used
> for MPI non-blocking transfers and pending input/output storage
> affectors are essentially identical, with READ, WRITE and WAIT
> corresponding to MPI_Isend, MPI_IRecv and MPI_Wait (and variations).
>
> Do you agree with this and, if not, why not?
I'm an MPI implementor; I don't know enough about Fortran to answer
your questions definitively, but I can state what the MPI non-blocking
send/receive buffer semantics are.
There are several different flavors of non-blocking sends/receives in
MPI; I'll use MPI_ISEND and MPI_IRECV as token examples ("I" =
"immediate", meaning that the function returns "immediately",
potentially before the message has actually been sent or received).
1. When an application invokes MPI_ISEND / MPI_IRECV, it essentially
hands off the buffer to the MPI implementation and promises not to
write to the buffer until a completion call (MPI_TEST / MPI_WAIT, see
below) indicates that the communication has finished. The MPI
implementation then "owns" the buffer.
2. A rule is about to be passed in MPI-2.2 such that for *sends*
(e.g., MPI_ISEND) the application can still *read* the buffer while
the send is ongoing (writing to the buffer while the send is ongoing
is nonsense, of course).
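Putting 1 and 2 together, a minimal C sketch of the hand-off might
look like this (illustrative only -- the destination rank, tag, and
helper name are assumptions, and MPI_Init/MPI_Finalize are omitted):

#include <mpi.h>

/* Sketch of the buffer hand-off: MPI "owns" the buffer between
   MPI_Isend and the completion call. */
void ownership_sketch(void)
{
    int payload = 42;
    MPI_Request request;

    /* Hand the buffer to MPI (send to hypothetical rank 1, tag 0). */
    MPI_Isend(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &request);

    /* Reading 'payload' here is OK (per the MPI-2.2 rule above);
       writing to it before completion is not. */

    MPI_Wait(&request, MPI_STATUS_IGNORE);

    payload = 0;  /* safe to modify again only after completion */
}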
3. The buffer is specified by a triple of arguments (I'll explain in
terms of C because of my inexperience with Fortran):
- void *buffer: a pointer representing the base of the buffer
(NOTE: it may not actually point to the first byte of the message!)
- int count: the number of elements of the given datatype in the
message (see the next argument)
- MPI_Datatype type: the datatype of the message, implying both the
size and the interpretation of the bytes
MPI has a number of intrinsic datatypes (such as MPI_INTEGER,
representing a single Fortran INTEGER). The intrinsic MPI datatypes
can be combined in several ways to represent complex data structures.
Hence, it is possible to build up a user-defined MPI_Datatype that
represents a C struct -- even if the struct has memory "holes" in it.
As such, MPI_Datatypes can be considered a memory map of (relative
offset, type) tuples, where the "relative offset" part is relative to
the (buffer) argument in MPI_ISEND/MPI_IRECV/etc. MPI_INTEGER could
therefore be considered a single (0, N-byte integer) tuple (where N is
whatever is correct for your platform).
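As an illustrative sketch of such a map (mine, not from the MPI
document -- the struct and names are made up), the standard
MPI_Get_address / MPI_Type_create_struct calls can turn a C struct
with alignment holes into a set of (relative offset, type) tuples:

#include <mpi.h>

/* Hypothetical struct; an alignment "hole" likely follows 'c'. */
struct particle {
    char   c;
    double x;
    int    id;
};

/* Describe 'struct particle' as (relative offset, intrinsic type)
   tuples and commit the resulting datatype. */
MPI_Datatype build_particle_type(void)
{
    struct particle p;
    MPI_Datatype newtype;
    int          blocklens[3] = { 1, 1, 1 };
    MPI_Datatype types[3]     = { MPI_CHAR, MPI_DOUBLE, MPI_INT };
    MPI_Aint     disps[3], base;
    int          j;

    MPI_Get_address(&p,    &base);
    MPI_Get_address(&p.c,  &disps[0]);
    MPI_Get_address(&p.x,  &disps[1]);
    MPI_Get_address(&p.id, &disps[2]);
    for (j = 0; j < 3; ++j)
        disps[j] -= base;   /* make the offsets relative to &p */

    MPI_Type_create_struct(3, blocklens, disps, types, &newtype);
    MPI_Type_commit(&newtype);
    return newtype;
}

An instance can then be sent with MPI_ISEND(&p, 1, that_datatype,
...), much like the struct example further below.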
A special buffer, denoted by MPI_BOTTOM, is an arbitrarily-fixed place
in memory (usually 0, but it doesn't have to be). Since MPI_Datatypes
are composed of relative offsets, applications can build datatypes
relative to MPI_BOTTOM for [effectively] direct placement into memory.
Some Fortran examples:
INTEGER i
CALL MPI_ISEND(i, 1, MPI_INTEGER, ...)
Sends a single INTEGER starting at the buffer pointed to by i
INTEGER iarray(10)
CALL MPI_ISEND(iarray, 10, MPI_INTEGER, ...)
Sends 10 INTEGERs starting at the buffer pointed to by iarray
INTEGER iarray(9999)
CALL MPI_ISEND(iarray, 10, MPI_INTEGER, ...)
Same as above -- sends the first 10 INTEGERs starting at the buffer
pointed to by iarray
INTEGER iarray(9999)
CALL MPI_ISEND(iarray(37), 10, MPI_INTEGER, ...)
Sends iarray(37) through iarray(46)
INTEGER iarray(9999)
C ..build up a datatype relative to MPI_BOTTOM that points to iarray..
CALL MPI_ISEND(MPI_BOTTOM, 10, my_datatype, ...)
Sends the first 10 elements of iarray
Some C examples:
int i;
MPI_Isend(&i, 1, MPI_INT, ...);
Sends 1 int starting at the buffer pointed to by &i
int i[9999];
MPI_Isend(&i[37], 10, MPI_INT, ...);
Sends i[37] through i[46]
int i[9999];
/* ..build up MPI_Datatype relative to MPI_BOTTOM that points to
&i[0].. */
MPI_Isend(MPI_BOTTOM, 1, my_datatype, ...);
Sends i[0]
struct foo { int a; double b; char c; } foo_instance;
/* ..build up MPI_Datatype to represent struct foo.. */
MPI_Isend(&foo_instance, 1, foo_datatype, ...);
Sends the foo struct (likely only transmitting the data, not the
"holes")
4. A value returned from MPI_ISEND and MPI_IRECV is a request handle
that can be passed to MPI later to check whether the communication
associated with that handle has completed. There are essentially two
flavors of the check-for-completion semantic: polling and blocking.
- MPI_TEST accepts a single request handle and polls to see if the
associated communication has completed, and essentially returns
"true" (the communication has completed; the application now owns the
buffer) or "false" (the communication has not yet completed; MPI still
owns the buffer).
- MPI_WAIT accepts a single request handle and blocks until the
associated communication has completed. When MPI_WAIT returns, the
application owns the buffer associated with the communication.
- There are array versions of MPI_TEST and MPI_WAIT as well; you
can pass an array of requests to the array flavors of MPI_TEST (where
some may complete and some may not) or MPI_WAIT (where all requests
will complete before returning).
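A small illustrative fragment of the two flavors (the helper names
are mine; do_other_work() stands in for whatever the application
wants to overlap with communication):

#include <mpi.h>

void do_other_work(void);   /* hypothetical application routine */

/* Polling flavor: MPI_Test sets a flag instead of blocking. */
void poll_until_done(MPI_Request *request)
{
    int done = 0;
    while (!done) {
        MPI_Test(request, &done, MPI_STATUS_IGNORE);
        if (!done)
            do_other_work();   /* overlap computation and communication */
    }
    /* The application owns the buffer again at this point. */
}

/* Blocking, array flavor: MPI_Waitall returns only after *all*
   of the requests have completed. */
void wait_for_all(int n, MPI_Request requests[])
{
    MPI_Waitall(n, requests, MPI_STATUSES_IGNORE);
}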
5. All Fortran MPI handles are [currently] expressed as INTEGERs. The
MPI implementation takes these integers and converts them to a back-
end C pointer. We are contemplating changing this for the upcoming
F03 MPI bindings to avoid this translation; Fortran handles will
likely have the same representation as C MPI handles (i.e., pointers --
or, thought of differently, "very large address-sized integers").
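The existing MPI-2 handle-conversion functions already expose that
translation; a small illustrative C fragment (not part of any
proposed F03 binding):

#include <mpi.h>

/* A Fortran request handle arrives in C as an MPI_Fint (an integer
   as wide as a Fortran INTEGER); the implementation converts it to
   its back-end C handle and back. */
MPI_Fint complete_fortran_request(MPI_Fint f_request)
{
    MPI_Request c_request = MPI_Request_f2c(f_request);

    MPI_Wait(&c_request, MPI_STATUS_IGNORE);

    /* MPI_Wait sets a completed (non-persistent) request to
       MPI_REQUEST_NULL; hand that back in Fortran form. */
    return MPI_Request_c2f(c_request);
}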
Hope that made sense!
--
Jeff Squyres
Cisco Systems