[Mpi3-rma] MPI-3.1 RMA planning

Pavan Balaji balaji at mcs.anl.gov
Wed Jul 25 22:28:52 CDT 2012


Now that the MPI-3.0 changes are almost done, I wanted to start
discussing what pieces we want to add into MPI-3.1.  Here are a few
things I'd like to have considered:

1. We currently have an MPI_MODE_NOSTORE assert, but there is no
corresponding MPI_MODE_NOLOAD assert.

Use case: This is an optimization for the separate window case.
Logically, when an epoch starts, the public copy of the window is
copied into the private copy to allow for load/store accesses.  When
the epoch ends, the private copy is copied back into the public copy.
The MPI_MODE_NOSTORE assert allows the MPI implementation to get rid
of the copy at the close of the epoch.  If both NOLOAD and NOSTORE
asserts are given, the copy at the start of the epoch can be avoided
as well.
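
As a rough sketch of the copy logic described above, the following plain
Python model (not MPI code) shows which whole-window copies an
implementation could skip for one epoch.  The flag values and the function
name copies_needed are made up for illustration, and MPI_MODE_NOLOAD is
only the assert proposed here, not an existing constant.  The handling of
NOLOAD given alone is an assumption of this model; the text above does not
cover that case.

```python
# Hypothetical flag values for illustration only. Real MPI constants are
# implementation-defined, and MODE_NOLOAD stands for the *proposed* assert.
MODE_NOSTORE = 0x1
MODE_NOLOAD = 0x2


def copies_needed(assertion):
    """Return (copy_in, copy_out) for one epoch on a separate-model window.

    copy_in:  public -> private copy when the epoch starts
    copy_out: private -> public copy when the epoch ends
    """
    no_store = bool(assertion & MODE_NOSTORE)
    no_load = bool(assertion & MODE_NOLOAD)

    # Without local stores the private copy is never modified, so there is
    # nothing to copy back at the close of the epoch.
    copy_out = not no_store

    # The copy at the start can be skipped only if the private copy is
    # neither read nor written. Assumption of this model: with NOLOAD alone
    # the copy-in is kept, because a whole-window copy-out after partial
    # stores would otherwise overwrite public data with stale private data.
    copy_in = not (no_load and no_store)

    return copy_in, copy_out


if __name__ == "__main__":
    for flags in (0, MODE_NOSTORE, MODE_NOLOAD, MODE_NOSTORE | MODE_NOLOAD):
        print(flags, copies_needed(flags))
```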

2. We currently allow the MPI implementation to create a window in
either the unified or separate mode.  However, there is no way for the
user to state which model is required, e.g., that SEPARATE is sufficient
(as can be done with threading levels).
For example, even on cache coherent systems, if the user calls
MPI_WIN_CREATE() the MPI implementation might choose to provide the
SEPARATE memory model and create a different public copy of the window
in symmetrically allocated memory or in a shared memory region.  But
this will disable some capabilities for the user (e.g., simultaneous
load/store and PUT/GET accesses to different parts of the same window).

My recommendation: define that UNIFIED > SEPARATE (which we already
kind of do, but don't explicitly state it), and allow the user to give
a "required" value as an info argument.  The "provided" value of the
memory model can already be queried through the attribute.  As with
threading modes, the MPI implementation is allowed to return a
"provided" value which is less than, greater than, or the same as the
"required" value.
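
The required/provided ordering could be modelled roughly as below.  This
is plain Python, not an MPI binding: the enum values, the class name
MemoryModel, and the helper satisfies are hypothetical, and no info key
for a "required" model is defined by this text.  The only thing taken from
the proposal is the ordering UNIFIED > SEPARATE and the fact that the
"provided" value may fall on either side of the "required" one, so the
user has to check it.

```python
from enum import IntEnum


class MemoryModel(IntEnum):
    # Numeric values are illustrative; only the ordering matters here.
    SEPARATE = 1
    UNIFIED = 2


def satisfies(required, provided):
    """True if the provided memory model is at least the required one."""
    return provided >= required


if __name__ == "__main__":
    required = MemoryModel.UNIFIED
    # The implementation may return less than, more than, or exactly what
    # was requested, so the application checks before relying on it.
    for provided in (MemoryModel.SEPARATE, MemoryModel.UNIFIED):
        verdict = "ok" if satisfies(required, provided) else "not sufficient"
        print(required.name, provided.name, verdict)
```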

While the current use case might only be for WIN_CREATE, I'd also
recommend that we add it to all window creation routines for symmetry.
Also, WIN_ALLOCATE or WIN_ALLOCATE_SHARED might not run into such
restrictions, but the user might still prefer WIN_CREATE if (s)he
wants to expose already allocated memory that the application is
already working on.

3. In our semantics and correctness section, for the SEPARATE memory
model, we state that a PUT/ACCUMULATE must not access a target window
once a local update or PUT/GET to an overlapping window has started.
I'd recommend that we change this to "... to the same or an
overlapping window has started", since it obviously applies to the
same window as well, and not just overlapping windows.

4. Currently, in the SEPARATE memory model, we do not allow
PUT/ACCUMULATE to a target window once a local update or PUT/GET to
that window has started.  This makes sense for local load/store, but
might not for local PUT/GET since those could directly target the
public copy.  However, when a process opens an epoch on its local
window, the MPI implementation has to assume that it will perform
load/store operations as well.  The user can provide MPI_MODE_NOLOAD
and MPI_MODE_NOSTORE assertions, but these are just hints and the MPI
implementation is allowed to ignore them.  Thus, from the user's
perspective, even if (s)he gives the NOLOAD/NOSTORE guarantees, such
accesses are still not allowed.

Possible (very lame) recommendation: Make NOLOAD/NOSTORE assertions
mandatory for the MPI implementation to honor.  Thus, when the
application gives these assertions, it is no longer restricted from
simultaneously performing PUT/GET operations locally and remotely to
disjoint locations of the window.

I realize that the above recommendation is bad in many ways, including:
(1) we are changing the semantics of "asserts" here, and (2) this
might force local PUT/GET operations to always go through the network
as if they are issued by a remote process because of cache coherence
issues and thus might lose performance.  But this is just to get some
discussion started.
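
To make the difference between the current rule and the recommendation in
item 4 concrete, here is a small plain Python model (not MPI code).  The
flag values, the function name, and the asserts_mandatory switch are all
hypothetical; the model covers only the SEPARATE memory model and only the
case of local and remote PUT/GET to disjoint locations of the same window.

```python
# Hypothetical flag values; MODE_NOLOAD stands for the proposed assert.
MODE_NOSTORE = 0x1
MODE_NOLOAD = 0x2


def disjoint_local_and_remote_rma_allowed(assertion, asserts_mandatory):
    """Model of the SEPARATE-model rule discussed in item 4.

    asserts_mandatory=False models the current situation: asserts are
    hints the implementation may ignore, so the user can never rely on
    them and the concurrent accesses stay disallowed.
    asserts_mandatory=True models the recommendation: if the user gives
    both NOLOAD and NOSTORE, the restriction is lifted.
    """
    both_given = bool(assertion & MODE_NOSTORE) and bool(assertion & MODE_NOLOAD)
    return asserts_mandatory and both_given


if __name__ == "__main__":
    both = MODE_NOSTORE | MODE_NOLOAD
    print(disjoint_local_and_remote_rma_allowed(both, asserts_mandatory=False))
    print(disjoint_local_and_remote_rma_allowed(both, asserts_mandatory=True))
```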

Look forward to your comments.


  -- Pavan

Pavan Balaji
