[Mpi3-rma] MPI-3.1 RMA planning

Thu Jul 26 05:05:39 CDT 2012

Pavan,
> Now that the MPI-3.0 changes are almost done, I wanted to start
> discussing what pieces we want to add into MPI-3.1.  Here are a few
> things I'd like to have considered:
>
> 1. We currently have an assert for MPI_MODE_NOSTORE.  But there's no
> assert for MPI_MODE_NOLOAD.
> 
> Use case: This is an optimization for the separate window case.
> Logically, when an epoch starts, the public copy of the window is
> copied into the private copy to allow for load/store accesses.  When
> the epoch ends, the private copy is copied back into the public copy.
> The MPI_MODE_NOSTORE assert allows the MPI implementation to get rid
> of the copy at the close of the epoch.  If both NOLOAD and NOSTORE
> asserts are given, the copy at the start of the epoch can be avoided
> as well.
Makes sense.
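To make the NOLOAD/NOSTORE argument concrete, here is a toy C model of a SEPARATE window with explicit public and private copies. It shows which copy each assert lets the implementation skip. The names MODE_NOLOAD, MODE_NOSTORE, sep_window, epoch_begin and epoch_end are hypothetical and used only for this sketch. They are not MPI API, and MPI-3.0 has no NOLOAD assert at all. For simplicity, the same flags are passed to both ends of one epoch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical assert bits for this sketch (not MPI's constants).
 * MODE_NOLOAD is the proposed addition; MPI-3.0 has no such assert. */
enum { MODE_NOSTORE = 1u << 0, MODE_NOLOAD = 1u << 1 };

/* Toy SEPARATE window: RMA targets the public copy, local load/store
 * targets the private copy. */
typedef struct {
    unsigned char *public_copy;
    unsigned char *private_copy;
    size_t size;
    int copies_made; /* number of public<->private copies performed */
} sep_window;

/* Epoch start: copy public -> private so local loads see current data.
 * NOLOAD alone is not enough to skip this copy: a store followed by the
 * full-buffer copy-back would write stale private bytes over the public
 * copy. Only NOLOAD together with NOSTORE makes the private copy unused. */
void epoch_begin(sep_window *w, unsigned asserts) {
    const unsigned both = MODE_NOLOAD | MODE_NOSTORE;
    assert(w != NULL);
    if ((asserts & both) != both) {
        memcpy(w->private_copy, w->public_copy, w->size);
        w->copies_made++;
    }
}

/* Epoch end: copy private -> public so local stores become visible to
 * RMA. NOSTORE promises there were none, so the copy can be skipped. */
void epoch_end(sep_window *w, unsigned asserts) {
    assert(w != NULL);
    if (!(asserts & MODE_NOSTORE)) {
        memcpy(w->public_copy, w->private_copy, w->size);
        w->copies_made++;
    }
}
```

With no asserts an epoch costs two copies, with NOSTORE only one, and with NOLOAD|NOSTORE none.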

> 2. We currently allow the MPI implementation to create a window in
> either the unified or separate mode.  However, there is no way for the
> user to say that SEPARATE is sufficient (like with threading modes).
> For example, even on cache coherent systems, if the user calls
> MPI_WIN_CREATE() the MPI implementation might choose to provide the
> SEPARATE memory model and create a different public copy of the window
> in symmetrically allocated memory or on a shared memory region.  But
> this will disable some capabilities for the user (e.g., simultaneous
> load/store and PUT/GET accesses to different parts of the same window).
>
> My recommendation: define that UNIFIED > SEPARATE (which we already
> kind of do, but don't explicitly state it), and allow the user to give
> a "required" value as an info argument.  The "provided" value of the
> memory model can already be queried through the attribute.  As with
> threading modes, the MPI implementation is allowed to return a
> "provided" value which is less than, greater than, or the same as the
> "required" value.
>
> While the current use case might only be for WIN_CREATE, I'd also
> recommend that we add it to all window creation routines for symmetry.
> Also, WIN_ALLOCATE or WIN_ALLOCATE_SHARED might not run into such
> restrictions, but the user might still prefer WIN_CREATE if (s)he
> wants to expose already allocated memory that the application is
> already working on.
Why would an MPI implementation ever want to return SEPARATE if it can
support UNIFIED? And why would a user want this?
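The proposed required/provided scheme in item 2 can be sketched in C, modeled on how applications compare the thread levels from MPI_Init_thread. This is a sketch under stated assumptions. The mem_model values are toy constants ordered so that UNIFIED > SEPARATE, which is the ordering the proposal would make explicit. They are not MPI's MPI_WIN_SEPARATE/MPI_WIN_UNIFIED values. The info key and its "separate"/"unified" values are hypothetical. The "provided" model would come from the existing MPI_WIN_MODEL attribute.

```c
#include <assert.h>
#include <string.h>

/* Toy memory-model levels, ordered so that UNIFIED > SEPARATE
 * (compare the MPI_THREAD_* levels). Not MPI's actual constants. */
typedef enum { MODEL_SEPARATE = 0, MODEL_UNIFIED = 1 } mem_model;

/* Parse the value of a hypothetical "required memory model" info key.
 * Returns 0 on success, -1 for an unrecognized value. */
int parse_required_model(const char *value, mem_model *out) {
    assert(value != NULL && out != NULL);
    if (strcmp(value, "separate") == 0) { *out = MODEL_SEPARATE; return 0; }
    if (strcmp(value, "unified") == 0)  { *out = MODEL_UNIFIED;  return 0; }
    return -1;
}

/* As with threading modes, 'provided' may be lower than, equal to, or
 * higher than 'required'. The application checks whether it got at
 * least what it asked for. */
int model_sufficient(mem_model provided, mem_model required) {
    return provided >= required;
}
```

A user who can live with SEPARATE would pass "separate". The implementation would then be free to place the public copy in symmetric or shared memory, while still being allowed to provide UNIFIED.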

> 3. In our semantics and correctness section, for the SEPARATE memory
> model, we state that a PUT/ACCUMULATE must not access a target window
> once a local update or PUT/GET to an overlapping window has started.
> I'd recommend that we change this to "... to the same or an
> overlapping window has started", since it obviously applies to the
> same window as well, and not just overlapping windows.
That's close to a ticket 0 (editorial-only) change :).
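The wording fix in item 3 amounts to the following scope check. It is a hypothetical sketch in which a window is reduced to an identifier plus the local memory range it exposes. win_region and restriction_applies are illustrative names, not MPI objects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy description of a window: an identifier plus the local memory it
 * exposes. Hypothetical, not an MPI handle. */
typedef struct {
    int win_id;
    uintptr_t base;
    size_t size;
} win_region;

/* Half-open ranges [a, a+alen) and [b, b+blen) overlap iff each one
 * starts before the other ends (empty ranges never overlap). */
static int ranges_overlap(uintptr_t a, size_t alen, uintptr_t b, size_t blen) {
    return alen > 0 && blen > 0 && a < b + blen && b < a + alen;
}

/* Scope of the SEPARATE-model restriction with the proposed wording:
 * it applies to "the same or an overlapping window". */
int restriction_applies(const win_region *x, const win_region *y) {
    if (x->win_id == y->win_id)
        return 1; /* same window */
    return ranges_overlap(x->base, x->size, y->base, y->size);
}
```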

> 4. Currently, in the SEPARATE memory model, we do not allow
> PUT/ACCUMULATE to a target window once a local update or PUT/GET to
> that window has started.  This makes sense for local load/store, but
> might not for local PUT/GET since those could directly target the
> public copy.  However, when a process opens an epoch on its local
> window, the MPI implementation has to assume that it will perform
> load/store operations as well.  The user can provide MPI_MODE_NOLOAD
> and MPI_MODE_NOSTORE assertions, but these are just hints and the MPI
> implementation is allowed to ignore them.  Thus, from the user's
> perspective, even if (s)he gives the NOLOAD/NOSTORE guarantees, such
> accesses are still not allowed.
>
> Possible (very lame) recommendation: Make NOLOAD/NOSTORE assertions
> mandatory for the MPI implementation to honor.  Thus, when the
> application gives these assertions, it is no longer restricted from
> simultaneously performing PUT/GET operations locally and remotely to
> disjoint locations of the window.
>
> I realize that the above recommendation is bad in many ways including:
> (1) we are changing the semantics of "asserts" here, and (2) this
> might force local PUT/GET operations to always go through the network
> as if they are issued by a remote process because of cache coherence
> issues and thus might lose performance.  But this is just to get some
> discussion started.
I'm not sure about this ... but I think this needs some face-to-face time.
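For the face-to-face discussion, item 4 can be restated as two predicates: the rule as described today and the rule if NOLOAD/NOSTORE had to be honored. This is a hypothetical sketch only. The flag values, local_access, and both function names are illustrative and are not MPI API.

```c
#include <assert.h>

/* Hypothetical assert bits (MODE_NOLOAD is the proposed one). */
enum { MODE_NOSTORE = 1u << 0, MODE_NOLOAD = 1u << 1 };

/* Kind of local access in progress on the target window. */
typedef enum { LOCAL_LOAD_STORE, LOCAL_PUT_GET } local_access;

/* Current SEPARATE-model rule as described in item 4: once a local update
 * or local PUT/GET has started, a remote PUT/ACCUMULATE to that window is
 * not allowed, because asserts are hints the implementation may ignore. */
int remote_put_allowed_current(local_access local, unsigned asserts, int disjoint) {
    (void)local; (void)asserts; (void)disjoint;
    return 0;
}

/* Proposed rule if NOLOAD/NOSTORE were mandatory: a local PUT/GET and a
 * remote PUT/ACCUMULATE may proceed together on disjoint locations. */
int remote_put_allowed_proposed(local_access local, unsigned asserts, int disjoint) {
    const unsigned both = MODE_NOLOAD | MODE_NOSTORE;
    return local == LOCAL_PUT_GET && (asserts & both) == both && disjoint;
}
```

The cost noted above is hidden inside "may proceed together". An implementation that must honor the asserts might have to route local PUT/GET through the network path.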

I would like to add some info arguments to the list of things to
consider. I don't have the full list at this point, but we have the
"same_size" info key for create and allocate and no
"same_displ_unit", which follows the same rationale. We could also add
some info arguments to dynamic windows to mitigate some of the
implementation issues (i.e., to allow optimized implementations on RMA systems).
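The rationale for "same_displ_unit" mirrors "same_size". If every process uses the same displacement unit, the origin can compute the target offset from a single value instead of keeping (or fetching) a per-rank table. Here is a minimal sketch under that assumption. win_meta and both functions are hypothetical, and displacements are simplified to size_t rather than MPI_Aint.

```c
#include <assert.h>
#include <stddef.h>

/* Toy per-window metadata an implementation might keep so that an origin
 * can compute the target address of an RMA operation locally. */
typedef struct {
    int nprocs;
    int same_disp_unit;        /* set from the proposed "same_displ_unit" key */
    int common_disp_unit;      /* valid when same_disp_unit != 0 */
    const int *disp_units;     /* per-rank table, needed only otherwise */
} win_meta;

/* Byte offset into the target's window for displacement 'disp'. */
size_t target_offset(const win_meta *m, int rank, size_t disp) {
    assert(rank >= 0 && rank < m->nprocs);
    int unit = m->same_disp_unit ? m->common_disp_unit : m->disp_units[rank];
    return disp * (size_t)unit;
}

/* Bytes of displacement-unit metadata the origin must store:
 * O(1) with the hint, O(nprocs) without it. */
size_t disp_unit_metadata_bytes(const win_meta *m) {
    return m->same_disp_unit ? sizeof(int) : (size_t)m->nprocs * sizeof(int);
}
```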

All the Best,
  Torsten

-- 
### qreharg rug ebs fv crryF ------------- http://www.unixer.de/ -----
Torsten Hoefler         | Performance Modeling and Simulation Lead
Blue Waters Directorate | University of Illinois (UIUC)
1205 W Clark Street     | Urbana, IL, 61801
NCSA Building           | +01 (217) 244-7736