[Mpi3-rma] RMA synchronization optimization [was: Updated MPI-3 RMA proposal 1]

Mon Jun 21 07:38:21 CDT 2010

At the risk of prolonging an already difficult-to-follow set of e-mail
threads, I have to reiterate my concerns about implementation efficiency.

The impediment I see to creating efficient implementations of MPI RMA is
that the synchronization primitives provide too much freedom. Because one
may switch back and forth between different synchronization methods on
the same window, an implementation must keep track of more state
information and handle complex corner cases, all of which precludes an
optimized implementation. In my (admittedly limited) experience,
synchronization adds significant overhead, and that overhead is
largely due to handling state and special cases involving mixed
synchronization methods. I have two high-level suggestions, listed in
order of preference:

1. Leave MPI2 One-Sided as-is (and hope to deprecate it someday), and create
a new, separate RMA scheme, intended to replace the old one, that uses a
single synchronization method (say, the *lock* methods being proposed). I
prefer this path because it gives us the flexibility to design exactly what
we want without being tied to the previous, possibly flawed, design.
Admittedly, having two sets of APIs involves extra work, but I think there
is some room for reuse and common code, and I feel the extra work is worth
the benefit.

2. Augment the MPI2 One-Sided specification with the ability for the user
to specify a single synchronization method to be used exclusively on a
given window. This could be done by adding Win_create/allocate functions
that take an "assert" specifying the synchronization method to be used,
and/or a way to specify "eras" of epochs that use a single synchronization
method. For example, a program could declare at some point that a given
window will use only lock/unlock until the next declaration call
(specifying another synchronization method, or "all"). With such
capabilities, an implementation could let programs run more efficiently if
they opt into using a single synchronization method. I know that Win_create
does not currently take asserts, but we could add a new window-creation
function that does (and ensure Win_allocate also takes asserts), and then
define the current Win_create as equivalent to Win_create_assert (e.g.)
with "asserts" set to zero. Depending on the asserts defined, of course,
that should allow the existing Win_create to maintain backward
compatibility with MPI2.

thanks,
_______________________________________________
Douglas Miller                  BlueGene Messaging Development
IBM Corp., Rochester, MN USA                     Bldg 030-2 A410
dougmill at us.ibm.com               Douglas Miller/Rochester/IBM

From: William Gropp <wgropp at illinois.edu>
Sent by: mpi3-rma-bounces@lists.mpi-forum.org
To: "MPI 3.0 Remote Memory Access working group" <mpi3-rma at lists.mpi-forum.org>
cc:
Date: 06/21/2010 12:05 AM
Subject: Re: [Mpi3-rma] Updated MPI-3 RMA proposal 1
Please respond to: "MPI 3.0 Remote Memory Access working group" <mpi3-rma at lists.mpi-forum.org>

I agree with Rajeev.  And I think we strayed somewhat from the
original plan.

The goal for MPI RMA was to enlarge the set of applications that could
be efficiently implemented with a one-sided model.  The current model
*is* a good one for *some* applications; the complaints about it are
often because it doesn't fit some other application.  The RMA
extensions for MPI-3, in my mind, needed to address *some*, not all, of
the important application areas that the current model does not
handle.  It is interesting to consider whether a reasonable functional
implementation of, say, UPC could be built on it, but I do not see MPI
RMA as supplying the universal implementation layer for other
programming models.

Making MPI RMA suitable for implementing all other parallel
programming models will require more than I think we want to do; look
at the short-word and aligned-move routines in GASNet as an example.
You don't need these functionally, but you may need them for
performance.

That's why we asked for *application* use cases; those would drive
the design.  We have a few, but we haven't focused on them as much as I
think we should.  What I wanted in proposal 1 was a set of operations,
each of which (a) was consistent with the others and (b) was clearly
driven by some *application* need (where an application in this case
is *not* an implementation of another programming model).  This may
very well have required some options to deal with things like selecting
different ordering and overlapping-update semantics, though those could
be very coarse-grained.

Note that in this interpretation, proposal 1 is not a "bare minimum";
rather, it is a consensus collection of consistent extensions that
enlarge the space of applications that can be efficiently coded using
MPI-3 one-sided.  It will leave out some useful features, and some
programming models should focus on interoperability with MPI rather
than relying on MPI-3 RMA to provide the specific features they need.
It is fine if there is something important that can't be done with
MPI-3 RMA, as long as there are other important things that can be
done with it.

Bill

On Jun 20, 2010, at 6:03 PM, Rajeev Thakur wrote:

> Are you referring to Accumulate_get :-)? Maybe it should be in Proposal
> 2.
>
> Maybe we also need a "journal of development" as in MPI-2 :-).
>
> But, seriously, we need to present a united front at least in proposal
> 1. Otherwise the Forum will have no confidence in us.
>
> Rajeev
>
>
>
>> -----Original Message-----
>> From: mpi3-rma-bounces at lists.mpi-forum.org
>> [mailto:mpi3-rma-bounces at lists.mpi-forum.org] On Behalf Of
>> Pavan Balaji
>> Sent: Sunday, June 20, 2010 5:57 PM
>> To: MPI 3.0 Remote Memory Access working group
>> Subject: Re: [Mpi3-rma] Updated MPI-3 RMA proposal 1
>>
>>
>> On 06/20/2010 05:48 PM, Rajeev Thakur wrote:
>>> Proposal 1: This is what the RMA experts agree is the bare minimum
>>> needed to fix what is considered broken in MPI-2 RMA.
>>
>> I don't agree that whatever is there in proposal 1 is the
>> "bare minimum". Maybe this policy should be reworded as:
>> *all* members of the working group should agree that this is needed.
>>
>> This makes both proposal 1 and proposal 2 contain random
>> pieces of unrelated features, though.
>>
>>  -- Pavan
>>
>> --
>> Pavan Balaji
>> http://www.mcs.anl.gov/~balaji
>> _______________________________________________
>> mpi3-rma mailing list
>> mpi3-rma at lists.mpi-forum.org
>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-rma
>>
>

William Gropp
Deputy Director for Research
Institute for Advanced Computing Applications and Technologies
Paul and Cynthia Saylor Professor of Computer Science
University of Illinois Urbana-Champaign
