[Mpi3-rma] RMA synchronization optimization [was: Updated MPI-3 RMA proposal 1]

Douglas Miller dougmill at us.ibm.com
Mon Jun 21 10:11:04 CDT 2010


Yes, I think what you are saying is what I was trying to say in option #2.
I'd prefer an explicit function to change "sync mode" rather than try to
overload fence. But I guess I can see the logic in using fence for this
purpose as well. Part of the problem I had with MPI2 RMA was the ambiguity
of fence, though.

Perhaps I was reading more into the proposal, but I thought things like
"lockall" and/or "alllockall" could conceptually replace fence. I would
assume that if a platform had hardware-assist for MPI_Win_fence that it
also could be used for lockall/alllockall, but maybe that is a stretch.


_______________________________________________
Douglas Miller                  BlueGene Messaging Development
IBM Corp., Rochester, MN USA                     Bldg 030-2 A410
dougmill at us.ibm.com               Douglas Miller/Rochester/IBM


                                                                           
William Gropp <wgropp at illinois.edu>
Sent by: mpi3-rma-bounces at lists.mpi-forum.org
06/21/2010 08:49 AM

To: "MPI 3.0 Remote Memory Access working group" <mpi3-rma at lists.mpi-forum.org>
Subject: Re: [Mpi3-rma] RMA synchronization optimization [was: Updated MPI-3 RMA proposal 1]
Please respond to: "MPI 3.0 Remote Memory Access working group" <mpi3-rma at lists.mpi-forum.org>




I believe that the original motivation for permitting the mixed sync
model was for applications that did something like this:

# Initialize a global data area
fence
various put or accumulate updates
fence

# passive-target access of the area
various lock/get/unlock accesses

Another option would be to require an explicit and collective change
to the sync mode - as there is only one passive target mode, and the
"scalable sync" mode in practice involves all processes, this would be
possible.  An info key (already (mis)used for the no_locks property) could
be used with win_create to specify that all changes in sync mode would
be signaled with a routine (either win_fence with an assert about sync
mode changing or a new win_sync_mode routine).
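
A minimal sketch of what an explicit mode change could buy an
implementation, assuming a win_sync_mode-style routine as suggested above.
Nothing here is MPI API: the model_* names, the two-value enum, the
"declared" flag standing in for the info key, and the choice to start every
window in the fence era are all invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical model only: none of these names exist in MPI. */
typedef enum { MODEL_SYNC_FENCE, MODEL_SYNC_LOCK } model_sync_mode;

typedef struct {
    model_sync_mode mode; /* sync mode of the current era */
    bool declared;        /* stands in for the info key given at win_create */
    int switches;         /* number of explicit mode changes so far */
} model_win;

/* Create a window model. Starting in the fence era is an assumption. */
static model_win model_win_create(bool declared)
{
    model_win w = { MODEL_SYNC_FENCE, declared, 0 };
    return w;
}

/* Stand-in for the collective mode-change routine. A real routine would
 * involve all processes; this model only records the new mode. */
static void model_win_sync_mode(model_win *w, model_sync_mode next)
{
    if (w->mode != next) {
        w->mode = next;
        w->switches++;
    }
}

/* Is a synchronization call of kind `used` consistent with the window?
 * Without the declaration, any mix is legal, so the answer is always yes
 * and the implementation has to be prepared for every combination.
 * With the declaration, the check is a single comparison. */
static bool model_sync_allowed(const model_win *w, model_sync_mode used)
{
    if (!w->declared)
        return true;
    return w->mode == used;
}
```

In this model, the fence-then-lock application above would create the
window with declared set, run its fence phase, call model_win_sync_mode
once with MODEL_SYNC_LOCK, and then do its lock/get/unlock accesses. A
window created without the declaration accepts any mix, which corresponds
to the state tracking that the current rules require.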

Would something like that address the implementation issues that you
see (remembering that some systems provided special hardware for a
fast win_fence, and a single sync model probably isn't sufficient)?

Bill


On Jun 21, 2010, at 7:38 AM, Douglas Miller wrote:

> At the risk of prolonging an already difficult-to-follow set of e-mail
> threads, I have to reiterate my concerns about implementation
> efficiency.
>
> The impediment I see to creating efficient implementations of MPI RMA
> is that the synchronization primitives provide too much freedom.
> Because one may switch back and forth between different
> synchronization methods on the same window, an implementation must
> keep track of more state information and handle complex corner cases,
> all of which precludes an optimized implementation. It's been my
> (admittedly limited) experience that the synchronization adds a
> significant overhead, and that overhead is largely due to the handling
> of state and special cases involving mixed synchronization methods. I
> have two high-level suggestions, listed in the order I prefer them:
>
> 1. Leave MPI2 One-Sided as-is (and hope to deprecate it someday), and
> create a new and separate RMA scheme, intended to replace the old,
> which uses a single synchronization method (say, the *lock* methods
> being proposed). I prefer this path because it gives us the
> flexibility to design exactly what we want without being tied to the
> previous, possibly flawed, design. There is, admittedly, extra work
> involved with having two sets of APIs, but I think there is some room
> for re-use and common code, and I feel the extra work is worth the
> benefit.
>
> 2. Augment the MPI2 One-Sided specification with the ability for the
> user to specify a single synchronization method to be used exclusively
> on a given window. This could be by adding Win_create/allocate
> functions that take an "assert" which specifies the synchronization
> method to be used, and/or a way to specify "eras" of epochs that will
> use a single synchronization method - for example, a program can
> declare at some point that a given window will use only lock/unlock
> until the next declaration call (specifying another synchronization
> method, or "all"). At least with such capabilities, an implementation
> could allow programs to be more efficient if they choose to take the
> optimization of using a single synchronization method. I know that
> Win_create does not currently have asserts, but there is a way to add
> a new function for creating windows that does have asserts (and ensure
> Win_allocate also has asserts) and then define that the current
> Win_create is equivalent to Win_create_assert (e.g.) with "asserts"
> set to zero. Depending on the asserts defined, of course, that should
> allow the existing Win_create to maintain backward compatibility with
> MPI2.
>
> thanks,
>
> William Gropp <wgropp at illinois.edu>
> Sent by: mpi3-rma-bounces at lists.mpi-forum.org
> 06/21/2010 12:05 AM
>
> To: "MPI 3.0 Remote Memory Access working group" <mpi3-rma at lists.mpi-forum.org>
> Subject: Re: [Mpi3-rma] Updated MPI-3 RMA proposal 1
> Please respond to: "MPI 3.0 Remote Memory Access working group" <mpi3-rma at lists.mpi-forum.org>
>
> I agree with Rajeev.  And I think we strayed somewhat from the
> original plan.
>
> The goal for the MPI RMA was to enlarge the set of applications
> that could be efficiently implemented with a one-sided model.  The
> current model *is* a good one for *some* applications; the complaints
> about it are often because it doesn't fit some other application.  The
> RMA extensions for MPI-3, in my mind, needed to address *some*, not
> all, of the important application areas that the current model does
> not handle.  It is interesting to consider whether a reasonable
> functional implementation of, say, UPC, could be implemented with it,
> but I do not see the MPI RMA as supplying the universal implementation
> layer for other programming models.
>
> Making MPI RMA suitable for implementing all other parallel
> programming models will require more than I think we want to do - look
> at the short word and aligned move routines in GASNet as an example.
> You don't need these functionally, but you may need them for
> performance.
>
> That's why we asked for *application* use cases - those would drive
> the design.  We have a few but we haven't focused on them as much as I
> think we should.  What I wanted in proposal 1 was a set of operations,
> each of which (a) was consistent with the others and (b) was clearly
> driven by some *application* need (where an application in this case
> is *not* implementing another programming model).  This very well may
> have required some options to deal with things like selecting
> different ordering and overlapping update semantics, though that could
> be very coarse grained.
>
> Note that in this interpretation, proposal 1 is not a "bare minimum";
> rather, it is a consensus collection of consistent extensions that
> enlarge the space of applications that can be efficiently coded using
> MPI-3 one-sided.  It will leave some useful features out and some
> programming models should focus on interoperability with MPI rather
> than having MPI-3 RMA provide the specific features that they need.
> It is fine if there is something important that can't be done with
> MPI-3 RMA, as long as there are other important things that can be
> done with it.
>
> Bill
>
> On Jun 20, 2010, at 6:03 PM, Rajeev Thakur wrote:
>
>> Are you referring to Accumulate_get :-)? Maybe it should be in Proposal 2.
>>
>> Maybe we also need a "journal of development" as in MPI-2 :-).
>>
>> But, seriously, we need to present a united front at least in
>> proposal 1. Otherwise the Forum will have no confidence in us.
>>
>> Rajeev
>>
>>
>>
>>> -----Original Message-----
>>> From: mpi3-rma-bounces at lists.mpi-forum.org
>>> [mailto:mpi3-rma-bounces at lists.mpi-forum.org] On Behalf Of
>>> Pavan Balaji
>>> Sent: Sunday, June 20, 2010 5:57 PM
>>> To: MPI 3.0 Remote Memory Access working group
>>> Subject: Re: [Mpi3-rma] Updated MPI-3 RMA proposal 1
>>>
>>>
>>> On 06/20/2010 05:48 PM, Rajeev Thakur wrote:
>>>> Proposal 1: This is what the RMA experts agree is the bare minimum
>>>> needed to fix what is considered broken in MPI-2 RMA.
>>>
>>> I don't agree that whatever is there in proposal 1 is the
>>> "bare minimum". Maybe this policy should be reworded as:
>>> *all* members of the working group should agree that this is needed.
>>>
>>> This makes both proposal 1 and proposal 2 contain random
>>> pieces of unrelated features, though.
>>>
>>> -- Pavan
>>>
>>> --
>>> Pavan Balaji
>>> http://www.mcs.anl.gov/~balaji
>>> _______________________________________________
>>> mpi3-rma mailing list
>>> mpi3-rma at lists.mpi-forum.org
>>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-rma
>>>
>>
>
> William Gropp
> Deputy Director for Research
> Institute for Advanced Computing Applications and Technologies
> Paul and Cynthia Saylor Professor of Computer Science
> University of Illinois Urbana-Champaign
>
