[Mpi3-rma] RMA proposal 1 update

Jeff Hammond jeff.science at gmail.com
Wed May 19 23:02:33 CDT 2010


ARMCI does not have allflushall.  My goal is not to get ARMCI into
MPI-3 and quit.  My goal is to have something semantically better than
what I could have written in my junior year of high school.

What is available in GA itself isn't really relevant to the Forum.  We
need the functionality that enables someone to implement GA
*efficiently* on current and future platforms.  We know ARMCI is
*necessary* to implement GA efficiently on some platforms, but
Vinod and I can provide very important cases where it is *not
sufficient*.

In case it isn't 100% clear, my goal for MPI-3 RMA is to make it
possible to implement GA and related models *efficiently* all the
way out to exascale.  This means ~1M nodes wired by some god-forsaken
optical recursive toroidal cubizoid or a network to be determined
later.

The reason I want allfenceall is that a GA_Sync requires every
process to fence all remote targets.  This is combined with a barrier,
hence it might as well be a single collective operation in which every
process fences all remote targets.  On Blue Gene/P, implementing
GA_Sync as a fenceall from every node is hideous compared to what I
imagine could be done with active-message collectives.  I would bet a
kidney it is hideous on Jaguar.  Vinod can sell my kidney in Singapore
if I'm wrong.
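
To make the comparison concrete, here is a minimal sketch (mine, not
part of any proposal text) of the two ways GA_Sync could map onto the
RMA interface we are discussing.  It assumes C bindings, a window that
every process in the communicator has opened for passive-target
access, and it uses MPI_Win_all_flush_all purely as the hypothetical
collective named in this thread:

  #include <mpi.h>

  /* GA_Sync built pointwise from the calls in the current proposal:
     each process completes its outstanding one-sided operations at
     every target, then all processes synchronize with a barrier. */
  void ga_sync_pointwise(MPI_Win win, MPI_Comm comm)
  {
      MPI_Win_flush_all(win);   /* complete my operations at all targets */
      MPI_Barrier(comm);        /* wait until every other origin has too */
  }

  /* The same semantics expressed with the proposed collective, which
     an implementation could drive with active-message collectives
     rather than per-origin completion plus a separate barrier. */
  void ga_sync_collective(MPI_Win win)
  {
      MPI_Win_all_flush_all(win);   /* hypothetical; not in any standard */
  }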

The argument for allfenceall is the same as for sparse collectives.
If an operation could be done with multiple point-to-point calls but
has a collective character, it is guaranteed to be no worse, and is
often better, to let the MPI runtime do it collectively.  With a dense
one-sided communication matrix, every process would otherwise have to
complete operations against nearly every target individually, which is
exactly the pattern collectives exist to optimize.  I know that many
applications will generate a sufficiently dense one-sided
communication matrix to justify allfenceall.

If you reject allfenceall, then I expect, and for intellectual
consistency demand, that you vigorously protest against sparse
collectives when they are proposed, on the basis that they can
obviously already be done efficiently with point-to-point calls.
Heck, why not also deprecate MPI_Bcast and friends, since on some
networks they might not be faster than point-to-point?

It is really annoying that you are such an obstructionist.  It is
extremely counter-productive to the Forum, and I know of no one who
derives intellectual benefit from the endless stream of protests and
demands for OpenSHMEM-like behavior.  As the ability to implement GA
on top of MPI-3 RMA is a stated goal of the working group, I feel no
shame in proposing function calls that are motivated entirely by this
purpose.

Jeff

On Wed, May 19, 2010 at 10:06 PM, Underwood, Keith D
<keith.d.underwood at intel.com> wrote:
> Interestingly, ARMCI is nominally good enough for GA and only appears to have fence() and fenceall() (equivalent to the flush and flushall we proposed).  At least, that was the version I found on the web... What is the ARMCI equivalent of the allflushall that you are proposing?
>
> And, looking at GA, it is enormous, but I only see GA_SYNC (your requested allfenceall) and GA_FENCE (fenceall()).  I don't see a pt2pt sync - which call is that?
>
> Keith
>
>> -----Original Message-----
>> From: mpi3-rma-bounces at lists.mpi-forum.org [mailto:mpi3-rma-
>> bounces at lists.mpi-forum.org] On Behalf Of Jeff Hammond
>> Sent: Wednesday, May 19, 2010 7:31 PM
>> To: MPI 3.0 Remote Memory Access working group
>> Cc: MPI 3.0 Remote Memory Access working group
>> Subject: Re: [Mpi3-rma] RMA proposal 1 update
>>
>> GA has an online manual, API documentation, numerous tutorials and
>> example code, and, finally, journal papers, which cover that nicely
>> already. I'll try to clarify any ambiguity that exists after those
>> sources are considered.
>>
>> A nicer answer is that I will do exactly that as part of something
>> Torsten, I, and others will be working on shortly.
>>
>> :)
>>
>> Jeff
>>
>> Sent from my iPhone
>>
>> On May 19, 2010, at 8:25 PM, "Underwood, Keith D"
>> <keith.d.underwood at intel.com> wrote:
>>
>> > So, perhaps enumerating the relevant GA constructs and their
>> > semantics would be informative here...
>> >
>> > Keith
>> >
>> >> -----Original Message-----
>> >> From: mpi3-rma-bounces at lists.mpi-forum.org [mailto:mpi3-rma-
>> >> bounces at lists.mpi-forum.org] On Behalf Of Jeff Hammond
>> >> Sent: Wednesday, May 19, 2010 7:23 PM
>> >> To: MPI 3.0 Remote Memory Access working group
>> >> Cc: MPI 3.0 Remote Memory Access working group
>> >> Subject: Re: [Mpi3-rma] RMA proposal 1 update
>> >>
>> >> Can I mix that call with other sync mechanisms?
>> >>
>> >> So I implement GA by calling fence inside of GA_Create to expose the
>> >> window and use fence+barrier for GA_Sync, but can I mix in lock and
>> >> unlock as well as the forthcoming p2p flush (as I can do in GA/ARMCI
>> >> now)?
>> >>
>> >> The standard presents three synchronization schemes. It does not
>> >> suggest one can intermix them at will.
>> >>
>> >> Jeff
>> >>
>> >> Sent from my iPhone
>> >>
>> >> On May 19, 2010, at 2:58 PM, "Underwood, Keith D"
>> >> <keith.d.underwood at intel.com> wrote:
>> >>
>> >>> Jeff,
>> >>>
>> >>> Another question for you:  If you are going to call
>> >>> MPI_Win_all_flush_all, why not just use active target and call
>> >>> MPI_Win_fence?
>> >>>
>> >>> Keith
>> >>>
>> >>>> -----Original Message-----
>> >>>> From: mpi3-rma-bounces at lists.mpi-forum.org [mailto:mpi3-rma-
>> >>>> bounces at lists.mpi-forum.org] On Behalf Of Jeff Hammond
>> >>>> Sent: Sunday, May 16, 2010 7:27 PM
>> >>>> To: MPI 3.0 Remote Memory Access working group
>> >>>> Subject: Re: [Mpi3-rma] RMA proposal 1 update
>> >>>>
>> >>>> Torsten,
>> >>>>
>> >>>> There seemed to be decent agreement on adding MPI_Win_all_flush_all
>> >>>> (equivalent to MPI_Win_flush_all called from every rank in the
>> >>>> communicator associated with the window) since this function can be
>> >>>> implemented far more efficiently as a collective than the equivalent
>> >>>> point-wise function calls.
>> >>>>
>> >>>> Is there a problem with adding this to your proposal?
>> >>>>
>> >>>> Jeff
>> >>>>
>> >>>> On Sun, May 16, 2010 at 12:48 AM, Torsten Hoefler
>> >>>> <htor at illinois.edu> wrote:
>> >>>>> Hello all,
>> >>>>>
>> >>>>> After the discussions at the last Forum I updated the group's
>> >>>>> first proposal.
>> >>>>>
>> >>>>> The proposal (one-side-2.pdf) is attached to the wiki page
>> >>>>> https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/RmaWikiPage
>> >>>>>
>> >>>>> The changes with regards to the last version are:
>> >>>>>
>> >>>>> 1) added MPI_NOOP to MPI_Get_accumulate and MPI_Accumulate_get
>> >>>>>
>> >>>>> 2) (re)added MPI_Win_flush and MPI_Win_flush_all to passive
>> >>>>>    target mode
>> >>>>>
>> >>>>> Some remarks:
>> >>>>>
>> >>>>> 1) We didn't straw-vote on MPI_Accumulate_get, so this function
>> >>>>>    might go.  The removal would be very clean.
>> >>>>>
>> >>>>> 2) Should we allow MPI_NOOP in MPI_Accumulate (this does not make
>> >>>>>    sense and is incorrect in my current proposal)
>> >>>>>
>> >>>>> 3) Should we allow MPI_REPLACE in MPI_Get_accumulate/MPI_Accumulate_get?
>> >>>>>    (this would make sense and is allowed in the current proposal
>> >>>>>    but we didn't talk about it in the group)
>> >>>>>
>> >>>>>
>> >>>>> All the Best,
>> >>>>> Torsten
>> >>>>>
>> >>>>> --
>> >>>>> bash$ :(){ :|:&};: --------------------- http://www.unixer.de/
>> >>>>> -----
>> >>>>> Torsten Hoefler         | Research Associate
>> >>>>> Blue Waters Directorate | University of Illinois
>> >>>>> 1205 W Clark Street     | Urbana, IL, 61801
>> >>>>> NCSA Building           | +01 (217) 244-7736
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Jeff Hammond
>> >>>> Argonne Leadership Computing Facility
>> >>>> jhammond at mcs.anl.gov / (630) 252-5381
>> >>>> http://www.linkedin.com/in/jeffhammond
>> >>>>
>> >>>
>> >
>



-- 
Jeff Hammond
Argonne Leadership Computing Facility
jhammond at mcs.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond



