[mpiwg-rma] MPI RMA status summary

Rolf Rabenseifner rabenseifner at hlrs.de
Tue Sep 30 09:05:27 CDT 2014


I strongly agree with your statement:
> These piecemeal changes are one of the sources of our problems.

I only wanted to state clearly that I would never vote for your a),
because it is not backward compatible with what is already in use.
With b), my problem is that b1) is clear to me (see #456),
but the Win_flush semantics for load/store are unclear to me.
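To make concrete what is unclear to me, here is a hypothetical fragment
(my own illustration, not text from #456). It assumes mpi.h and assert.h,
that both processes run on one node, that win was created with
MPI_Win_allocate_shared, that p0 points to rank 0's segment (obtained
via MPI_Win_shared_query), and that both processes are inside an
MPI_Win_lock_all epoch:

    if (rank == 0) {
        p0[0] = 42;                /* plain store instead of MPI_Put      */
        MPI_Win_flush(0, win);     /* does this complete/order the store? */
        MPI_Send(NULL, 0, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* notify     */
    } else if (rank == 1) {
        MPI_Recv(NULL, 0, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Win_sync(win);         /* reader-side memory barrier          */
        assert(p0[0] == 42);       /* guaranteed only if Win_flush also   */
    }                              /* orders plain stores, i.e., b2) below */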

Of course, a complete solution is needed, not just parts of it.
#456 is such an attempt at a complete solution.
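For the clear part, here is a minimal sketch of the substitution that
#456 has in mind - MPI_Put replaced by a plain store, synchronized only
by MPI_Win_fence. The example is mine (not taken from the ticket) and
assumes all processes run on one shared-memory node:

    #include <mpi.h>
    #include <assert.h>

    int main(int argc, char **argv) {
        int rank, disp_unit, *p0;
        MPI_Aint size;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* rank 0 contributes one int; everyone maps rank 0's segment */
        MPI_Win_allocate_shared(rank == 0 ? sizeof(int) : 0, sizeof(int),
                                MPI_INFO_NULL, MPI_COMM_WORLD, &p0, &win);
        MPI_Win_shared_query(win, 0, &size, &disp_unit, &p0);

        MPI_Win_fence(0, win);
        if (rank == 0) p0[0] = 42;     /* store instead of MPI_Put       */
        MPI_Win_fence(0, win);         /* expected to act as if a memory
                                          barrier were inside (see #456) */
        if (rank == 1) assert(p0[0] == 42);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }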

Rolf

----- Original Message -----
> From: "William Gropp" <wgropp at illinois.edu>
> To: "MPI WG Remote Memory Access working group" <mpiwg-rma at lists.mpi-forum.org>
> Sent: Tuesday, September 30, 2014 3:19:06 PM
> Subject: Re: [mpiwg-rma] MPI RMA status summary
> 
> I disagree with this approach.  The most important thing to do is to
> figure out the correct definitions and semantics.  Once we agree on
> that, we can determine what can be handled as an errata and what
> will require an update to the chapter and an update to the MPI
> standard.  These piecemeal changes are one of the sources of our
> problems.
> 
> Bill
> 
> On Sep 30, 2014, at 7:38 AM, Rolf Rabenseifner <rabenseifner at hlrs.de>
> wrote:
> 
> >> Vote for 1 of the following:
> >> 
> >> a) Only Win_sync provides memory barrier semantics to shared memory
> >>    windows
> >> b) All RMA completion/sync routines (e.g., MPI_Win_lock,
> >>    MPI_Win_fence, MPI_Win_flush) provide memory barrier semantics
> >> c) Some as yet undetermined blend of a and b, which might include
> >>    additional asserts
> >> d) This topic needs further discussion
> > 
> > Because we only have to clarify MPI-3.0 (this is an errata issue),
> > and because
> > - the MPI Forum and the readers obviously expected that MPI_Win_fence
> >   (and therefore also the other MPI-2 synchronizations
> >   MPI_Win_post/start/complete/wait and MPI_Win_lock/unlock)
> >   works if MPI_Get/Put are substituted by shared memory load/store
> >   (see the many Forum members among the authors of the EuroMPI paper),
> > - and the Forum decided that MPI_Win_sync also acts as if
> >   a memory barrier were inside,
> > it follows for me that
> > - a) cannot be chosen, because an erratum cannot remove
> >   existing functionality,
> > - and b) is automatically given, see the reasons above. Therefore #456.
> > 
> > The only open question for me is the meaning of MPI_Win_flush.
> > That is why MPI_Win_flush is still missing from #456.
> > 
> > Therefore, for me the major choices seem to be
> > b1) MPI-2 synchronizations + MPI_Win_sync
> > b2) MPI-2 synchronizations + MPI_Win_sync + MPI_Win_flush
> > 
> > For this vote, I want to see a clear proposal
> > about the meaning of MPI_Win_flush together with
> > shared memory load/store, preferably with the notation
> > used in #456.
> > 
> > Best regards
> > Rolf
> > 
> > ----- Original Message -----
> >> From: "William Gropp" <wgropp at illinois.edu>
> >> To: "MPI WG Remote Memory Access working group"
> >> <mpiwg-rma at lists.mpi-forum.org>
> >> Sent: Monday, September 29, 2014 11:39:51 PM
> >> Subject: Re: [mpiwg-rma] MPI RMA status summary
> >> 
> >> 
> >> Thanks, Jeff.
> >> 
> >> 
> >> I agree that I don’t want load/store to be considered RMA operations.
> >> But the issue of the memory consistency on RMA synchronization and
> >> completion operations to a shared memory window is complex.  In some
> >> ways, the most consistent with RMA in other situations is the case
> >> of MPI_Win_lock to your own process; the easiest extension for the
> >> user is to have reasonably strong memory barrier semantics at all
> >> sync/completion operations (thus including Fence).  As you note,
> >> this has costs.  At the other extreme, we could say that only
> >> Win_sync provides these memory barrier semantics.  And we could pick
> >> a more complex blend (yes for some, no for others).
> >> 
> >> 
> >> One of the first questions is whether we want only Win_sync, all
> >> completion/sync RMA routines, or some subset of them to provide
> >> memory barrier semantics for shared memory windows (this would
> >> include RMA windows that claim to be shared memory, since there is a
> >> proposal to extend that property to other RMA windows).  It would be
> >> good to make progress on this question, so I propose a straw vote of
> >> this group by email.  Vote for 1 of the following:
> >> 
> >> 
> >> a) Only Win_sync provides memory barrier semantics to shared memory
> >>    windows
> >> b) All RMA completion/sync routines (e.g., MPI_Win_lock,
> >>    MPI_Win_fence, MPI_Win_flush) provide memory barrier semantics
> >> c) Some as yet undetermined blend of a and b, which might include
> >>    additional asserts
> >> d) This topic needs further discussion
> >> 
> >> 
> >> Note that I’ve left off what “memory barrier semantics” means.  That
> >> will need to be precisely defined for the standard, but I believe we
> >> know what we intend for this.  We specifically are not defining what
> >> happens with non-MPI code.  Also note that this is separate from
> >> whether the RMA sync routines appear to be blocking when applied to
> >> a shared memory window; we can do a separate straw vote on that.
> >> 
> >> 
> >> Bill
> >> 
> >> 
> >> 
> >> On Sep 29, 2014, at 3:49 PM, Jeff Hammond <jeff.science at gmail.com>
> >> wrote:
> >> 
> >> 
> >> 
> >> On Mon, Sep 29, 2014 at 9:16 AM, Rolf Rabenseifner
> >> <rabenseifner at hlrs.de> wrote:
> >> 
> >> 
> >> Only about the issues on #456 (shared memory synchronization):
> >> 
> >> 
> >> 
> >> For the ones requiring discussion, assign someone to organize a
> >> position and discussion.  We can schedule telecons to go over
> >> those
> >> issues.  The first item in the list is certainly in this class.
> >> 
> >> Who can organize telecons on #456?
> >> Would it be possible to organize an RMA meeting at SC?
> >> 
> >> I will be there Monday through part of Thursday but am usually
> >> triple-booked from 8 AM to midnight.
> >> 
> >> 
> >> 
> >> The position expressed by the solution in #456 is based on the idea
> >> that the MPI RMA synchronization routines should have the same
> >> outcome when RMA put and get calls are substituted by stores and
> >> loads.
> >> 
> >> The outcome for the flush routines is still not defined.
> >> 
> >> It is interesting, because the standard actually contradicts itself
> >> on whether Flush affects load-store.  I find this incredibly
> >> frustrating.
> >> 
> >> Page 450:
> >> 
> >> "Locally completes at the origin all outstanding RMA operations
> >> initiated by the calling process to the target process specified
> >> by
> >> rank on the specified window. For example, after this routine
> >> completes, the user may reuse any buffers provided to put, get, or
> >> accumulate operations."
> >> 
> >> I do not think "RMA operations" includes load-store.
> >> 
> >> Page 410:
> >> 
> >> "The consistency of load/store accesses from/to the shared memory
> >> as
> >> observed by the user program depends on the architecture. A
> >> consistent
> >> view can be created in the unified memory model (see Section 11.4)
> >> by
> >> utilizing the window synchronization functions (see Section 11.5)
> >> or
> >> explicitly completing outstanding store accesses (e.g., by calling
> >> MPI_WIN_FLUSH)."
> >> 
> >> Here it is unambiguously implied that MPI_WIN_FLUSH affects
> >> load-stores.
> >> 
> >> My preference is to fix the statement on page 410, since it is less
> >> canonical than the one on page 450, and because I do not want to
> >> have a memory barrier in every call to WIN_FLUSH.
> >> 
> >> Jeff
> >> 
> >> 
> >> 
> >> I would prefer the discussion to be organized by someone from
> >> the RMA subgroup that proposed the changes for MPI-3.1,
> >> rather than by me.
> >> I tried to bring all the input together and hope that #456
> >> is now in a state where it is consistent with itself and with the
> >> expectations expressed by the group that published the
> >> EuroMPI paper on the first usage of this shared memory interface.
> >> 
> >> The ticket is (together with the help of recent C11 standardization)
> >> well on its way to also being consistent with compiler optimizations -
> >> in other words, the C standardization body has learned from the
> >> pthreads problems. Fortran is still an open question to me,
> >> i.e., I do not know its status, see
> >> https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/456#comment:13
> >> 
> >> Best regards
> >> Rolf
> >> 
> >> 
> >> 
> >> ----- Original Message -----
> >> 
> >> 
> >> From: "William Gropp" <wgropp at illinois.edu>
> >> To: "MPI WG Remote Memory Access working group"
> >> <mpiwg-rma at lists.mpi-forum.org>
> >> Sent: Thursday, September 25, 2014 4:19:14 PM
> >> Subject: [mpiwg-rma] MPI RMA status summary
> >> 
> >> I looked through all of the tickets and wrote a summary of the
> >> open
> >> issues, which I’ve attached.  I propose the following:
> >> 
> >> Determine which of these issues can be resolved by email.  A
> >> significant number can probably be closed with no further action.
> >> 
> >> For those requiring rework, determine if there is still interest
> >> in
> >> them, and if not, close them as well.
> >> 
> >> For the ones requiring discussion, assign someone to organize a
> >> position and discussion.  We can schedule telecons to go over
> >> those
> >> issues.  The first item in the list is certainly in this class.
> >> 
> >> Comments?
> >> 
> >> Bill
> >> 
> >> 
> >> 
> >> 
> >> --
> >> Jeff Hammond
> >> jeff.science at gmail.com
> >> http://jeffhammond.github.io/
> > 
> 
> _______________________________________________
> mpiwg-rma mailing list
> mpiwg-rma at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-rma
> 

-- 
Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)


