[mpiwg-rma] MPI RMA status summary

Jeff Hammond jeff.science at gmail.com
Tue Sep 30 09:09:02 CDT 2014


Option A is what the standard says today, apart from a sloppy offhand remark in parentheses. Please see my note.

Jeff

Sent from my iPhone

> On Sep 30, 2014, at 7:05 AM, Rolf Rabenseifner <rabenseifner at hlrs.de> wrote:
> 
> I strongly agree with your statement:
>> These piecemeal changes are one of the sources of our problems.
> 
> I only wanted to say strongly that I would never vote for your a),
> because it is not backward compatible with what is already in use.
> And with b) I have the problem that b1) is clear to me (see #456),
> but the Win_flush semantics for load/store are unclear to me.
> 
> Of course, a complete solution is needed, not just parts of one.
> #456 is an attempt at such a complete solution.
> 
> Rolf
> 
> ----- Original Message -----
>> From: "William Gropp" <wgropp at illinois.edu>
>> To: "MPI WG Remote Memory Access working group" <mpiwg-rma at lists.mpi-forum.org>
>> Sent: Tuesday, September 30, 2014 3:19:06 PM
>> Subject: Re: [mpiwg-rma] MPI RMA status summary
>> 
>> I disagree with this approach.  The most important thing to do is to
>> figure out the correct definitions and semantics.  Once we agree on
>> that, we can determine what can be handled as errata and what
>> will require an update to the chapter and an update to the MPI
>> standard.  These piecemeal changes are one of the sources of our
>> problems.
>> 
>> Bill
>> 
>> On Sep 30, 2014, at 7:38 AM, Rolf Rabenseifner <rabenseifner at hlrs.de>
>> wrote:
>> 
>>>> Vote for 1 of the following:
>>>> 
>>>> a) Only Win_sync provides memory barrier semantics to shared
>>>> memory windows
>>>> b) All RMA completion/sync routines (e.g., MPI_Win_lock,
>>>> MPI_Win_fence, MPI_Win_flush) provide memory barrier semantics
>>>> c) Some as yet undetermined blend of a and b, which might include
>>>> additional asserts
>>>> d) This topic needs further discussion
>>> 
>>> Because we only have to clarify MPI-3.0 (this is an errata issue),
>>> and
>>> - the MPI Forum and the readers obviously expected that MPI_Win_fence
>>>   (and therefore also the other MPI-2 synchronizations
>>>   MPI_Win_post/start/complete/wait and MPI_Win_lock/unlock)
>>>   works if MPI_Get/Put are substituted by shared memory load/store
>>>   (see the many Forum members among the authors of the EuroMPI paper),
>>> - and the Forum decided that MPI_Win_sync also acts as if
>>>   a memory barrier were inside,
>>> for me:
>>> - a) cannot be chosen, because an erratum cannot remove
>>>   existing functionality,
>>> - and b) follows automatically, see the reasons above. Hence #456.
>>> 
>>> The only open question for me is the meaning of MPI_Win_flush.
>>> That is why MPI_Win_flush is still missing from #456.
>>> 
>>> Therefore, for me, the major choices seem to be
>>> b1) MPI-2 synchronizations + MPI_Win_sync
>>> b2) MPI-2 synchronizations + MPI_Win_sync + MPI_Win_flush
>>> 
>>> For this vote, I definitely want to see a clear proposal
>>> for the meaning of MPI_Win_flush together with
>>> shared memory load/store, preferably in the notation
>>> used in #456.
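>>>
>>> To make the open point concrete, here is a rough sketch (my own
>>> illustration, not the #456 notation) of the pattern for which the
>>> MPI_Win_flush meaning would have to be defined; assume win comes
>>> from MPI_Win_allocate_shared and peer points into rank 1's segment
>>> (obtained via MPI_Win_shared_query):
>>>
>>>   /* origin = rank 0, passive-target epoch on a shared memory window */
>>>   MPI_Win_lock(MPI_LOCK_SHARED, 1, 0, win);
>>>   peer[0] = 1;             /* plain store in place of MPI_Put          */
>>>   MPI_Win_flush(1, win);   /* b2) would require this call to make the  */
>>>                            /* store visible to rank 1; b1) would not   */
>>>   MPI_Win_unlock(1, win);  /* an MPI-2 sync: covered by both b1) / b2) */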
>>> 
>>> Best regards
>>> Rolf
>>> 
>>> ----- Original Message -----
>>>> From: "William Gropp" <wgropp at illinois.edu>
>>>> To: "MPI WG Remote Memory Access working group"
>>>> <mpiwg-rma at lists.mpi-forum.org>
>>>> Sent: Monday, September 29, 2014 11:39:51 PM
>>>> Subject: Re: [mpiwg-rma] MPI RMA status summary
>>>> 
>>>> 
>>>> Thanks, Jeff.
>>>> 
>>>> 
>>>> I agree that I don’t want load/store to be considered RMA
>>>> operations. But the issue of the memory consistency on RMA
>>>> synchronization and completion operations to a shared memory window
>>>> is complex. In some ways, the most consistent with RMA in other
>>>> situations is the case of MPI_Win_lock to your own process; the
>>>> easiest extension for the user is to have reasonably strong memory
>>>> barrier semantics at all sync/completion operations (thus including
>>>> Fence). As you note, this has costs. At the other extreme, we could
>>>> say that only Win_sync provides these memory barrier semantics. And
>>>> we could pick a more complex blend (yes for some, no for others).
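>>>>
>>>> As a concrete sketch of the kind of code this choice affects
>>>> (illustrative only; it assumes two processes and a window obtained
>>>> from MPI_Win_allocate_shared), option a) would require the
>>>> MPI_Win_sync calls below, while under option b) other sync/completion
>>>> calls could provide the same ordering:
>>>>
>>>> #include <mpi.h>
>>>> #include <stdio.h>
>>>>
>>>> int main(int argc, char **argv) {
>>>>     int rank, *mine;
>>>>     MPI_Win win;
>>>>     MPI_Init(&argc, &argv);
>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>     /* one int per process in a contiguous shared allocation */
>>>>     MPI_Win_allocate_shared(sizeof(int), sizeof(int), MPI_INFO_NULL,
>>>>                             MPI_COMM_WORLD, &mine, &win);
>>>>     MPI_Win_lock_all(MPI_MODE_NOCHECK, win);
>>>>     if (rank == 0) {
>>>>         mine[0] = 42;                 /* plain store, no MPI_Put        */
>>>>         MPI_Win_sync(win);            /* complete the store ...         */
>>>>         MPI_Barrier(MPI_COMM_WORLD);  /* ... then order the processes   */
>>>>     } else {
>>>>         int *peer, disp;
>>>>         MPI_Aint size;
>>>>         MPI_Win_shared_query(win, 0, &size, &disp, &peer);
>>>>         MPI_Barrier(MPI_COMM_WORLD);
>>>>         MPI_Win_sync(win);            /* order the load after the store */
>>>>         if (rank == 1) printf("rank 1 sees %d\n", peer[0]);
>>>>     }
>>>>     MPI_Win_unlock_all(win);
>>>>     MPI_Win_free(&win);
>>>>     MPI_Finalize();
>>>>     return 0;
>>>> }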
>>>> 
>>>> 
>>>> One of the first questions is whether we want only Win_sync, all
>>>> completion/sync RMA routines, or some subset of them to provide
>>>> memory barrier semantics for shared memory windows (this would
>>>> include RMA windows that claim to be shared memory, since there is
>>>> a proposal to extend that property to other RMA windows). It would
>>>> be good to make progress on this question, so I propose a straw
>>>> vote of this group by email. Vote for 1 of the following:
>>>> 
>>>> 
>>>> a) Only Win_sync provides memory barrier semantics to shared
>>>> memory windows
>>>> b) All RMA completion/sync routines (e.g., MPI_Win_lock,
>>>> MPI_Win_fence, MPI_Win_flush) provide memory barrier semantics
>>>> c) Some as yet undetermined blend of a and b, which might include
>>>> additional asserts
>>>> d) This topic needs further discussion
>>>> 
>>>> 
>>>> Note that I’ve left off what “memory barrier semantics” means.
>>>> That will need to be precisely defined for the standard, but I
>>>> believe we know what we intend for this. We specifically are not
>>>> defining what happens with non-MPI code. Also note that this is
>>>> separate from whether the RMA sync routines appear to be blocking
>>>> when applied to a shared memory window; we can do a separate straw
>>>> vote on that.
>>>> 
>>>> 
>>>> Bill
>>>> 
>>>> 
>>>> 
>>>> On Sep 29, 2014, at 3:49 PM, Jeff Hammond <jeff.science at gmail.com>
>>>> wrote:
>>>> 
>>>> 
>>>> 
>>>> On Mon, Sep 29, 2014 at 9:16 AM, Rolf Rabenseifner
>>>> <rabenseifner at hlrs.de> wrote:
>>>> 
>>>> 
>>>> Only about the issues on #456 (shared memory synchronization):
>>>> 
>>>> 
>>>> 
>>>> For the ones requiring discussion, assign someone to organize a
>>>> position and discussion.  We can schedule telecons to go over
>>>> those issues.  The first item in the list is certainly in this class.
>>>> 
>>>> Who can organize telecons on #456?
>>>> Would it be possible to organize an RMA meeting at SC?
>>>> 
>>>> I will be there Monday through part of Thursday but am usually
>>>> triple-booked from 8 AM to midnight.
>>>> 
>>>> 
>>>> 
>>>> The position expressed by the solution in #456 is based on the idea
>>>> that the MPI RMA synchronization routines should have the same
>>>> outcome when RMA PUT and GET calls are substituted by stores and
>>>> loads.
>>>> 
>>>> The outcome for the flush routines is still not defined.
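>>>>
>>>> (To make the substitution idea concrete with a rough, non-normative
>>>> sketch: under an active-target fence epoch on a shared memory
>>>> window, the two variants below would be expected to have the same
>>>> outcome; val is an int and peer is assumed to point into the
>>>> target's segment, e.g. obtained via MPI_Win_shared_query.)
>>>>
>>>>   MPI_Win_fence(0, win);
>>>>   /* variant 1: RMA call */
>>>>   MPI_Put(&val, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
>>>>   /* variant 2: plain store replacing the MPI_Put */
>>>>   /* peer[0] = val; */
>>>>   MPI_Win_fence(0, win);
>>>>   /* after the closing fence, the target may load the new value */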
>>>> 
>>>> It is interesting, because the standard actually contradicts itself
>>>> on whether Flush affects load-store.  I find this incredibly
>>>> frustrating.
>>>> 
>>>> Page 450:
>>>> 
>>>> "Locally completes at the origin all outstanding RMA operations
>>>> initiated by the calling process to the target process specified
>>>> by
>>>> rank on the specified window. For example, after this routine
>>>> completes, the user may reuse any buffers provided to put, get, or
>>>> accumulate operations."
>>>> 
>>>> I do not think "RMA operations" includes load-store.
>>>> 
>>>> Page 410:
>>>> 
>>>> "The consistency of load/store accesses from/to the shared memory
>>>> as
>>>> observed by the user program depends on the architecture. A
>>>> consistent
>>>> view can be created in the unified memory model (see Section 11.4)
>>>> by
>>>> utilizing the window synchronization functions (see Section 11.5)
>>>> or
>>>> explicitly completing outstanding store accesses (e.g., by calling
>>>> MPI_WIN_FLUSH)."
>>>> 
>>>> Here it is unambiguously implied that MPI_WIN_FLUSH affects
>>>> load-stores.
>>>> 
>>>> My preference is to fix the statement on page 410, since it is less
>>>> canonical than the one on page 450, and because I do not want to
>>>> have a memory barrier in every call to WIN_FLUSH.
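>>>>
>>>> (Roughly, in code, the difference between the two readings is the
>>>> following sketch, illustrative only; peer points into the target's
>>>> shared segment, and the target still needs its own synchronization
>>>> before loading.)
>>>>
>>>>   peer[0] = 1;             /* plain store into the shared window     */
>>>>   MPI_Win_flush(1, win);   /* page 410 reading: this also completes  */
>>>>                            /* the store, i.e. implies a barrier      */
>>>>
>>>>   peer[0] = 1;             /* my preference: keep flush for RMA only */
>>>>   MPI_Win_sync(win);       /* and order stores with an explicit sync */
>>>>   MPI_Win_flush(1, win);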
>>>> 
>>>> Jeff
>>>> 
>>>> 
>>>> 
>>>> I would prefer that the discussion be organized by someone from
>>>> the RMA subgroup that proposed the changes for MPI-3.1,
>>>> rather than by me.
>>>> I tried to bring all the input together and hope that #456
>>>> is now in a state where it is consistent in itself and with the
>>>> expectations expressed by the group that published the
>>>> paper at EuroMPI on the first usage of this shared memory interface.
>>>> 
>>>> The ticket is (with the help of the recent C11 standardization)
>>>> well on the way to also being consistent with compiler
>>>> optimizations -
>>>> in other words, the C standardization body has learned from the
>>>> pthreads problems. Fortran is still an open question to me,
>>>> i.e., I do not know its status; see
>>>> https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/456#comment:13
>>>> 
>>>> Best regards
>>>> Rolf
>>>> 
>>>> 
>>>> 
>>>> ----- Original Message -----
>>>> 
>>>> 
>>>> From: "William Gropp" <wgropp at illinois.edu>
>>>> To: "MPI WG Remote Memory Access working group"
>>>> <mpiwg-rma at lists.mpi-forum.org>
>>>> Sent: Thursday, September 25, 2014 4:19:14 PM
>>>> Subject: [mpiwg-rma] MPI RMA status summary
>>>> 
>>>> I looked through all of the tickets and wrote a summary of the
>>>> open issues, which I’ve attached.  I propose the following:
>>>> 
>>>> Determine which of these issues can be resolved by email.  A
>>>> significant number can probably be closed with no further action.
>>>> 
>>>> For those requiring rework, determine if there is still interest
>>>> in them, and if not, close them as well.
>>>> 
>>>> For the ones requiring discussion, assign someone to organize a
>>>> position and discussion.  We can schedule telecons to go over
>>>> those issues.  The first item in the list is certainly in this class.
>>>> 
>>>> Comments?
>>>> 
>>>> Bill
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Jeff Hammond
>>>> jeff.science at gmail.com
>>>> http://jeffhammond.github.io/
>>> 
>> 
> 
> -- 
> Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
> High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
> University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
> Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
> Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)
> _______________________________________________
> mpiwg-rma mailing list
> mpiwg-rma at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-rma


