<html><head><meta http-equiv="Content-Type" content="text/html charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">Thanks, Jeff.<div><br></div><div>I agree that I don’t want load/store to be considered RMA operations.  But the issue of memory consistency for RMA synchronization and completion operations on a shared memory window is complex.  In some ways, the case most consistent with RMA in other situations is MPI_Win_lock to your own process; the easiest extension for the user is to have reasonably strong memory barrier semantics at all sync/completion operations (thus including Fence).  As you note, this has costs.  At the other extreme, we could say that only Win_sync provides these memory barrier semantics.  And we could pick a more complex blend (yes for some, no for others).</div><div><br></div><div>One of the first questions is whether we want only Win_sync, all RMA completion/sync routines, or some subset of them to provide memory barrier semantics for shared memory windows (this would include RMA windows that claimed to be shared memory, since there is a proposal to extend that property to other RMA windows).  It would be good to make progress on this question, so I propose a straw vote of this group by email.  Vote for one of the following:</div><div><br></div><div>a) Only Win_sync provides memory barrier semantics for shared memory windows</div><div>b) All RMA completion/sync routines (e.g., MPI_Win_lock, MPI_Win_fence, MPI_Win_flush) provide memory barrier semantics</div><div>c) Some as yet undetermined blend of a and b, which might include additional asserts</div><div>d) This topic needs further discussion</div><div><br></div><div>Note that I’ve left out what “memory barrier semantics” means.  That will need to be precisely defined for the standard, but I believe we know what we intend for this.  We specifically are not defining what happens with non-MPI code.  
Also note that this is separate from whether the RMA sync routines appear to be blocking when applied to a shared memory window; we can do a separate straw vote on that.</div><div><br></div><div>Bill</div><div><br><div><div>On Sep 29, 2014, at 3:49 PM, Jeff Hammond <<a href="mailto:jeff.science@gmail.com">jeff.science@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div style="font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">On Mon, Sep 29, 2014 at 9:16 AM, Rolf Rabenseifner <<a href="mailto:rabenseifner@hlrs.de">rabenseifner@hlrs.de</a>> wrote:<br><blockquote type="cite">Only about the issues on #456 (shared memory synchronization):<br><br><blockquote type="cite">For the ones requiring discussion, assign someone to organize a<br>position and discussion.  We can schedule telecons to go over those<br>issues.  The first item in the list is certainly in this class.<br></blockquote><br>Who can organize telecons on #456?<br>Would it be possible to organize an RMA meeting at SC?<br></blockquote><br>I will be there Monday through part of Thursday but am usually<br>triple-booked from 8 AM to midnight.<br><br><blockquote type="cite">The position expressed by the solution in #456 is based on the idea<br>that the MPI RMA synchronization routines should have the same<br>outcome when RMA PUT and GET calls are substituted by stores and loads.<br><br>The outcome for the flush routines is still not defined.<br></blockquote><br>It is interesting, because the standard actually contradicts itself on<br>whether Flush affects load-store.  
I find this incredibly frustrating.<br><br>Page 450:<br><br>"Locally completes at the origin all outstanding RMA operations<br>initiated by the calling process to the target process specified by<br>rank on the specified window. For example, after this routine<br>completes, the user may reuse any buffers provided to put, get, or<br>accumulate operations."<br><br>I do not think "RMA operations" includes load-store.<br><br>Page 410:<br><br>"The consistency of load/store accesses from/to the shared memory as<br>observed by the user program depends on the architecture. A consistent<br>view can be created in the unified memory model (see Section 11.4) by<br>utilizing the window synchronization functions (see Section 11.5) or<br>explicitly completing outstanding store accesses (e.g., by calling<br>MPI_WIN_FLUSH)."<br><br>Here it is unambiguously implied that MPI_WIN_FLUSH affects load-stores.<br><br>My preference is to fix the statement on page 410, since it is less<br>canonical than the one on page 450, and because I do not want to have a<br>memory barrier in every call to WIN_FLUSH.<br><br>Jeff<br><br><blockquote type="cite">I would prefer that the discussion be organized by someone inside<br>the RMA subgroup that proposed the changes for MPI-3.1,<br>rather than by me.<br>I tried to bring all the input together and hope that #456<br>is now in a state where it is consistent in itself and with the<br>expectations expressed by the group that published the EuroMPI<br>paper on the first usage of this shared memory interface.<br><br>The ticket (with the help of the recent C11 standardization)<br>is also well on its way to being consistent with compiler optimizations;<br>in other words, the C standardization body has learnt from the<br>pthreads problems. 
Fortran is still an open question to me;<br>I do not know its status. See<br><a href="https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/456#comment:13">https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/456#comment:13</a><br><br>Best regards<br>Rolf<br><br><br><br>----- Original Message -----<br><blockquote type="cite">From: "William Gropp" <wgropp@illinois.edu><br>To: "MPI WG Remote Memory Access working group" <mpiwg-rma@lists.mpi-forum.org><br>Sent: Thursday, September 25, 2014 4:19:14 PM<br>Subject: [mpiwg-rma] MPI RMA status summary<br><br>I looked through all of the tickets and wrote a summary of the open<br>issues, which I’ve attached.  I propose the following:<br><br>Determine which of these issues can be resolved by email.  A<br>significant number can probably be closed with no further action.<br><br>For those requiring rework, determine if there is still interest in<br>them, and if not, close them as well.<br><br>For the ones requiring discussion, assign someone to organize a<br>position and discussion.  We can schedule telecons to go over those<br>issues.  The first item in the list is certainly in this class.<br><br>Comments?<br><br>Bill<br><br>_______________________________________________<br>mpiwg-rma mailing list<br>mpiwg-rma@lists.mpi-forum.org<br>http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-rma<br></blockquote><br>--<br>Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner@hlrs.de<br>High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530<br>University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832<br>Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner<br>Nobelstr. 19, D-70550 Stuttgart, Germany . . . . 
(Office: Room 1.307)<br></blockquote><br><br><br>--<span class="Apple-converted-space"> </span><br>Jeff Hammond<br><a href="mailto:jeff.science@gmail.com">jeff.science@gmail.com</a><br><a href="http://jeffhammond.github.io/">http://jeffhammond.github.io/</a></div></blockquote></div><br></div></body></html>