I've written 7-pt stencil code with RMA and it does not require this many RMA synchronizations.

Bill
<span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; "><span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; "><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; "><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div style="font-size: 12px; ">William Gropp</div><div style="font-size: 12px; ">Director, Parallel Computing Institute</div></div></span><span class="Apple-style-span" style="font-size: 12px; ">Thomas M. Siebel Chair in Computer Science</span><span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; "><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div><div style="font-size: 12px; ">University of Illinois Urbana-Champaign</div></div><div><br></div></div></span><br class="Apple-interchange-newline"></div></span><br class="Apple-interchange-newline"></span><br class="Apple-interchange-newline">
</div>
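One way to read Bill's point: if all six halo slabs live in a single window, one fence
pair per iteration completes every neighbor transfer. A minimal sketch, assuming fence
synchronization; the names below (halo_exchange_one_window, sendbuf, nbr) are
placeholders and are not taken from his code:

/* Minimal sketch only: a 7-point-stencil halo exchange that needs a
 * single MPI_Win_fence pair per iteration because all six halo slabs
 * live in ONE window.  Neighbor ranks, counts, and displacements are
 * placeholders. */
#include <mpi.h>

void halo_exchange_one_window(MPI_Win win,             /* window spanning all six halo slabs */
                              const double *sendbuf[6],/* local faces to push to the neighbors */
                              int count,               /* number of doubles per face */
                              const int nbr[6])        /* ranks of the six neighbors */
{
    MPI_Win_fence(0, win);                     /* fence #1: open the epoch */
    for (int d = 0; d < 6; d++)
        MPI_Put(sendbuf[d], count, MPI_DOUBLE, nbr[d],
                (MPI_Aint)d * count,           /* slab d inside the target's window */
                count, MPI_DOUBLE, win);
    MPI_Win_fence(0, win);                     /* fence #2: close the epoch */
}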
On Aug 11, 2014, at 10:47 AM, Rolf Rabenseifner <rabenseifner@hlrs.de> wrote:

It is not function-call overhead; it is my 7-point-stencil example
executing a synchronization pattern (e.g., everything needed for
MPI_Win_fence) six times rather than only once. And as far as I can
see, the multiple synchronizations cannot be discarded by the MPI
library through some intelligent optimization.

More than this, one RMA epoch typically needs two synchronizations,
so the question is whether 12 or 2 MPI_Win_fence calls are needed;
the difference is 10, i.e., my proposal would remove 10 times the
MPI_Win_fence latency.

Rolf

----- Original Message -----
From: "Jeff Hammond" <jeff.science@gmail.com>
To: "MPI WG Remote Memory Access working group" <mpiwg-rma@lists.mpi-forum.org>
Sent: Monday, August 11, 2014 4:09:19 PM
Subject: Re: [mpiwg-rma] Single RMA synchronization for several window handles

If function call overhead matters, MPI is probably the wrong model.

You'll have to prove to me that this (function call overhead) is
significant compared to the cost of synchronization, using empirical
data on some system.

Jeff

Sent from my iPhone

On Aug 11, 2014, at 1:49 AM, Rolf Rabenseifner <rabenseifner@hlrs.de> wrote:

Jim and all,

It is not syntactic sugar; it is a latency-optimizing enhancement:

If you are doing neighbor communication with one-sided communication
to 6 neighbors, based on 6 windows that are all defined on
MPI_COMM_WORLD, then you currently need to issue the synchronization
calls 6 times, which implies up to 6 times the latency, e.g., 6 calls
to MPI_Win_fence instead of one.

If you are using MPI shared-memory windows, the same latency problem
exists, but here the trick with dynamic windows is impossible because
shared-memory windows must be allocated with MPI_Win_allocate_shared.

Of course, it is more complicated to define MPI_Win_combine on a
superset of the communicators used for all the combined windows.
As a first effort, it would already be helpful to define it for the
same communicator. To allow enhancements in a future MPI version, I
would still recommend keeping the comm argument as part of the
argument list.

Rolf
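For concreteness, a sketch of the pattern Rolf is counting, assuming one window per
neighbor buffer and fence synchronization; all names are placeholders:

/* Minimal sketch only: the per-window pattern being discussed, with one
 * window per neighbor buffer.  Each iteration pays 6 fences to open the
 * epochs and 6 to close them -- the 12-versus-2 count above. */
#include <mpi.h>

void halo_exchange_six_windows(MPI_Win win[6],          /* one window per neighbor buffer */
                               const double *sendbuf[6],
                               int count, const int nbr[6])
{
    for (int d = 0; d < 6; d++)
        MPI_Win_fence(0, win[d]);          /* 6 fences to open the epochs */

    for (int d = 0; d < 6; d++)
        MPI_Put(sendbuf[d], count, MPI_DOUBLE, nbr[d],
                0, count, MPI_DOUBLE, win[d]);

    for (int d = 0; d < 6; d++)
        MPI_Win_fence(0, win[d]);          /* 6 more fences to close them */
}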
----- Original Message -----
From: "Jim Dinan" <james.dinan@gmail.com>
To: "MPI WG Remote Memory Access working group" <mpiwg-rma@lists.mpi-forum.org>
Sent: Sunday, August 10, 2014 5:48:02 PM
Subject: Re: [mpiwg-rma] Single RMA synchronization for several window handles

Hi Rolf,

I had initially proposed this in the context of passive-target RMA.
Active-target RMA already notifies the receiver when data has
arrived, but there is no efficient way to get such a notification in
passive-target RMA. I think what you are proposing here would be
syntactic sugar on top of the existing interface --- could I
implement this by fencing every window to determine that all
transfers have completed?

I agree with the comments regarding dynamic windows. The merged
window would contain discontiguous buffers; thus, it would lose the
ability to do offset-based addressing and would need to use absolute
(MPI_BOTTOM-based) addressing.

~Jim.

On Fri, Aug 8, 2014 at 10:18 AM, Rolf Rabenseifner <rabenseifner@hlrs.de> wrote:

Jim,

your topic "Reducing Synchronization Overhead Through Bundled
Communication" would also benefit if we were able to combine several
window handles into one superset window handle.

If you have several windows for different buffers but only one
synchronization pattern, e.g., MPI_Win_fence, then currently you must
call MPI_Win_fence separately for each window handle.

I would propose:

MPI_Win_combine(/*IN*/  int count,
                /*IN*/  MPI_Win *win,
                /*IN*/  MPI_Comm comm,
                /*OUT*/ MPI_Win *win_combined)

The process group of comm must contain the process groups of all win.
The resulting window handle win_combined can be used only in RMA
synchronization calls and other helper routines, but not for dynamic
window allocation nor for any RMA communication routine. Collective
synchronization routines must be called by all processes of comm.
The semantics of an RMA synchronization call using win_combined are
defined as if the calls were separately issued for each window handle
of the array win. If group handles are part of the argument list of
the synchronization call, then the appropriate subset is used for
each window handle in win.

What do you think about this idea for MPI-4.0?

Best regards
Rolf
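To make the intended use concrete, a sketch of how the proposed call might replace the
twelve fences above with two. MPI_Win_combine is only the routine being proposed here,
not an existing MPI function; its signature is copied from the proposal, and the
surrounding names are placeholders:

/* Sketch only: usage of the PROPOSED MPI_Win_combine.  The call does not
 * exist in any MPI standard; the prototype below restates the proposal so
 * the sketch is self-contained.  The combined handle is used solely for
 * synchronization; communication still targets the original windows. */
#include <mpi.h>

int MPI_Win_combine(int count, MPI_Win *win, MPI_Comm comm,
                    MPI_Win *win_combined);   /* proposed routine, not implemented */

void halo_exchange_combined(MPI_Win win[6], const double *sendbuf[6],
                            int count, const int nbr[6])
{
    MPI_Win win_combined;   /* in real code this would be created once, not per call */
    MPI_Win_combine(6, win, MPI_COMM_WORLD, &win_combined);

    MPI_Win_fence(0, win_combined);            /* one fence opens all six epochs */
    for (int d = 0; d < 6; d++)
        MPI_Put(sendbuf[d], count, MPI_DOUBLE, nbr[d],
                0, count, MPI_DOUBLE, win[d]); /* communication uses the original handles */
    MPI_Win_fence(0, win_combined);            /* one fence closes them */
}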
----- Original Message -----
From: "Jim Dinan" <james.dinan@gmail.com>
To: "MPI WG Remote Memory Access working group" <mpiwg-rma@lists.mpi-forum.org>
Sent: Thursday, August 7, 2014 4:08:32 PM
Subject: [mpiwg-rma] RMA Notification

Hi All,

I have added a new proposal for an RMA notification extension:
https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/439

I would like to bring this forward for the RMA WG to consider as an
MPI-4 extension.

Cheers,
~Jim.

--
Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner@hlrs.de
High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)

_______________________________________________
mpiwg-rma mailing list
mpiwg-rma@lists.mpi-forum.org
http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-rma