<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<style type="text/css" style="display:none"><!-- p { margin-top: 0px; margin-bottom: 0px; }--></style>
</head>
<body dir="ltr" style="font-size:12pt;color:#000000;background-color:#FFFFFF;font-family:Calibri,Arial,Helvetica,sans-serif;">
<p>Thanks Bill and Pavan. </p>
<p><br>
</p>
<p>I was having trouble seeing how a ready send (especially a nonblocking one) could be guaranteed to start only after the matching receive had been posted, but since I only saw the problem on Titan, I was wondering whether I had missed something.<br>
</p>
<p><br>
</p>
<p>Apparently, I was just really lucky on the other platforms (or unlucky on Titan).<br>
</p>
<p><br>
</p>
<p>Thanks!<br>
</p>
<div style="word-wrap:break-word">
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> mpi-forum &lt;mpi-forum-bounces@lists.mpi-forum.org&gt; on behalf of William Gropp via mpi-forum &lt;mpi-forum@lists.mpi-forum.org&gt;<br>
<b>Sent:</b> Wednesday, November 07, 2018 8:06 PM<br>
<b>To:</b> Main MPI Forum mailing list<br>
<b>Cc:</b> William Gropp<br>
<b>Subject:</b> Re: [Mpi-forum] Persistent Readysend Semantics Question</font>
<div>&nbsp;</div>
</div>
<div>Pavan is correct; the program is buggy. Here’s an example:
<div class=""><br class="">
</div>
<pre class="">process 1          process 2
start(recv)        /* something causes a delay at process 2 */
start(rsend)
wait(all)

                   start(recv)
                   ….
</pre>
<div class=""><br class="">
</div>
<div class="">In this case, the rsend on process 1 occurs before the recv is started on process 2, and the MPI program is incorrect. Without some synchronization, either explicit or implicit (e.g., an allreduce for time step control), the use of Rsend in any
form is unlikely to be correct.</div>
<div class=""><br class="">
</div>
<div class="">Bill</div>
<div class=""><br class="">
<div class=""><br class="webkit-block-placeholder">
</div>
<div class=""><br class="webkit-block-placeholder">
</div>
<div class="">
<div class="" style="color:rgb(0,0,0); letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; word-wrap:break-word">
<div class="" style="color:rgb(0,0,0); letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; word-wrap:break-word">
<div class="" style="color:rgb(0,0,0); letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; word-wrap:break-word">
<div class="" style="word-wrap:break-word">
<div style="color:rgb(0,0,0); font-family:Helvetica; font-size:12px; font-style:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px">
William Gropp<br class="">
Director and Chief Scientist, NCSA<br class="">
Thomas M. Siebel Chair in Computer Science<br class="">
University of Illinois Urbana-Champaign</div>
<br class="Apple-interchange-newline">
</div>
</div>
<br class="Apple-interchange-newline">
</div>
<br class="Apple-interchange-newline">
</div>
<br class="Apple-interchange-newline">
<br class="Apple-interchange-newline">
</div>
<br class="">
<div>
<blockquote type="cite" class="">
<div class="">On Nov 7, 2018, at 12:14 PM, Balaji, Pavan via mpi-forum &lt;<a href="mailto:mpi-forum@lists.mpi-forum.org" class="">mpi-forum@lists.mpi-forum.org</a>&gt; wrote:</div>
<br class="Apple-interchange-newline">
<div class=""><span class="" style="font-family:Helvetica; font-size:12px; font-style:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px; float:none; display:inline!important">Brian,</span>
<div class="" style="font-family:Helvetica; font-size:12px; font-style:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px">
<br class="">
</div>
<div class="" style="font-family:Helvetica; font-size:12px; font-style:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px">
Assuming all processes are running the same code as below, I think the user program is incorrect and you were just getting lucky with the other implementations.</div>
<div class="" style="font-family:Helvetica; font-size:12px; font-style:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px">
<br class="">
</div>
<div class="" style="font-family:Helvetica; font-size:12px; font-style:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px">
Specifically, there’s nothing stopping the rsend from one process from reaching the other process before that process has posted the corresponding recv. For example, the receiver might still be in the second waitall from the previous iteration.</div>
<div class="" style="font-family:Helvetica; font-size:12px; font-style:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px">
<br class="">
</div>
<div class="" style="font-family:Helvetica; font-size:12px; font-style:normal; font-weight:normal; letter-spacing:normal; text-align:start; text-indent:0px; text-transform:none; white-space:normal; word-spacing:0px">
— Pavan<br class="">
<br class="">
<div dir="ltr" class="">Sent from my iPhone</div>
<div dir="ltr" class=""><br class="">
On Nov 7, 2018, at 12:09 PM, Smith, Brian E. via mpi-forum &lt;<a href="mailto:mpi-forum@lists.mpi-forum.org" class="" style="color:rgb(149,79,114); text-decoration:underline">mpi-forum@lists.mpi-forum.org</a>&gt; wrote:<br class="">
<br class="">
</div>
<blockquote type="cite" class="">
<div dir="ltr" class="">
<div class="WordSection1" style="">
<div class="" style="margin:0in 0in 0.0001pt; font-size:12pt; font-family:Calibri,sans-serif">
<span class="" style="font-size:11pt">Hi all,</span><span class="" style=""></span></div>
<div class="" style="margin:0in 0in 0.0001pt; font-size:12pt; font-family:Calibri,sans-serif">
<span class="" style="font-size:11pt">&nbsp;</span></div>
<div class="" style="margin:0in 0in 0.0001pt; font-size:12pt; font-family:Calibri,sans-serif">
<span class="" style="">(trying again; I thought this address was subscribed to the list but maybe not. Sorry if this is a duplicate)</span></div>
<div class="" style="margin:0in 0in 0.0001pt; font-size:12pt; font-family:Calibri,sans-serif">
<span class="" style="">&nbsp;</span></div>
<div class="" style="margin:0in 0in 0.0001pt; font-size:12pt; font-family:Calibri,sans-serif">
<span class="" style="font-size:11pt">I have a user-provided code that uses persistent ready sends. (Don’t ask. I don’t have an answer to “why?”. Maybe it actually helped on some machine sometime in the past?)</span><span class="" style=""></span></div>
<div class="" style="margin:0in 0in 0.0001pt; font-size:12pt; font-family:Calibri,sans-serif">
<span class="" style="font-size:11pt">&nbsp;</span><span class="" style=""></span></div>
<div class="" style="margin:0in 0in 0.0001pt; font-size:12pt; font-family:Calibri,sans-serif">
<span class="" style="font-size:11pt">Anyway, the app fails on Titan fairly consistently (95+% failure) but works on most other platforms (BGQ, Summit, generic OMPI cluster, generic Intel MPI cluster).</span><span class="" style=""></span></div>
<div class="" style="margin:0in 0in 0.0001pt; font-size:12pt; font-family:Calibri,sans-serif">
<span class="" style="font-size:11pt">Note: I haven’t tried as many times on the other platforms as on Titan, so it might fail on one of them occasionally. I saw zero failures in my testing, however.</span><span class="" style=""></span></div>
<div class="" style="margin:0in 0in 0.0001pt; font-size:12pt; font-family:Calibri,sans-serif">
<span class="" style="font-size:11pt">&nbsp;</span><span class="" style=""></span></div>
<div class="" style="margin:0in 0in 0.0001pt; font-size:12pt; font-family:Calibri,sans-serif">
<span class="" style="font-size:11pt">The code is basically this:</span><span class="" style=""></span></div>
<pre class="" style="margin:0in 0in 0.0001pt; font-size:11pt">MPI_Recv_init()
MPI_Rsend_init()

while (condition)
{
    MPI_Start(recv_request)
    MPI_Start(rsend_request)
    MPI_Waitall(both requests)
    Twiddle_sendbuf_slightly();
}

MPI_Request_free(recv_request)
MPI_Request_free(rsend_request)

MPI_Cart_shift(rotate source/dest different direction now)
MPI_Recv_init() // sending the other direction now, basically
MPI_Rsend_init()
while (condition)
{
    MPI_Start(recv_request)
    MPI_Start(rsend_request)
    MPI_Waitall(both requests)
    Twiddle_sendbuf_slightly();
}

MPI_Request_free(recv_request)
MPI_Request_free(rsend_request)
</pre>
<div class="" style="margin:0in 0in 0.0001pt; font-size:12pt; font-family:Calibri,sans-serif">
<span class="" style="font-size:11pt">Is this considered a “correct program”? There are only a couple of paragraphs on persistent sends in 800+ pages of the standard, and not much more on nonblocking ready sends (which is essentially what this becomes). It’s pretty vague territory.</span><span class="" style=""></span></div>
<div class="" style="margin:0in 0in 0.0001pt; font-size:12pt; font-family:Calibri,sans-serif">
<span class="" style="font-size:11pt">&nbsp;</span><span class="" style=""></span></div>
<div class="" style="margin:0in 0in 0.0001pt; font-size:12pt; font-family:Calibri,sans-serif">
<span class="" style="font-size:11pt">I tried splitting the Waitall() into 2 Wait()s, explicitly waiting on the Recv request first, then the Rsend request. However, this still fails and suggests the requests are not happening in order:</span><span class="" style=""></span></div>
<div class="" style="margin:0in 0in 0.0001pt; font-size:12pt; font-family:Calibri,sans-serif; background-color:white">
<span class="" style="font-size:8.5pt; font-family:Menlo">Rank 2 [Wed Nov 7 08:26:12 2018] [c5-0c0s3n1] Fatal error in PMPI_Wait: Other MPI error, error stack:</span></div>
<div class="" style="margin:0in 0in 0.0001pt; font-size:12pt; font-family:Calibri,sans-serif; background-color:white">
<span class="" style="font-size:8.5pt; font-family:Menlo">PMPI_Wait(207).....................: MPI_Wait(request=0x7fffffff5698, status=0x7fffffff5630) failed</span></div>
<div class="" style="margin:0in 0in 0.0001pt; font-size:12pt; font-family:Calibri,sans-serif; background-color:white">
<span class="" style="font-size:8.5pt; font-family:Menlo">MPIR_Wait_impl(100)................:</span></div>
<div class="" style="margin:0in 0in 0.0001pt; font-size:12pt; font-family:Calibri,sans-serif; background-color:white">
<span class="" style="font-size:8.5pt; font-family:Menlo">MPIDI_CH3_PktHandler_ReadySend(829): Ready send from source 1 and with tag 1 had no matching receive</span></div>
<div class="" style="margin:0in 0in 0.0001pt; font-size:12pt; font-family:Calibri,sans-serif; background-color:white">
<span class="" style="font-size:8.5pt; font-family:Menlo">&nbsp;</span></div>
<div class="" style="margin:0in 0in 0.0001pt; font-size:12pt; font-family:Calibri,sans-serif">
<span class="" style="font-size:11pt">It strongly looks like the receive is not always posted before the send, or at least that the waitall sometimes completes the send before the recv. I suspect that means an implementation bug. Cray might actually be doing something to optimize either persistent communications or ready sends (or both) that we never did on BGQ (so it’s not necessarily an MPICH vs. OMPI difference, at least).</span></div>
<div class="" style="margin:0in 0in 0.0001pt; font-size:12pt; font-family:Calibri,sans-serif">
<span class="" style="font-size:11pt">&nbsp;</span><span class="" style="font-size:8.5pt; font-family:Menlo"></span></div>
<div class="" style="margin:0in 0in 0.0001pt; font-size:12pt; font-family:Calibri,sans-serif">
<span class="" style="font-size:11pt">Thoughts?</span><span class="" style=""></span></div>
<div class="" style="margin:0in 0in 0.0001pt; font-size:12pt; font-family:Calibri,sans-serif">
<span class="" style="font-size:11pt">&nbsp;</span><span class="" style=""></span></div>
<div class="" style="margin:0in 0in 0.0001pt; font-size:12pt; font-family:Calibri,sans-serif">
<span class="" style="font-size:11pt">I’ll open a bug with them at some point but wanted to verify semantics first.</span><span class="" style=""></span></div>
<div class="" style="margin:0in 0in 0.0001pt; font-size:12pt; font-family:Calibri,sans-serif">
<span class="" style="font-size:11pt">&nbsp;</span><span class="" style=""></span></div>
<div class="" style="margin:0in 0in 0.0001pt; font-size:12pt; font-family:Calibri,sans-serif">
<span class="" style="font-size:11pt">Thanks</span><span class="" style=""></span></div>
<div class="" style="margin:0in 0in 0.0001pt; font-size:12pt; font-family:Calibri,sans-serif">
<span class="" style="font-size:11pt">&nbsp;</span></div>
<div class="" style="margin:0in 0in 0.0001pt; font-size:12pt; font-family:Calibri,sans-serif">
<span class="" style="">Brian Smith</span></div>
<div class="" style="margin:0in 0in 0.0001pt; font-size:12pt; font-family:Calibri,sans-serif">
<span class="" style="">Oak Ridge Leadership Computing Facility</span></div>
<div class="" style="margin:0in 0in 0.0001pt; font-size:12pt; font-family:Calibri,sans-serif">
<span class="" style="font-size:11pt">&nbsp;</span></div>
</div>
</div>
</blockquote>
<blockquote type="cite" class="">
<div dir="ltr" class=""><span class="">_______________________________________________</span><br class="">
<span class="">mpi-forum mailing list</span><br class="">
<span class=""><a href="mailto:mpi-forum@lists.mpi-forum.org" class="" style="color:rgb(149,79,114); text-decoration:underline">mpi-forum@lists.mpi-forum.org</a></span><br class="">
<span class=""><a href="https://lists.mpi-forum.org/mailman/listinfo/mpi-forum" class="" style="color:rgb(149,79,114); text-decoration:underline">https://lists.mpi-forum.org/mailman/listinfo/mpi-forum</a></span><br class="">
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</div>
</body>
</html>