<div><div dir="auto">In the scenario Jim describes, where a send must be reposted after a transient network failure, the retransmission is handled in software, and MPI makes no progress outside MPI calls, an unexpected outcome seems plausible (also assuming <span style="border-color:rgb(0,0,0);color:rgb(0,0,0)">that neither the network nor the software stack applies timeouts or other corrective measures).</span></div><div dir="auto">- Rajeev’s example will eventually complete: when the process affected by the network issue reaches MPI_Finalize, the pending internal communications will be reissued and all processes will complete the MPI_Barrier.</div><div dir="auto">- In Jim’s original example a deadlock looks more likely, because no subsequent MPI call reissues the message that needs to be retransmitted.</div><div dir="auto"><br></div><div dir="auto">I don’t think we need transient network errors for such outcomes; a buffered send with no follow-up MPI calls is enough to reach the same delayed-execution scenario.<br></div><div dir="auto"><br></div><div dir="auto">George.</div><div dir="auto"><br></div><div dir="auto">On Sun, Oct 11, 2020 at 14:28 Skjellum, Anthony via mpi-forum <<a href="mailto:mpi-forum@lists.mpi-forum.org" target="_blank">mpi-forum@lists.mpi-forum.org</a>> wrote:<br></div></div><div><div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Rajeev, No, I don't think so. Did you all disagree with my reasoning?</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Tony</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
</div>
<div id="m_-3617589786068768389m_-5342928172570998151appendonsend"></div>
<hr style="display:inline-block;width:98%">
<div id="m_-3617589786068768389m_-5342928172570998151divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(0,0,0)"><b style="font-family:Calibri,sans-serif">From:</b> mpi-forum <<a href="mailto:mpi-forum-bounces@lists.mpi-forum.org" style="font-family:Calibri,sans-serif" target="_blank">mpi-forum-bounces@lists.mpi-forum.org</a>> on behalf of Thakur, Rajeev via mpi-forum <<a href="mailto:mpi-forum@lists.mpi-forum.org" style="font-family:Calibri,sans-serif" target="_blank">mpi-forum@lists.mpi-forum.org</a>><br>
<b style="font-family:Calibri,sans-serif">Sent:</b> Sunday, October 11, 2020 2:23 PM<br>
<b style="font-family:Calibri,sans-serif">To:</b> Jim Dinan <<a href="mailto:james.dinan@gmail.com" style="font-family:Calibri,sans-serif" target="_blank">james.dinan@gmail.com</a>><br>
<b style="font-family:Calibri,sans-serif">Cc:</b> Thakur, Rajeev <<a href="mailto:thakur@anl.gov" style="font-family:Calibri,sans-serif" target="_blank">thakur@anl.gov</a>>; Main MPI Forum mailing list <<a href="mailto:mpi-forum@lists.mpi-forum.org" style="font-family:Calibri,sans-serif" target="_blank">mpi-forum@lists.mpi-forum.org</a>></font></div></div><div dir="ltr"><div id="m_-3617589786068768389m_-5342928172570998151divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(0,0,0)"><br>
<b style="font-family:Calibri,sans-serif">Subject:</b> Re: [Mpi-forum] Progress Question</font>
<div> </div>
</div>
<div lang="EN-US" link="blue" vlink="purple" style="word-wrap:break-word">
<div>
<p><span style="font-family:'Lucida Grande',sans-serif">Does it mean that in the following program, although all processes have called barrier, some process may not exit the barrier for 100 days?</span></p>
<p><span style="font-family:'Lucida Grande',sans-serif">&nbsp;</span></p>
<p><span style="font-family:'Lucida Grande',sans-serif">MPI_Init</span></p>
<p><span style="font-family:'Lucida Grande',sans-serif">MPI_Barrier</span></p>
<p><span style="font-family:'Lucida Grande',sans-serif">sleep(100 days)</span></p>
<p><span style="font-family:'Lucida Grande',sans-serif">MPI_Finalize</span></p>
<p><span style="font-family:'Lucida Grande',sans-serif">&nbsp;</span></p>
<p><span style="font-family:'Lucida Grande',sans-serif">Rajeev</span></p>
<p><span style="font-family:'Lucida Grande',sans-serif">&nbsp;</span></p>
<p><span style="font-family:'Lucida Grande',sans-serif">&nbsp;</span></p>
<div style="border-style:solid none none;border-top-width:1pt;padding:3pt 0in 0in;border-top-color:rgb(181,196,223)">
<p><b><span style="font-size:12pt;color:black">From: </span>
</b><span style="font-size:12pt;color:black">Jim Dinan <<a href="mailto:james.dinan@gmail.com" target="_blank">james.dinan@gmail.com</a>><br>
<b>Date: </b>Sunday, October 11, 2020 at 10:31 AM<br>
<b>To: </b>"Thakur, Rajeev" <<a href="mailto:thakur@anl.gov" target="_blank">thakur@anl.gov</a>><br>
<b>Cc: </b>Main MPI Forum mailing list <<a href="mailto:mpi-forum@lists.mpi-forum.org" target="_blank">mpi-forum@lists.mpi-forum.org</a>><br>
<b>Subject: </b>Re: [Mpi-forum] Progress Question</span></p>
</div>
<div>
<p> </p>
</div>
<div>
<p>Hi Rajeev,</p>
<div>
<p> </p>
</div>
<div>
<p>Yes, that's the question and my initial answer was the same as yours. However, we then started talking about the implementation of the barrier, which led to the second example. For example, consider a situation where there is an error
in transmission and the implementation needs to enter the progress engine to retry a send operation in software.</p>
</div>
<div>
<p> </p>
</div>
<div>
<p> ~Jim.</p>
</div>
</div>
<p> </p>
<div>
<div>
<p>On Sat, Oct 10, 2020 at 5:10 PM Thakur, Rajeev <<a href="mailto:thakur@anl.gov" target="_blank">thakur@anl.gov</a>> wrote:</p>
</div>
<blockquote style="border-style:none none none solid;border-left-width:1pt;padding:0in 0in 0in 6pt;margin-left:4.8pt;margin-right:0in;border-left-color:rgb(204,204,204)">
<div>
<div>
<p><span style="font-family:'Lucida Grande',sans-serif">Jim,</span></p>
<p><span style="font-family:'Lucida Grande',sans-serif">I don’t fully understand your question. Is it “If all processes reach MPI_Barrier, are they guaranteed to exit the barrier without the need for any other MPI function
to be called on any process?” I would say yes.</span></p>
<p><span style="font-family:'Lucida Grande',sans-serif">&nbsp;</span></p>
<p><span style="font-family:'Lucida Grande',sans-serif">Rajeev</span></p>
<p><span style="font-family:'Lucida Grande',sans-serif">&nbsp;</span></p>
<p><span style="font-family:'Lucida Grande',sans-serif">&nbsp;</span></p>
<div style="border-style:solid none none;border-top-width:1pt;padding:3pt 0in 0in;border-top-color:rgb(181,196,223)">
<p><b><span style="font-size:12pt;color:black">From:
</span></b><span style="font-size:12pt;color:black">mpi-forum <<a href="mailto:mpi-forum-bounces@lists.mpi-forum.org" target="_blank">mpi-forum-bounces@lists.mpi-forum.org</a>> on behalf of Jim Dinan via mpi-forum <<a href="mailto:mpi-forum@lists.mpi-forum.org" target="_blank">mpi-forum@lists.mpi-forum.org</a>><br>
<b>Reply-To: </b>Main MPI Forum mailing list <<a href="mailto:mpi-forum@lists.mpi-forum.org" target="_blank">mpi-forum@lists.mpi-forum.org</a>><br>
<b>Date: </b>Saturday, October 10, 2020 at 12:31 PM<br>
<b>To: </b>Main MPI Forum mailing list <<a href="mailto:mpi-forum@lists.mpi-forum.org" target="_blank">mpi-forum@lists.mpi-forum.org</a>><br>
<b>Cc: </b>Jim Dinan <<a href="mailto:james.dinan@gmail.com" target="_blank">james.dinan@gmail.com</a>><br>
<b>Subject: </b>[Mpi-forum] Progress Question</span></p>
</div>
<div>
<p> </p>
</div>
<div>
<p>Hi All,</p>
<div>
<p> </p>
</div>
<div>
<p>A colleague recently asked a question that I wasn't able to answer definitively. Is the following code guaranteed to make progress?</p>
</div>
<div>
<p> </p>
</div>
<blockquote style="margin:5pt 0in 5pt 30pt">
<div>
<p>MPI_Barrier();</p>
</div>
<div>
<p>if rank == 1</p>
</div>
<div>
<p> create_file("test")</p>
</div>
<div>
<p>if rank == 0</p>
</div>
<div>
<p> while not_exists("test")</p>
</div>
<div>
<p> sleep(1);</p>
</div>
</blockquote>
<div>
<p> </p>
</div>
<div>
<p>That is, can rank 1 require rank 0 to make MPI calls after its return from the barrier, in order for rank 1 to complete the barrier? If the code were written as follows:</p>
</div>
<div>
<p> </p>
</div>
<blockquote style="margin:5pt 0in 5pt 30pt">
<div>
<p>isend(..., other_rank, &req[0])</p>
</div>
<div>
<p>irecv(..., other_rank, &req[1])</p>
</div>
<div>
<p>waitall(2, req)</p>
</div>
<div>
<p>if rank == 1</p>
</div>
<div>
<p> create_file("test")</p>
</div>
<div>
<p>if rank == 0</p>
</div>
<div>
<p> while not_exists("test")</p>
</div>
<div>
<p> sleep(1);</p>
</div>
</blockquote>
<p> </p>
<div>
<p>I think it would clearly not guarantee progress since the send data can be buffered. Is the same true for barrier?</p>
</div>
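<div>
<p>The buffering concern can be illustrated with a toy model in C. This is only a sketch, and it is not MPI: the names toy_link, toy_progress, toy_send, toy_finalize, and app_wait are hypothetical. The model assumes a library that makes progress only inside its own calls and that retransmits a lost message in software, with no timeouts or other corrective measures in the network.</p>
</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy library (NOT MPI): progress happens only inside toy_* calls. */
typedef struct {
    bool pending_retry; /* a buffered message was lost and awaits software retransmission */
    bool delivered;     /* the message has reached the peer */
} toy_link;

/* Progress engine: retransmits a lost message. It runs only when a toy_* call invokes it. */
void toy_progress(toy_link *link) {
    if (link->pending_retry) {
        link->pending_retry = false;
        link->delivered = true;
    }
}

/* Buffered send: returns immediately. If `drop` is true, the first
 * transmission is lost and the message is queued for a later retry. */
void toy_send(toy_link *link, bool drop) {
    assert(link != NULL);
    if (drop) {
        link->pending_retry = true;
    } else {
        link->delivered = true;
    }
}

/* A later library call (a stand-in for a finalize step) drives progress. */
void toy_finalize(toy_link *link) {
    toy_progress(link);
}

/* Application-side wait (a stand-in for sleep() or polling a file).
 * It never enters the library, so the link state cannot change. */
bool app_wait(const toy_link *link, int ticks) {
    for (int i = 0; i < ticks && !link->delivered; ++i) {
        /* no toy_* call here, hence no progress */
    }
    return link->delivered;
}
```

<div>
<p>In this model, a send whose first transmission is dropped stays undelivered for as long as the sender only runs app_wait, and it is delivered once toy_finalize is called. Whether a real MPI implementation may behave this way for a barrier is exactly the open question in this thread.</p>
</div>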
<div>
<p> </p>
</div>
<div>
<p>Cheers,</p>
</div>
<div>
<p> ~Jim.</p>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</blockquote></div></div>
</div>