<!-- BaNnErBlUrFlE-BoDy-start -->
<!-- Preheader Text : BEGIN -->
<div style="display:none !important;display:none;visibility:hidden;mso-hide:all;font-size:1px;color:#ffffff;line-height:1px;height:0px;max-height:0px;opacity:0;overflow:hidden;">
Hello all, The changes we've been suggesting for MPI_Comm_agree aren't quite as straightforward as we've been thinking. Change 1 is to output a num_agreed_failed such that the first num_agreed_failed of the groups returned by a subsequent call</div>
<!-- Preheader Text : END -->
<!-- Email Banner : BEGIN -->
<div style="display:none !important;display:none;visibility:hidden;mso-hide:all;font-size:1px;color:#ffffff;line-height:1px;max-height:0px;opacity:0;overflow:hidden;">ZjQcmQRYFpfptBannerStart</div>
<!--[if ((ie)|(mso))]>
<table border="0" cellspacing="0" cellpadding="0" width="100%" style="padding: 16px 0px 16px 0px; direction: ltr" ><tr><td>
<table border="0" cellspacing="0" cellpadding="0" style="padding: 0px 10px 5px 6px; width: 100%; border-radius:4px; border-top:4px solid #90a4ae;background-color:#D0D8DC;"><tr><td valign="top">
<table align="left" border="0" cellspacing="0" cellpadding="0" style="padding: 4px 8px 4px 8px">
<tr><td style="color:#000000; font-family: 'Arial', sans-serif; font-weight:bold; font-size:14px; direction: ltr">
This Message Is From an External Sender
</td></tr>
<tr><td style="color:#000000; font-weight:normal; font-family: 'Arial', sans-serif; font-size:12px; direction: ltr">
This message came from outside your organization.
</td></tr>
</table>
</td></tr></table>
</td></tr></table>
<![endif]-->
<![if !((ie)|(mso))]>
<div dir="ltr" id="pfptBannerd8idolm" style="all: revert !important; display:block !important; text-align: left !important; margin:16px 0px 16px 0px !important; padding:8px 16px 8px 16px !important; border-radius: 4px !important; min-width: 200px !important; background-color: #D0D8DC !important; background-color: #D0D8DC; border-top: 4px solid #90a4ae !important; border-top: 4px solid #90a4ae;">
<div id="pfptBannerd8idolm" style="all: unset !important; float:left !important; display:block !important; margin: 0px 0px 1px 0px !important; max-width: 600px !important;">
<div id="pfptBannerd8idolm" style="all: unset !important; display:block !important; visibility: visible !important; background-color: #D0D8DC !important; color:#000000 !important; color:#000000; font-family: 'Arial', sans-serif !important; font-family: 'Arial', sans-serif; font-weight:bold !important; font-weight:bold; font-size:14px !important; line-height:18px !important; line-height:18px">
This Message Is From an External Sender
</div>
<div id="pfptBannerd8idolm" style="all: unset !important; display:block !important; visibility: visible !important; background-color: #D0D8DC !important; color:#000000 !important; color:#000000; font-weight:normal; font-family: 'Arial', sans-serif !important; font-family: 'Arial', sans-serif; font-size:12px !important; line-height:18px !important; line-height:18px; margin-top:2px !important;">
This message came from outside your organization.
</div>
</div>
<div style="clear: both !important; display: block !important; visibility: hidden !important; line-height: 0 !important; font-size: 0.01px !important; height: 0px"> </div>
</div>
<![endif]>
<div style="display:none !important;display:none;visibility:hidden;mso-hide:all;font-size:1px;color:#ffffff;line-height:1px;max-height:0px;opacity:0;overflow:hidden;">ZjQcmQRYFpfptBannerEnd</div>
<!-- Email Banner : END -->
<!-- BaNnErBlUrFlE-BoDy-end -->
<html>
<head><!-- BaNnErBlUrFlE-HeAdEr-start -->
<style>
#pfptBannerd8idolm { all: revert !important; display: block !important;
visibility: visible !important; opacity: 1 !important;
background-color: #D0D8DC !important;
max-width: none !important; max-height: none !important }
.pfptPrimaryButtond8idolm:hover, .pfptPrimaryButtond8idolm:focus {
background-color: #b4c1c7 !important; }
.pfptPrimaryButtond8idolm:active {
background-color: #90a4ae !important; }
html:root, html:root>body { all: revert !important; display: block !important;
visibility: visible !important; opacity: 1 !important; }
</style>
<!-- BaNnErBlUrFlE-HeAdEr-end -->
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Hello all,</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
The changes we've been suggesting for MPI_Comm_agree aren't quite as straightforward as we've been thinking.</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Change 1 is to output a num_agreed_failed such that the first num_agreed_failed of the groups returned by a subsequent call to MPI_Comm_get_failed will be similar across all ranks. However, we cannot guarantee the similarity of the fail groups unless new_failed
is false. Consider:</div>
<blockquote style="margin-left: 0.8ex; padding-left: 1ex; border-left: 3px solid rgb(200, 200, 200);">
<div class="elementToProof" style="margin-left: 40px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
MPI_Comm_iagree(..., &num_agreed_failed, &new_failed, &request);</div>
<div class="elementToProof" style="margin-left: 40px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
MPI_Test(&request, ...); // This rank passes known failures upwards in agreement's reduce step, but bcast step is not complete</div>
<div class="elementToProof" style="margin-left: 40px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
MPI_Recv(...); // this operation fails, local rank now knows of a new failed rank X</div>
<div class="elementToProof" style="margin-left: 40px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
MPI_Wait(&request, ...); // agree completes with new_failed. Rank X is not counted in num_agreed_failed if it contributed to the agreement before failing, but it may be placed before any newly detected failures in the local failed group.</div>
</blockquote>
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Change 2 is to allow num_known_failed to be renamed max_num_expected_failed and be potentially larger than the number of locally known failures. All ranks which pass a value larger than the local number of known failed ranks, however, run into the above problem
even if new_failed is false.</div>
<div style="margin-top: 0px; margin-bottom: 0px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof" style="margin-top: 0px; margin-bottom: 0px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
So we could only say that the fail groups up to num_agreed_failed are guaranteed similar if new_failed is false and the passed max_num_expected_failed was <= the size of the local fail group on all ranks. That would be pretty awkward in the standard, so the
more realistic solutions are to</div>
<ol start="1" data-editing-info="{"applyListStyleFromLevel":false,"orderedStyleType":9}" style="margin-top: 0px; margin-bottom: 0px; list-style-type: upper-alpha;">
<li style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); margin-top: 0px; margin-bottom: 0px;">
<div class="elementToProof" role="presentation" style="margin-top: 0px; margin-bottom: 0px;">
internally repeat the agreement until !new_failed (performance implications, does not resolve the problem with change 2),</div>
</li><li style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); margin-top: 0px; margin-bottom: 0px;">
<div class="elementToProof" role="presentation" style="margin-top: 0px; margin-bottom: 0px;">
return a consistent agreed_failed group instead of num_agreed_failed (extra group allocation, potentially extra data in agreement bcast step, but both could be avoided with an MPI_GROUP_IGNORE), or</div>
</li><li style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); margin-top: 0px; margin-bottom: 0px;">
<div role="presentation" style="margin-top: 0px; margin-bottom: 0px;">revert to the original function syntax.</div>
</li></ol>
<div style="margin-top: 0px; margin-bottom: 0px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof" style="margin-top: 0px; margin-bottom: 0px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
I prefer B, which lets you safely optimistically proceed with a consistent returned group of failed ranks instead of calling MPI_Comm_agree in a loop until !new_failed. Implementations can pass a flag indicating if the local rank called with MPI_GROUP_IGNORE
during the reduce step to avoid passing the failed ranks in the bcast step (if they even have a way to avoid that in current implementations).</div>
<div class="elementToProof" style="margin-top: 0px; margin-bottom: 0px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof" style="margin-top: 0px; margin-bottom: 0px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Please consider and let me know your thoughts here or at the next meeting.</div>
<div class="elementToProof" style="margin-top: 0px; margin-bottom: 0px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div class="elementToProof" style="margin-top: 0px; margin-bottom: 0px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Thanks,</div>
<div class="elementToProof" style="margin-top: 0px; margin-bottom: 0px; font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Matthew Whitlock</div>
</body>
</html>