<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
Hi Ralph,
<div class=""><br class="">
</div>
<div class="">Thanks for the updates, I’ll take a look in a moment (after coffee).</div>
<div class=""><br class="">
</div>
<div class="">Technically, a cancel API in PMIx *could* be used with a blocking group construction by calling the cancel inside an event callback. For example, an app could count how many times PMIX_GROUP_NOTIFY_TERMINATION has been delivered for this group
construction, invite a new process for the first N events, and cancel the operation in response to any subsequent event.</div>
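<div class=""><br class="">
</div>
<div class="">To make that concrete, here is a minimal sketch in Python of the counting idea. It is a toy model only: GroupConstruct, run, invite, cancel and make_handler are names made up for illustration, and none of them is a PMIx API. The blocking call delivers one simulated termination event at a time to the handler, which either invites a spare or cancels.</div>
<pre class="">
```python
# Toy model only: every name here is hypothetical, not part of PMIx.

class GroupConstruct:
    """Simulates a blocking group construction driven by termination events."""

    def __init__(self, handler):
        self.handler = handler   # user-supplied event callback
        self.cancelled = False
        self.invited = []

    def cancel(self):
        # Only ever called from inside the handler, i.e. while the caller
        # is still blocked in run().
        self.cancelled = True

    def invite(self, proc):
        self.invited.append(proc)

    def run(self, failures):
        # Blocking call: each simulated failure delivers one event.
        for failed in failures:
            self.handler(self, failed)
            if self.cancelled:
                return "cancelled"
        return "constructed"


def make_handler(max_replacements, spares):
    """Build a handler that replaces the first N failures, then cancels."""
    seen = 0

    def on_termination(op, failed_proc):
        nonlocal seen
        seen += 1
        if seen > max_replacements or not spares:
            op.cancel()                  # give up: out of budget or spares
        else:
            op.invite(spares.pop(0))     # replace the failed process

    return on_termination
```
</pre>
<div class="">For instance, GroupConstruct(make_handler(2, ["s1", "s2", "s3"])).run(["p1", "p2", "p3"]) invites s1 and s2 and then cancels on the third event, even though the caller never left the blocking call.</div>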
<div class="">
<div class=""><br class="webkit-block-placeholder">
</div>
<div class="">
<div style="color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
<div class="">MPI is trying to get rid of its cancel API - we deprecated cancel for point-to-point send because it is fundamentally broken. However, cancel for point-to-point receive is still valid and useful. The problem with cancel is always the race between
the “all is OK, go ahead” and the “whoa, stop that” signals. With a receive in MPI, the choice of which will succeed (receive or cancel) is always a local decision and can therefore be made atomic and consistent. The choice between send and cancel is a distributed
decision, which cannot be atomic, and always suffers from a race. One way to avoid this race in a PMIx group cancel would be to specify that it is only valid from within an event callback, e.g. by exposing it as an in-out/by-ref parameter in the callback itself
(not as a separate function call). PMIx could then examine this parameter (set by the user during the event callback) when the callback returns. It’s a binary/boolean choice between “I handled it, carry on” and “I panicked, burn the world”.</div>
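<div class=""><br class="">
</div>
<div class="">A rough sketch of the in-out parameter variant, again in Python and again purely illustrative: EventOutcome, construct_group and make_callback are hypothetical names, not PMIx definitions. The point is that there is no separate cancel function to race against; the library reads the flag only after the callback has returned.</div>
<pre class="">
```python
# Toy model only: every name here is hypothetical, not part of PMIx.
from dataclasses import dataclass


@dataclass
class EventOutcome:
    # In-out parameter; the library presets it to "I handled it, carry on".
    abort: bool = False


def construct_group(failures, callback):
    """Toy blocking construction; returns True if the group was built."""
    for failed in failures:
        outcome = EventOutcome()
        callback(failed, outcome)
        # The decision is examined here, after the callback returns, so
        # "carry on" and "stop" can never be in flight at the same time.
        if outcome.abort:
            return False
    return True


def make_callback(budget):
    """Build a callback that tolerates `budget` failures, then aborts."""
    remaining = budget

    def on_event(failed_proc, outcome):
        nonlocal remaining
        if remaining == 0:
            outcome.abort = True     # "I panicked, burn the world"
        else:
            remaining -= 1           # "I handled it, carry on"

    return on_event
```
</pre>
<div class="">With a budget of two, a construction that sees two failures completes, and one that sees a third failure is abandoned at that event.</div>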
<div class=""><br class="">
</div>
<div class="">Is it useful? Well, all systems have finite resources to use as replacements, so eventually this operation must fail. This type of cancel allows the application to choose when to give up based on how many things happened rather than just how many
seconds elapsed.</div>
<div class=""><br class="">
</div>
Cheers,</div>
<div style="color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
Dan.</div>
<div style="color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
—<br class="">
Dr Daniel Holmes PhD<br class="">
Applications Consultant in HPC Research<br class="">
<a href="mailto:d.holmes@epcc.ed.ac.uk" class="">d.holmes@epcc.ed.ac.uk</a><br class="">
Phone: +44 (0) 131 651 3465<br class="">
Mobile: +44 (0) 7940 524 088<br class="">
Address: Room 3415, JCMB, The King’s Buildings, Edinburgh, EH9 3FD</div>
<div style="color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
—</div>
<div style="color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.</div>
<div style="color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
—</div>
</div>
<br class="">
<div>
<blockquote type="cite" class="">
<div class="">On 9 Aug 2018, at 03:54, Ralph H Castain &lt;<a href="mailto:rhc@open-mpi.org" class="">rhc@open-mpi.org</a>&gt; wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
I have updated the web page to reflect the comments. Let me know what you think and about the “cancel” API.<br class="">
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">On Aug 8, 2018, at 5:40 PM, Ralph H Castain &lt;<a href="mailto:rhc@open-mpi.org" class="">rhc@open-mpi.org</a>&gt; wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<br class="">
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">On Aug 8, 2018, at 4:14 PM, HOLMES Daniel &lt;<a href="mailto:d.holmes@epcc.ed.ac.uk" class="">d.holmes@epcc.ed.ac.uk</a>&gt; wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
Hi Ralph,
<div class=""><br class="">
</div>
<div class="">I was nodding whilst reading,</div>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
Great!</div>
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
<div class="">until I got to PMIX_GROUP_LEAVE_REQUEST (Destruct procedure, PMIx_Groups).</div>
<div class=""><br class="">
</div>
<div class="">There are situations where a process will leave without the luxury of requesting and being patient first, e.g. faults (handled as termination, I know, bear with me a moment). If this event was instead PMIX_GROUP_LEFT, then processes would be written
to be able to cope with sudden exits in other processes. They have to be written like that anyway because of PMIX_GROUP_NOTIFY_TERMINATION. This event simply distinguishes "the process called PMIx_GROUP_LEAVE" from "the RM figured out a process stopped executing
(normally or abnormally)". Is such a distinction useful?</div>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
It’s a good point. My thought was to provide a “clean” way of dynamically leaving a group as opposed to just pulling out. On the other hand, we do need apps to be prepared for unexpected termination - so it isn’t clear that there is any real benefit. I have
no issue with making this change.</div>
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
<div class=""><br class="">
</div>
<div class="">
<div class="">In terms of outstanding/in-progress collective operations, just state that calling PMIX_GROUP_LEAVE is not allowed unless no such operations are in flight.</div>
</div>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
I think putting a requirement that no collective op can be in progress is unenforceable, especially if you take the position that leaving is the same as unexpected termination - i.e., programs need to be written in a way that can adapt to terminations or departures.
We can provide an event indicating that departure occurred and user apps need to register for it and decide for themselves how to respond if in a user collective. The PMIx server can adjust any ongoing PMIx collective (e.g., PMIx_Fence) without user intervention.
We currently error-out from such operations, but we can provide an attribute to indicate the operation should “self-heal” and proceed to completion.</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
<div class="">
<div class="">The potential race between a process calling PMIX_GROUP_LEAVE and other process(es) in the group starting a collective operation should not happen in a well-defined program. Also, if PMIX_GROUP_NOTIFY_TERMINATION can state "collective operations
will be adjusted appropriately" then why can’t PMIX_GROUP_LEAVE say that too?</div>
</div>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
No problem - it can certainly do so.</div>
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
<div class="">
<div class=""><br class="webkit-block-placeholder">
</div>
<div class="">
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
—</div>
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<br class="">
</div>
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
For PMIX_GROUP_JOIN, can the leader process give up creation of the group and somehow tell PMIx to stop trying?</div>
</div>
</div>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
Sure - it can do so by setting the PMIX_TIMEOUT attribute. We could provide a “cancel” API as well, but that would require that you use the non-blocking form of PMIx_Group_construct, as otherwise there would be no way to call it. Would a “cancel” API be of
benefit?</div>
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
<div class="">
<div class="">
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
If so, then processes that accepted a join request should be informed that the group is never going to be constructed, i.e. they should stop waiting for the callback/return of the blocking function. Thus, "once the group has been completely constructed" could
be tempered with "or the group construction fails".</div>
</div>
</div>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
Agreed - will update.</div>
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
<div class="">
<div class="">
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<br class="Apple-interchange-newline">
Cheers,</div>
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
Dan.</div>
</div>
<br class="">
<div class="">
<blockquote type="cite" class="">
<div class="">On 8 Aug 2018, at 23:04, Ralph H Castain &lt;<a href="mailto:rhc@open-mpi.org" class="">rhc@open-mpi.org</a>&gt; wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
Hi folks
<div class=""><br class="">
</div>
<div class="">I have updated the PMIx Group web page to capture the discussion of the prior meeting plus some subsequent thoughts:</div>
<div class=""><br class="">
</div>
<div class=""><a href="https://pmix.org/pmix-standard/pmix-groups/" class="">https://pmix.org/pmix-standard/pmix-groups/</a></div>
<div class=""><br class="">
</div>
<div class="">I’ll try to put some initial implementation behind it before the meeting, so please feel free to chime in with any thoughts.</div>
<div class="">Ralph</div>
<div class=""><br class="">
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">On Aug 6, 2018, at 11:38 AM, Ralph H Castain &lt;<a href="mailto:rhc@open-mpi.org" class="">rhc@open-mpi.org</a>&gt; wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div dir="auto" class="">Looks like I can free some time up this week for groups - will try to update later this week
<div class=""><br class="">
<div class="">Sent from my iPhone</div>
<div class=""><br class="">
On Aug 6, 2018, at 11:05 AM, HOLMES Daniel &lt;<a href="mailto:d.holmes@epcc.ed.ac.uk" class="">d.holmes@epcc.ed.ac.uk</a>&gt; wrote:<br class="">
<br class="">
</div>
<blockquote type="cite" class="">
<div class="">Hi all,
<div class=""><br class="">
</div>
<div class="">The next meeting for the Sessions WG will be *Tuesday 14th Aug 2018* at 12pm Eastern US time.</div>
<div class=""><br class="">
</div>
<div class="">Note the change of day and time. This is a one-off change due to vacation time.</div>
<div class=""><br class="">
</div>
<div class="">The connection details for the call will be sent out on this list nearer the time.</div>
<div class=""></div>
<div class="">
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<br class="">
</div>
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
Cheers,</div>
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
Dan.</div>
</div>
<br class="">
</div>
</blockquote>
<blockquote type="cite" class="">
<div class=""><span class="">_______________________________________________</span><br class="">
<span class="">mpiwg-sessions mailing list</span><br class="">
<span class=""><a href="mailto:mpiwg-sessions@lists.mpi-forum.org" class="">mpiwg-sessions@lists.mpi-forum.org</a></span><br class="">
<span class=""><a href="https://lists.mpi-forum.org/mailman/listinfo/mpiwg-sessions" class="">https://lists.mpi-forum.org/mailman/listinfo/mpiwg-sessions</a></span><br class="">
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</body>
</html>