<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
Hi Ralph,
<div class=""><br class="">
</div>
<div class="">That’s sounds good to me.</div>
<div class=""><br class="">
</div>
<div class="">If a process does not register for a particular kind of event, does PMIx/RM incur overhead anyway? That is, does the event get delivered to the process but never raised (network traffic, local storage, processing time) or does it never get delivered
at all? I’m thinking that having two events PMIX_NOTIFICATION_FOR_LEADERS and PMIX_LEADERSHIP_ELECTION_NEEDED would allow any process to register as (one of) the leader(s) by registering for the first event. Any process that wants a say in future leadership
elections, e.g. might wish to become a leader, would register for the other event. I suspect that this would be slower and higher overhead than the attribute design option but I’ll put it out there in case it has some merit.<br class="">
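<div class="">To make the two-event idea concrete, here is a minimal C sketch that models delivery only to registered handlers. It assumes, hypothetically, that the host delivers an event only to processes that registered for it, which is exactly the open question above. EVT_NOTIFICATION_FOR_LEADERS, EVT_LEADERSHIP_ELECTION_NEEDED, struct proc_reg and deliver_to are invented stand-ins, not PMIx constants or calls.</div>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event codes mirroring the two proposed events;
 * these are NOT constants from the PMIx headers. */
enum {
    EVT_NOTIFICATION_FOR_LEADERS   = 0x1, /* "I am (one of) the leader(s)" */
    EVT_LEADERSHIP_ELECTION_NEEDED = 0x2  /* "I want a say in elections"   */
};

/* Per-process registration mask, standing in for event-handler registration. */
struct proc_reg {
    int rank;
    unsigned mask;
};

/* Assuming delivery only to registrants, return how many processes would
 * receive `event` and write up to `cap` of their ranks into `out`. */
size_t deliver_to(const struct proc_reg *procs, size_t n, unsigned event,
                  int *out, size_t cap)
{
    assert(procs != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (procs[i].mask & event) {
            if (count < cap) {
                out[count] = procs[i].rank;
            }
            count++;
        }
    }
    return count;
}
```

<div class="">If delivery really is filtered by registration, a process that registers for neither event pays nothing for leader traffic; if the RM instead delivers everywhere and filters locally, the traffic cost is the same as a broadcast.</div>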
<div class="">
<div style="color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
<br class="Apple-interchange-newline">
Cheers,</div>
<div style="color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
Dan.</div>
<div style="color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
—<br class="">
Dr Daniel Holmes PhD<br class="">
Applications Consultant in HPC Research<br class="">
<a href="mailto:d.holmes@epcc.ed.ac.uk" class="">d.holmes@epcc.ed.ac.uk</a><br class="">
Phone: +44 (0) 131 651 3465<br class="">
Mobile: +44 (0) 7940 524 088<br class="">
Address: Room 3415, JCMB, The King’s Buildings, Edinburgh, EH9 3FD</div>
<div style="color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
—</div>
<div style="color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.</div>
<div style="color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
—</div>
</div>
<br class="">
<div>
<blockquote type="cite" class="">
<div class="">On 16 Aug 2018, at 17:36, Ralph H Castain <<a href="mailto:rhc@open-mpi.org" class="">rhc@open-mpi.org</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
You make a good point. Let me try to capture my thinking.
<div class=""><br class="">
</div>
<div class="">I had imagined the PMIx_Group_construct operation as being used when every process “knows” the complete array of participants. In that case, one could imagine the “invite” option as being used to alert the other processes to “join now” - but that
really shouldn’t be necessary or valid as we are envisioning it as a collective operation. So I agree that we should remove the “invite” option.</div>
<div class=""><br class="">
</div>
<div class="">We should add an attribute for declaring someone to be the “leader” of the construct collective in the case where someone wants a specific process to deal with failures during that procedure. If they provide it, then failure events go only to
that one process - if the leader fails, then we alert all participants so they can optionally declare a new leader. In the absence of a declared leader, failure events go to all participants.</div>
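<div class="">A minimal C sketch of the notification rule proposed above, using plain int ranks rather than PMIx process structures. failure_recipients and NO_LEADER are hypothetical names, not part of PMIx.</div>

```c
#include <assert.h>
#include <stddef.h>

#define NO_LEADER (-1)

/* Decide which participants receive a failure notification during a
 * group construct, following the rule proposed in this thread:
 *  - no declared leader: every surviving participant is notified;
 *  - declared leader, non-leader fails: only the leader is notified;
 *  - declared leader fails: every surviving participant is notified so
 *    the survivors can optionally declare a new leader.
 * Returns the number of recipients written to `out` (capacity >= n). */
size_t failure_recipients(const int *members, size_t n, int leader,
                          int failed, int *out)
{
    assert(members != NULL && out != NULL);
    size_t count = 0;
    if (leader != NO_LEADER && failed != leader) {
        out[count++] = leader;
        return count;
    }
    for (size_t i = 0; i < n; i++) {
        if (members[i] != failed) {
            out[count++] = members[i];
        }
    }
    return count;
}
```

<div class="">With a declared leader, an ordinary failure costs one notification instead of one per participant; only the leader's own failure falls back to the broadcast.</div>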
<div class=""><br class="">
</div>
<div class="">Make sense?</div>
<div class="">Ralph</div>
<div class=""><br class="">
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">On Aug 16, 2018, at 2:27 AM, HOLMES Daniel <<a href="mailto:d.holmes@epcc.ed.ac.uk" class="">d.holmes@epcc.ed.ac.uk</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
Hi Ralph,
<div class=""><br class="">
</div>
<div class="">What is the difference between a group construct with the invite attribute and an explicit invitation? That is, between:</div>
<div class=""><br class="">
</div>
<div class="">PMIX_Group_construct(…, procs, …) // with PMIX_GROUP_INVITE_MEMBERS attribute</div>
<div class="">/* PMIx raises PMIX_GROUP_REQUESTED event at each process in provided array */<br class="">
<div class="">/* each requested/invited process responds by calling PMIX_Group_construct??? … or not, as it chooses */</div>
<div class="">
<div class="">/* PMIx raises PMIX_GROUP_INVITE_FAILED events for declined requests or terminated processes */</div>
</div>
<div class="">
<div class="">/* PMIx raises PMIX_GROUP_MEMBERSHIP_UPDATE events for processes that called PMIX_Group_construct??? */</div>
</div>
<div class=""><br class="">
</div>
<div class="">and:</div>
<div class=""><br class="">
</div>
<div class="">
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<div class="">PMIX_Group_invite(…, procs, …)</div>
<div class="">/* PMIx raises PMIX_GROUP_INVITED event at each process in provided array */<br class="">
<div class="">/* each requested/invited process responds by calling PMIX_Group_join … or not, as it chooses */</div>
<div class="">
<div class="">/* PMIx raises PMIX_GROUP_INVITE_FAILED events for declined requests or terminated processes */</div>
</div>
</div>
<div class="">
<div class="">/* PMIx raises PMIX_GROUP_INVITE_ACCEPTED events for processes that called PMIX_Group_join */</div>
</div>
<div class=""><br class="">
</div>
<div class="">In the explicit invitation case, it is clear to me that there is a single leader process - the one that calls the PMIx_Group_invite function - and all others are told about the intent to create a group by PMIx via an event.</div>
<div class=""><br class="">
</div>
<div class="">In the implicit invitation case, it is less clear to me. All the processes are equivalent peers - until we impose special status by designating one of them as a leader. Do the non-leaders wait for the PMIX_GROUP_REQUESTED event and respond by
calling PMIX_Group_construct? If so, that looks like a duplicate of the explicit invite method.</div>
<div class=""><br class="">
</div>
<div class="">Perhaps there is a difference in which processes get notified of membership changes (and/or when they get notified)? If so, maybe that should be handled by supplying attributed to the explicit invitation method, rather than shoe-horning it into
the group construct method?</div>
<div class=""><br class="">
</div>
<div class="">Basically, I think I’m advocating that the PMIX_GROUP_INVITE_MEMBERS attribute (and associated functionality) should be removed from PMIX_Group_construct (leaving the rest of what it does alone) because that functionality is explicitly supported
by PMIX_Group_invite.</div>
<div class=""><br class="">
</div>
<div class="">What do you think? Have I got the wrong end of a stick here?</div>
<div class=""><br class="">
</div>
Cheers,</div>
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
Dan.</div>
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
—<br class="">
Dr Daniel Holmes PhD<br class="">
Applications Consultant in HPC Research<br class="">
<a href="mailto:d.holmes@epcc.ed.ac.uk" class="">d.holmes@epcc.ed.ac.uk</a><br class="">
Phone: +44 (0) 131 651 3465<br class="">
Mobile: +44 (0) 7940 524 088<br class="">
Address: Room 3415, JCMB, The King’s Buildings, Edinburgh, EH9 3FD</div>
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
—</div>
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.</div>
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
—</div>
</div>
<br class="">
<div class="">
<blockquote type="cite" class="">
<div class="">On 15 Aug 2018, at 19:00, Ralph H Castain <<a href="mailto:rhc@open-mpi.org" class="">rhc@open-mpi.org</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
I have updated the web page based on yesterday’s conversation: <a href="https://pmix.org/pmix-standard/pmix-groups/" class="">https://pmix.org/pmix-standard/pmix-groups/</a>
<div class=""><br class="">
</div>
<div class="">Please pass along comments - I tried to capture everything.</div>
<div class="">Ralph</div>
<div class=""><br class="">
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">On Aug 13, 2018, at 7:38 AM, Ralph H Castain <<a href="mailto:rhc@open-mpi.org" class="">rhc@open-mpi.org</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
I’ll be there!
<div class=""><br class="">
</div>
<div class="">Actually, after reading your note and pondering some more this morning, I believe we are in rather good agreement, though it may not seem that way. My “vision” of PMIx Group was that app or library developers that had use-cases where all participants
are known at time of construct would use PMIx_Group_construct which utilize current PMIx collective support. If only one process knew the entire group membership, then they could use the use the “invite members” attribute to bring them all together - still
using the current support to do so. All the resiliency things we discussed would still apply.</div>
<div class=""><br class="">
</div>
<div class="">For use-cases where the developer wants more control over the assembly algorithm, we have the “invite” and “join” APIs. This allows developers (such as perhaps MPI sessions) to devise whatever algorithm they want for assembling a group. You’ll
still get notifications of termination etc. - you just gain atomistic control over the construction procedure. One thing to remember: PMIx has no internode communication capability, so all invite/join messages flow through the host RM. Just something to keep
in mind as it impacts performance as well as burdening the RM’s messaging system.</div>
<div class=""><br class="">
</div>
<div class="">My concern really was to avoid pushing all those different construction algorithms into PMIx. I think it starts putting too much complexity into that code base - I’d rather provide the tools and let developers implement whatever they want.</div>
<div class=""><br class="">
</div>
<div class="">HTH</div>
<div class="">Ralph</div>
<div class=""><br class="">
</div>
<div class="">
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">On Aug 13, 2018, at 5:51 AM, HOLMES Daniel <<a href="mailto:d.holmes@epcc.ed.ac.uk" class="">d.holmes@epcc.ed.ac.uk</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
Hi Ralph,
<div class=""><br class="">
</div>
<div class="">
<blockquote type="cite" class="">
<div class="">I think we can agree to disagree on the case assignments - it isn’t really important to the API definition. :-)</div>
</blockquote>
Agreed :)</div>
<div class=""><br class="">
</div>
<div class="">
<blockquote type="cite" class="">
<div class="">I wouldn’t agree to make that the default as other use-cases really want a lightweight, fast operation</div>
</blockquote>
After the ability to solve thread-safety and multi-init problems in MPI, the next major reason for modifying the MPI initialisation is to permit lightweight and fast initialisation. MPI is one of those ‘other use-cases’ too! :)</div>
<div class=""><br class="">
</div>
<div class="">
<blockquote type="cite" class="">
<div class="">If someone is using the async construct method, then we have to know who to invite. Those processes don’t call “construct” - they call “join” to accept/decline the invitation.</div>
</blockquote>
<div class="">I am assuming that, by “async construct method”, you mean PMIX_GROUP_INVITE not PMIX_GROUP_CONSTRUCT_NB.</div>
<div class="">This is an interesting point. During a normal PMIX_GROUP_CONSTRUCT[_NB], every process should call the construct method and all is well with the group. If any process fails to call construct [in a timely manner | because it terminates], then the
leader gets notified and invites a replacement process. That process must be told information about the group - the information that the original would have supplied to the construct method must now be given to the replacement process. In your vision, every
process supplies an array containing every other process to the construct method, so all information is available (from any surviving participant process and/or from PMIX once construct has been called at least once, somewhere). In my vision, each process
supplies an array containing only partial connectivity information to the construct method. The connectivity information that should have been supplied by the errant process is lost because it never called construct to supply it. It could be guess at, by assuming
the connectivity graph is symmetric and querying all other processes for unfulfilled links to the missing process. That already seems fairly bad, but multiple failing/missing processes and/or non-symmetric graphs would be even worse (or impossible) to handle.
This (my suggestion of 'dist graph' group construct) needs more careful thought, and tighter restrictions on permitted behaviour, to avoid undecidable situations.</div>
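<div class="">The symmetric-graph guess described above can be sketched in C as follows. This is a hypothetical model, not PMIx code: struct contrib stands in for what each process passed to construct, and guess_neighbours is an invented helper.</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NBRS 8

/* One process's contribution to a hypothetical "dist graph" construct:
 * the ranks it listed as neighbours. */
struct contrib {
    int rank;
    bool present;            /* false if the process never called construct */
    size_t nnbrs;
    int nbrs[MAX_NBRS];
};

/* Guess the neighbour list that `missing` would have supplied, assuming the
 * connectivity graph is symmetric: every present process that lists
 * `missing` is taken as one of its neighbours. Links from `missing` to
 * another absent process cannot be recovered this way. */
size_t guess_neighbours(const struct contrib *c, size_t n, int missing,
                        int *out, size_t cap)
{
    assert(c != NULL && out != NULL);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (!c[i].present || c[i].rank == missing) {
            continue;
        }
        for (size_t j = 0; j < c[i].nnbrs; j++) {
            if (c[i].nbrs[j] == missing && count < cap) {
                out[count++] = c[i].rank;
                break;
            }
        }
    }
    return count;
}
```

<div class="">For a ring 0-1-2-3 where process 2 never calls construct, processes 1 and 3 both list 2, so the guess is {1, 3}. If process 3 is also missing, only {1} can be recovered and the 2-3 link is lost, which is the multiple-failure problem noted above.</div>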
<div class=""><br class="webkit-block-placeholder">
</div>
<div class="">I don’t want to give up on this, though, because it seems like the API as currently written down in the RFC is heading towards the existing build-down approach "create a heavy-weight fully-connected thing using complete information replicated
everywhere, then make a lighter-weight partially-connected thing from it”, which is what MPI Sessions is trying to get away from by investigating build-up and distributed (rather than replicated) topology information. I think a dist graph communicator creation
with no parent communicator, i.e. via MPI sessions, will not be able to use PMIx group construct, but will instead use only PMIx group invite and join.</div>
<div class=""><br class="">
</div>
<div class="">In a 2D cartesian mesh, each process would issue a group invite to each of its four neighbours (this may be one call to group_invite with an array containing four other processes). Each process should receive four invitations and respond with
four calls to group join. There is only one “group” being formed here, so hopefully this all uses the same group name/id. If a process declines one or more of the invitations or, equivalently, fails to call group_join the appropriate number times before either
a timeout or its termination (whichever happens first), then some of the inviting processes will be notified. Those processes might take remedial action or accept that the group is going to be malformed (w.r.t. the original request).</div>
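<div class="">The mesh example can be sketched in C to check the invitation counts. It assumes periodic (torus) boundaries, row-major rank placement and at least three rows and columns, none of which the text specifies; mesh_neighbours and count_invitations are hypothetical helpers. With non-periodic boundaries, edge and corner processes would have fewer than four neighbours and would expect fewer invitations.</div>

```c
#include <assert.h>

/* Hypothetical layout: rank r sits at (row, col) = (r / cols, r % cols) on a
 * rows x cols grid with periodic (torus) boundaries, so every process has
 * exactly four neighbours. */
void mesh_neighbours(int rank, int rows, int cols, int nbr[4])
{
    assert(rows > 0 && cols > 0 && rank >= 0 && rank < rows * cols);
    int row = rank / cols;
    int col = rank % cols;
    nbr[0] = ((row - 1 + rows) % rows) * cols + col;  /* north */
    nbr[1] = ((row + 1) % rows) * cols + col;         /* south */
    nbr[2] = row * cols + (col - 1 + cols) % cols;    /* west  */
    nbr[3] = row * cols + (col + 1) % cols;           /* east  */
}

/* Count, for every rank, how many invitations it receives when each process
 * invites each of its four neighbours (e.g. one group_invite call with a
 * 4-entry array). `received` must hold rows * cols entries. */
void count_invitations(int rows, int cols, int *received)
{
    int n = rows * cols;
    for (int r = 0; r < n; r++) {
        received[r] = 0;
    }
    for (int r = 0; r < n; r++) {
        int nbr[4];
        mesh_neighbours(r, rows, cols, nbr);
        for (int k = 0; k < 4; k++) {
            received[nbr[k]]++;
        }
    }
}
```

<div class="">On a 3 x 4 torus every rank receives exactly four invitations, so each process would expect to call group_join four times; any invitation not joined before the timeout identifies an inviter that needs to be notified.</div>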
<div class=""><br class="webkit-block-placeholder">
</div>
<div class="">Let’s chat about this on tomorrow’s call, if you make that time.</div>
<div class="">
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<br class="Apple-interchange-newline">
Cheers,</div>
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
Dan.</div>
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
—<br class="">
Dr Daniel Holmes PhD<br class="">
Applications Consultant in HPC Research<br class="">
<a href="mailto:d.holmes@epcc.ed.ac.uk" class="">d.holmes@epcc.ed.ac.uk</a><br class="">
Phone: +44 (0) 131 651 3465<br class="">
Mobile: +44 (0) 7940 524 088<br class="">
Address: Room 3415, JCMB, The King’s Buildings, Edinburgh, EH9 3FD</div>
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
—</div>
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.</div>
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
—</div>
</div>
<br class="">
<div class="">
<blockquote type="cite" class="">
<div class="">On 10 Aug 2018, at 14:24, Ralph H Castain <<a href="mailto:rhc@open-mpi.org" class="">rhc@open-mpi.org</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">
<br class="Apple-interchange-newline">
<br class="">
<blockquote type="cite" class="">
<div class="">On Aug 10, 2018, at 4:09 AM, HOLMES Daniel <<a href="mailto:d.holmes@epcc.ed.ac.uk" class="">d.holmes@epcc.ed.ac.uk</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
Hi Ralph,
<div class=""><br class="">
</div>
<div class="">In general, I agree with all of this. Two points:</div>
<div class=""><br class="">
</div>
<div class="">1) I think your assignment of special case and general case are the wrong way around. A fully connected graph (your normal/general case) is a special case of a general graph (your special case), which may be completely disconnected, partially
connected, disjointly connected, or fully connected. Requiring every process to specify every other process (your normal/general case) is therefore restricting their behaviour from the general graph (your special case) to a particular special case. This does
not need to be an issue: generalising a special-case API does not break backwards compatibility, in general. On the other hand, I’m not sure how much different either the specification’s wording or the implementation’s code would have to be in order to support
the general graph case as well as the fully connected special case. The general case definitely messes with the brain cells a lot more though.</div>
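<div class="">The graph-theory view can be stated as a pair of predicates over the arrays each process supplies. This is a hypothetical C sketch with the arrays flattened into an n x n adjacency matrix; is_fully_connected and is_symmetric are invented names. The fully connected case is simply the input for which the first predicate holds.</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* adj[i * n + j] is true if process i listed process j in the array it
 * passed to a (hypothetical) general-graph construct. The "everyone lists
 * everyone" usage is the special case checked here. */
bool is_fully_connected(const bool *adj, size_t n)
{
    assert(adj != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            if (i != j && !adj[i * n + j]) {
                return false;
            }
        }
    }
    return true;
}

/* True if every listed link is reciprocated (an undirected graph). */
bool is_symmetric(const bool *adj, size_t n)
{
    assert(adj != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (adj[i * n + j] != adj[j * n + i]) {
                return false;
            }
        }
    }
    return true;
}
```

<div class="">A three-process chain 0-1-2 is symmetric but not fully connected, so it is valid in the general case and rejected by the special one.</div>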
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
I think we can agree to disagree on the case assignments (as I look at them in terms of usage and not graph theory) - it isn’t really important to the API definition. :-)</div>
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">
<br class="">
</div>
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">
If someone is using the async construct method, then we have to know who to invite. Those processes don’t call “construct”; they call “join” to accept/decline the invitation. Thus, they don’t have the ability to extend the invitations. We could bring back
the PMIx_Group_invite API to allow that, but it created some nasty race conditions that made it very difficult to track completion. I suppose an alternative is to eliminate the “join” API and just have the invitee call “construct”, but that _really_ complicates
the implementation of the “construct” API. It would at least require invitees to provide an attribute indicating they are responding to an invite, but that doesn’t do anything to resolve the race conditions.</div>
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">
<br class="">
</div>
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">
If someone is instead using the collective construct method, then it seems a little confusing if other procs keep extending it. This is basically equivalent to an MPI collective operation: you have to know the participants to handle it efficiently. If we say
that not all procs need to provide us with the list of participants, then implementation becomes a little more complex and we possibly lose scalability. I can see a mode where we treat a process that specifies the same group identifier but doesn’t include
a matching list of all participants as a potential sub-leader (akin to what you describe below) and do the collective over its sub-group, rolling things up as we go. However, that still has scalability impacts that other use-cases may not want to embrace (e.g.,
I now have to circulate the list of participants to check for completeness), so it’s probably better to just define an attribute the procs would have to provide to request this use-case. I’m okay with allowing that as a special case that comes with the scalability
warning: if the user wants to do so, they pay the price.</div>
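<div class="">The completeness check that this mode would require can be sketched as a roll-up of partial participant lists. This is a hypothetical C sketch, limited to 64 ranks so that each list fits in a bitmask; rollup and rollup_complete are invented names, and how the expected membership becomes known is left open, as in the text.</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical roll-up step: each sub-leader reports the ranks it knows
 * about as a bitmask (ranks 0..63 only, for brevity). */
uint64_t rollup(const uint64_t *partial, size_t nparts)
{
    assert(partial != NULL || nparts == 0);
    uint64_t seen = 0;
    for (size_t i = 0; i < nparts; i++) {
        seen |= partial[i];
    }
    return seen;
}

/* The extra cost: the merged list has to be compared against the full
 * membership before the collective can be declared complete. */
bool rollup_complete(uint64_t seen, uint64_t expected)
{
    return (seen & expected) == expected;
}
```

<div class="">The bitmask grows with the group size, which is the list-circulation overhead mentioned above; callers that supply the full participant list avoid it.</div>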
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">
<br class="">
<blockquote type="cite" class="">
<div class="">
<div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
<div class=""><br class="">
</div>
<div class="">2a) Having a single leader may not be sufficient. I’m thinking intercommunicators, which have naturally have two leaders - one for the local group and one for the remote group. This statement immediately gives us a clue as to how to handle that
particular use-case: group construct two separate groups then worry about linking them together later. Perhaps, a third group containing just the two leaders? OK, maybe that works fine as is. Taking a different tack, unless the leader treats no news as good
news, all processes must notify the leader that they believe the operation is successful. For a really large group, wouldn’t the n-to-1 ‘notify the leader of local outcome’ gather/reduce operation cause a bottleneck at the leader? You’d want a hierarchy (e.g.
a tree or something like it) just like for normal gather/reduce implementations, i.e. different processes would perceive different leaders (who in turn may know they are just middle-management and there is a big boss that their underlings don’t know about).
Failure of a leaf would be notified to their parent/leader process, which can fix/swallow the problem or bubble it up to their parent/leader. Failure of a leader causes an election in their sub-tree(s)/children. This gives the possibility of a free choice
of the size of the containment domain - preferable to being told it’s all or nothing. A general graph naturally gives a structure but it would be hard, in general, to use that for containment - because each process can specify more than one other process,
which is their leader? Thus, each process could nominate a single process (probably one of the entries in their process array, but not necessarily) to be their leader, or a rule could be devised that figures out which process is leader for each process.</div>
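<div class="">The hierarchical containment idea can be sketched with a k-ary tree laid over the group’s array order. The tree shape, the k parameter and the tree_parent and failure_handler helpers are assumptions for illustration, not part of PMIx or MPI.</div>

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical k-ary tree over the group's array order: index 0 is the
 * top-level leader and the parent of index i (> 0) is (i - 1) / k. */
int tree_parent(int i, int k)
{
    assert(i >= 0 && k >= 1);
    return i == 0 ? -1 : (i - 1) / k;
}

/* Return the index that should be told about the failure of `failed`: the
 * nearest ancestor that is still alive. That ancestor may fix/swallow the
 * problem or bubble it further up. Returns -1 if no ancestor survives,
 * meaning the sub-tree needs to elect a new leader instead. */
int failure_handler(int failed, int k, const bool *alive)
{
    int p = tree_parent(failed, k);
    while (p != -1 && !alive[p]) {
        p = tree_parent(p, k);
    }
    return p;
}
```

<div class="">With k = 2 and seven processes, a failure at index 4 is reported to index 1; if index 1 is also down it bubbles up to index 0, and a result of -1 means no ancestor survived.</div>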
<div class="">
<div class=""><br class="webkit-block-placeholder">
</div>
<div class="">
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
2b) Nominating/electing a (successor) leader could be as simple as “proceed in index order” (of the locally provided array). PMIx taking over by self-nominating leadership and canceling the group construct at the first sign of trouble seems a bit harsh.</div>
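<div class="">A deterministic successor rule of this kind can be sketched in C. It assumes every surviving process holds the same array in the same order and has the same view of which processes have failed; next_leader is a hypothetical helper, and wrapping around after the failed leader’s position is one possible reading of “proceed in index order”.</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Pick the new leader as the first entry after the failed leader's position
 * (wrapping around) that is still alive. Every process holding the same
 * array and the same failure information picks the same successor without
 * exchanging any election messages. Returns -1 if nobody survives. */
int next_leader(const int *procs, const bool *alive, size_t n, size_t old_idx)
{
    assert(procs != NULL && alive != NULL && old_idx < n);
    for (size_t step = 1; step <= n; step++) {
        size_t idx = (old_idx + step) % n;
        if (alive[idx]) {
            return procs[idx];
        }
    }
    return -1;
}
```

<div class="">If the per-process arrays differ, as in the general graph case, processes applying this rule could disagree about the successor.</div>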
</div>
</div>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
I’m not sure I understand your comment - when did PMIx propose to “take over” the leadership assignment?? All I proposed is that a leader exist (either the proc that started an async construct or some proc designated by the app) and that we notify it about
issues instead of broadcasting failure events to everyone. Again, this is for scalability - there are system and app-level costs to broadcasting events. If that process fails, then we would notify everyone that “leader failed” so the app can decide who should
take over that role. I’m okay with that broadcast as it represents an “exception” case and shouldn’t happen very often.</div>
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">
<br class="">
</div>
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">
If the proper attributes aren’t provided, then the user is telling us to terminate the construct on first failure. This isn’t PMIx making that decision; the user told us to do it. The notification/leader procedures only take effect if the user asks us to do
it. Other use-cases don’t need or want that overhead.</div>
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">
<br class="">
<blockquote type="cite" class="">
<div class="">
<div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
<div class="">
<div class="">
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
<br class="">
</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
Conceptually, I think we can take your 1-to-n model, replicate it m times (possibly with different values of n in each replica), then add one more process that is the leader of all the m “leaders”. Rinse and repeat to obtain any tree-based hierarchy (partway
towards the general graph case). Now draw in “connections” from each process to all processes in their local array (which might be all other processes in the group, in the fully connected special case) and we have the general graph case. The tree/hierarchy
defines the flow for control messages, the arrays give the final connectivity topology.</div>
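<div class="">To make the two-level version of this hierarchy concrete, here is a minimal C sketch, assuming ranks are assigned consecutively: rank 0 is the leader of the m “leaders”, and each replica contributes one sub-leader followed by its followers. The function name <code>build_control_tree</code> and the parent-array representation are hypothetical illustrations, not part of any PMIx API; the array encodes only the control-message tree, not the final connectivity topology.</div>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: build the control-message tree for m replicas of the
 * 1-to-n model plus one top-level leader.
 * Rank 0 is the top-level leader; replica i contributes one sub-leader
 * followed by followers[i] non-leaders.
 * parent[r] receives the rank that r reports to (-1 for the root).
 * Returns the number of processes written, or -1 if cap is too small. */
int build_control_tree(const int *followers, size_t m, int *parent, size_t cap)
{
    assert(parent != NULL);
    assert(followers != NULL || m == 0);
    if (cap < 1)
        return -1;

    size_t next = 0;
    parent[next++] = -1;                      /* top-level leader */
    for (size_t i = 0; i < m; i++) {
        assert(followers[i] >= 0);
        if (next + 1 + (size_t)followers[i] > cap)
            return -1;                        /* not enough room */
        int sub_leader = (int)next;
        parent[next++] = 0;                   /* sub-leader reports to the root */
        for (int k = 0; k < followers[i]; k++)
            parent[next++] = sub_leader;      /* follower reports to its sub-leader */
    }
    return (int)next;
}
```

<div class="">For replicas of sizes {2, 3} this yields 8 processes, with ranks 1 and 4 as sub-leaders reporting to rank 0. Deeper trees would follow by applying the same step recursively.</div>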
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
<br class="">
</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
Calling this proposed function with the intention of creating a fully connected group (your normal case) requires each process to specify an array of processes (supported by the current API) containing all other processes. The first process in the array is
nominated leader - PMIx sends termination notices there. If it fails, then the process at index (last_leader_index + 1) is nominated, and so on until there are no more processes to nominate, at which point PMIx self-nominates and cancels the operation. If a non-leader
fails, its current leader is notified. That process decides what to do: a cancel decision is broadcast to all other processes, a group join invitation is sent to the invited process, and a “don’t care, carry on” is silent.</div>
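<div class="">A minimal C sketch of the succession and decision rules just described, assuming every process holds the same member ordering (otherwise processes could disagree on who the next leader is) and that entries already known to have failed are skipped. <code>next_leader</code>, <code>decision_messages</code> and the <code>decision_t</code> values are hypothetical names; a return value of -1 stands in for the point where PMIx would self-nominate and cancel.</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch of index-order succession.
 * alive[i] records whether members[i] is still believed to be running.
 * Returns the index of the next leader after last_leader_index (pass -1 to
 * find the initial leader), or -1 when no candidate remains, i.e. the point
 * at which PMIx would self-nominate and cancel the construct. */
int next_leader(const bool *alive, size_t n, int last_leader_index)
{
    assert(alive != NULL);
    assert(last_leader_index >= -1);
    for (size_t i = (size_t)(last_leader_index + 1); i < n; i++)
        if (alive[i])
            return (int)i;   /* first surviving process after the old leader */
    return -1;               /* exhausted: cancel the operation */
}

/* Possible decisions by the current leader when a non-leader fails. */
typedef enum { DECIDE_CANCEL, DECIDE_INVITE, DECIDE_CARRY_ON } decision_t;

/* Messages generated by each decision: a cancel is addressed to all other
 * members, an invite goes to one process, "carry on" is silent.
 * group_size includes the leader itself. */
size_t decision_messages(decision_t d, size_t group_size)
{
    assert(group_size >= 1);
    switch (d) {
    case DECIDE_CANCEL:   return group_size - 1;
    case DECIDE_INVITE:   return 1;
    case DECIDE_CARRY_ON: return 0;
    }
    return 0;
}
```

<div class="">For example, with members {failed, alive, failed, alive}, leadership starts at index 1, moves to index 3 when that leader fails, and then runs out, which is the trigger for the cancel.</div>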
</div>
</div>
</div>
</div>
</blockquote>
<br class="">
I think you are making this way too complex and slow. I have no problem defining attributes to allow someone to specify such behavior, but I wouldn’t agree to make that the default as other use-cases really want a lightweight, fast operation. No issue with
declaring there is a default succession strategy, but we would again define that as an attribute so someone can choose to do what I described above (i.e., let the app decide the succession).</div>
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">
<br class="">
<blockquote type="cite" class="">
<div class="">
<div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
<div class="">
<div class="">
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
When a non-leader joins the group, its leader is notified.</div>
</div>
</div>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
<div class="">I assume you mean this as the non-leader accepting an invite and the leader being notified per the previously-defined “join” operation?</div>
</div>
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">
<br class="">
<blockquote type="cite" class="">
<div class="">
<div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
<div class="">
<div class="">
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
When the leader has responses from all its non-leaders, it declares the group complete, which is broadcast to all non-leader processes.</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
<br class="">
</div>
</div>
</div>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
<div class="">I’m a tad concerned about this one. If the leader determines the group is complete but fails before it generates the broadcast announcing it to everyone, then how do I explain that to the rest of the group? I’ll have to wrap my head around this
- I confess it bothers me. Seems like it opens a door for the group construct to “hang”, or forces us to notify everyone of everything which then impacts scalability.</div>
<div class=""><br class="">
</div>
<blockquote type="cite" class="">
<div class="">
<div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
<div class="">
<div class="">
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
Calling this proposed function with the intention of creating a dist_graph (your special case) requires each process to specify an array of processes (supported by the current API) containing only its neighbour processes. The implementation sketched above
is the same for this case. A middle-manager plays the leader role to some processes and the non-leader role to exactly one other process (at a time).</div>
</div>
</div>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
<div class="">As I said above, this can work for the async case but raises some issues for the collective case. I honestly don’t know how to scalably support what you are describing, but can allow it as a special case with a warning (not printed out, but just
in the text explaining that option).</div>
</div>
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">
<br class="">
<blockquote type="cite" class="">
<div class="">
<div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
<div class="">
<div class="">
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
<br class="">
</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
In both use-cases, all processes must register for leader events and non-leader events, because they can be nominated/promoted at any time during the group construct by PMIx depending on what happens.</div>
</div>
</div>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
Agreed - but we don’t necessarily have to send all of those events to everyone. We can make that an option for those willing to pay the price.</div>
<div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">
<br class="">
<blockquote type="cite" class="">
<div class="">
<div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
<div class="">
<div class="">
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
<br class="Apple-interchange-newline">
Cheers,</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
Dan.</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
—<br class="">
Dr Daniel Holmes PhD<br class="">
Applications Consultant in HPC Research<br class="">
<a href="mailto:d.holmes@epcc.ed.ac.uk" class="">d.holmes@epcc.ed.ac.uk</a><br class="">
Phone: +44 (0) 131 651 3465<br class="">
Mobile: +44 (0) 7940 524 088<br class="">
Address: Room 3415, JCMB, The King’s Buildings, Edinburgh, EH9 3FD</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
—</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
—</div>
</div>
<br class="">
<div class="">
<blockquote type="cite" class="">
<div class="">On 9 Aug 2018, at 16:42, Ralph H Castain <<a href="mailto:rhc@open-mpi.org" class="">rhc@open-mpi.org</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">
<br class="Apple-interchange-newline">
<br class="">
<blockquote type="cite" class="">
<div class="">On Aug 9, 2018, at 7:53 AM, HOLMES Daniel <<a href="mailto:d.holmes@epcc.ed.ac.uk" class="">d.holmes@epcc.ed.ac.uk</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
Hi Ralph,
<div class=""><br class="">
</div>
<div class="">I particularly like the self-fulfilling paradox at the end there: if the group construct fails then I won’t even call group construct, which guarantees that it’ll fail. Of course it’s not really a paradox because there are multiple processes involved
but I can see why your head is hurting.<br class="">
</div>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
Yes, it is easy to rapidly get trapped in the weeds here.</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">
<br class="">
<blockquote type="cite" class="">
<div class="">
<div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
<div class="">
<div class=""><br class="webkit-block-placeholder">
</div>
<div class="">Taking a further step back, scalable group construction (without a priori knowledge of membership and with the possibility of errors) requires a distributed graph specification of the membership and handling of errors. If each process only specifies
its neighbours, and only handles errors from its neighbours, then the event explosion can be contained to a neighbourhood.</div>
<div class=""><br class="">
</div>
<div class="">For example, if all processes specified the group membership as {back_proc, forward_proc}, then a failed process will cause a NOTIFY_TERMINATION event in its back_proc and its forward_proc but no others, irrespective of the total size of the intended
group. Those two processes must then decide what to do, e.g. link to each other (form a ring with one fewer process than originally intended) or invite a new process (hopefully the same one ;) which will fix the originally intended ring). A “burn the world”
decision from either of these two processes must then be propagated to all other participants (not necessarily around the ring, an OOB method works too). None of the other processes can have left/completed their group construct yet because not all members
have given their consent (implies: connect up local neighbourhood, wait for go signal, complete locally).</div>
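<div class="">The containment argument can be sketched in C, assuming ranks 0..n-1 arranged in a ring held as a forward-pointer array. Only the failed rank’s back_proc and forward_proc would receive NOTIFY_TERMINATION; <code>ring_bypass</code> shows the “link to each other” repair option. Both function names are hypothetical, and no PMIx calls are modelled.</div>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: in an intact ring of n ranks, the only processes
 * notified of rank's failure are its back and forward neighbours,
 * independent of n. */
void ring_neighbours(int rank, int n, int *back, int *forward)
{
    assert(n >= 3 && rank >= 0 && rank < n);
    assert(back != NULL && forward != NULL);
    *back = (rank - 1 + n) % n;
    *forward = (rank + 1) % n;
}

/* Repair option "link to each other": next[r] is r's forward neighbour.
 * The failed rank's back_proc is found by searching for whoever points at
 * it, so this also works after earlier repairs. The failed entry becomes -1,
 * leaving a ring with one fewer process. */
void ring_bypass(int *next, int n, int failed)
{
    assert(next != NULL && failed >= 0 && failed < n);
    for (int r = 0; r < n; r++) {
        if (r != failed && next[r] == failed) {  /* r is the back_proc */
            next[r] = next[failed];              /* link to the forward_proc */
            break;
        }
    }
    next[failed] = -1;                           /* failed rank leaves the ring */
}
```

<div class="">Starting from {1, 2, 3, 4, 0}, removing rank 2 leaves rank 1 pointing at rank 3, a four-process ring, however large the intended group was.</div>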
<div class=""><br class="webkit-block-placeholder">
</div>
<div class="">
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
If the user chooses to specify all other processes at all processes, then they are explicitly requesting to be notified about all events for which they have registered from all other processes. This may cause event explosions for large groups.</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
<br class="">
</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
Note, the user does not actually have to specify the intended size of the group when constructing it. The size and topology of the (connected portion of the) graph can be determined once the group exists, by doing collective operations using the group. Collective
operations can be done on the entire (connected portion of the) graph by each process interacting only with its neighbours. Having discovered information about processes outside of one’s local neighbourhood, direct communication can be done between any pair
or sub-group of processes. Thus, a fully connected MPI communicator can (if required) be built up from an MPI_DIST_GRAPH topology specification.</div>
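<div class="">As a sketch of the claim that neighbour-only exchanges can discover the size of the connected portion of the graph, the C fragment below simulates synchronous rounds in which each process merges the membership sets of its neighbours. It assumes at most 64 processes (one bit per rank) and a symmetric adjacency mask; <code>component_size</code> is a hypothetical helper, not an MPI or PMIx function. A real implementation could stop after a number of rounds equal to the graph diameter rather than n - 1.</div>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_PROCS 64  /* sketch limit: one bit per rank in a uint64_t */

/* adj[r] is a bitmask of r's neighbours. Each round, every process ORs in
 * the "known" sets of its neighbours only. After n-1 rounds each process
 * knows every rank in its connected component; the popcount of that set is
 * the size of the (connected portion of the) group. */
int component_size(const uint64_t *adj, int n, int rank)
{
    assert(adj != NULL);
    assert(n > 0 && n <= MAX_PROCS && rank >= 0 && rank < n);
    uint64_t known[MAX_PROCS], next[MAX_PROCS];

    for (int r = 0; r < n; r++)
        known[r] = UINT64_C(1) << r;               /* each rank knows itself */

    for (int round = 0; round < n - 1; round++) {
        memcpy(next, known, sizeof(uint64_t) * (size_t)n);
        for (int r = 0; r < n; r++)
            for (int q = 0; q < n; q++)
                if (adj[r] & (UINT64_C(1) << q))
                    next[r] |= known[q];           /* exchange with neighbour q */
        memcpy(known, next, sizeof(uint64_t) * (size_t)n);
    }

    int count = 0;
    for (uint64_t bits = known[rank]; bits; bits &= bits - 1)
        count++;                                   /* clear lowest set bit */
    return count;
}
```

<div class="">For a four-process ring plus one isolated process, ranks 0 to 3 each find a size of 4, while the isolated rank finds 1.</div>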
</div>
</div>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
I’m reluctant to customize PMIx Groups for just the graph use-case as other programming models also have interest (and might not fit that case) and it isn’t clear to me that everyone in MPI will want to base themselves on a graph-based group approach. What
we could do, though, is provide attributes to support the use model you describe and let the PMIx Group implementation deal with it as a special case.</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">
<br class="">
</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">
For the more general case, I think we just need to ensure that there is a clean failure path that ensures the user gets out of the operation (i.e., doesn’t hang or incorrectly think the group exists) when failures occur. We can provide failure notification
and recovery methods - we just need to acknowledge that these only really work in the (expected) case where failures are relatively rare events. After all, if lots of processes are failing or refusing to join the proposed group during a construct operation,
then you probably need to do some triage on your cluster and/or your application!</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">
<br class="">
</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">
If we take that approach, then we can limit notifications to the “leader” and let it decide what to do about it. If the leader fails, then we could just have PMIx automatically terminate group construction, issuing “cancel” events to all other participants.</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">
<br class="">
</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">
For flexibility, we can add an attribute that modifies that behavior and add a new event to notify other group participants of the leader’s failure (we know the leader already agreed to join the group!). We can then add an attribute by which a process can declare
itself the new leader, thereby causing an event to the rest of the group participants to update their leader assignment (this is implemented today as a broadcast and so scales relatively well). The new leader is the one that will decide what to do about giving
up on constructing the group.</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">
<br class="">
</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">
Since we cache notifications, we know that any “cancel” event received by a proc prior to registering for it will still be delivered. We then specify in the standard that procs should register for all group-related events prior to engaging in any PMIx Group
operations. This ensures that the app knows about the “cancel” before calling construct, and that procs which call the blocking form of construct prior to the event arriving will still have a mechanism for getting out of the operation.</div>
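<div class="">A simplified C model of why caching makes an early “cancel” safe: events arriving before registration are held and replayed when a matching handler is registered. This is a single-handler sketch with integer event codes, not the PMIx implementation; <code>event_cache_t</code>, <code>cache_notify</code>, <code>cache_register</code> and <code>count_handler</code> are hypothetical names.</div>

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_MAX 16

typedef void (*handler_fn)(int event, void *ctx);

typedef struct {
    int cached[CACHE_MAX];   /* events received with no matching handler */
    size_t ncached;
    int registered_event;    /* -1 if no handler registered yet */
    handler_fn handler;
    void *ctx;
} event_cache_t;

void cache_init(event_cache_t *c)
{
    assert(c != NULL);
    c->ncached = 0;
    c->registered_event = -1;
    c->handler = NULL;
    c->ctx = NULL;
}

void cache_notify(event_cache_t *c, int event)
{
    assert(c != NULL);
    if (c->handler != NULL && event == c->registered_event) {
        c->handler(event, c->ctx);          /* deliver immediately */
    } else {
        assert(c->ncached < CACHE_MAX);
        c->cached[c->ncached++] = event;    /* keep for a later registration */
    }
}

/* Registering replays any cached events that match, and keeps the rest. */
void cache_register(event_cache_t *c, int event, handler_fn h, void *ctx)
{
    assert(c != NULL && h != NULL);
    c->registered_event = event;
    c->handler = h;
    c->ctx = ctx;
    size_t kept = 0;
    for (size_t i = 0; i < c->ncached; i++) {
        if (c->cached[i] == event)
            h(event, ctx);                  /* replay cached match */
        else
            c->cached[kept++] = c->cached[i];
    }
    c->ncached = kept;
}

/* Example handler: counts deliveries in an int pointed to by ctx. */
void count_handler(int event, void *ctx)
{
    (void)event;
    ++*(int *)ctx;
}
```

<div class="">If event 7 (say, a cancel) arrives before registration, the handler still fires exactly once when it registers for 7, and unrelated cached events stay queued.</div>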
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">
<br class="">
</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">
To make the registration easier, PMIx could add an ability to register for a “class” of events - e.g., register for the “group” class of events. This would provide for future compatibility should new group-related events get added. You currently have to specify
the events you want to know about.</div>
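<div class="">One way “register for a class of events” could work is to reserve the upper bits of each event code for a class tag, so that a class-wide registration automatically matches group events added later. The codes, macros and <code>registration_matches</code> below are purely hypothetical and do not correspond to real PMIx status values.</div>

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding: class tag in the upper bits, event number below. */
#define EV_CLASS_SHIFT 16
#define EV_CLASS(code)  ((code) >> EV_CLASS_SHIFT)
#define EV_MAKE(cls, n) (((cls) << EV_CLASS_SHIFT) | (n))

enum { CLASS_JOB = 1, CLASS_GROUP = 2 };

enum {
    EV_GROUP_INVITED       = EV_MAKE(CLASS_GROUP, 1),
    EV_GROUP_CANCEL        = EV_MAKE(CLASS_GROUP, 2),
    EV_GROUP_LEADER_FAILED = EV_MAKE(CLASS_GROUP, 3),
    EV_JOB_TERMINATED      = EV_MAKE(CLASS_JOB, 1)
};

typedef struct {
    int event;      /* specific code, or 0 when registering by class */
    int ev_class;   /* class to match when event == 0 */
} registration_t;

bool registration_matches(registration_t r, int code)
{
    assert(code > 0);
    if (r.event != 0)
        return r.event == code;            /* per-code registration, as today */
    return EV_CLASS(code) == r.ev_class;   /* class-wide registration */
}
```

<div class="">A handler registered for CLASS_GROUP would match EV_GROUP_LEADER_FAILED (or any group event defined later) without naming it, while per-code registrations keep their current behaviour.</div>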
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">
<br class="">
</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">
Make sense?</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">
Ralph</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">
<br class="">
<blockquote type="cite" class="">
<div class="">
<div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
<div class="">
<div class="">
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
<br class="Apple-interchange-newline">
Cheers,</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
Dan.</div>
</div>
<br class="">
<div class="">
<blockquote type="cite" class="">
<div class="">On 9 Aug 2018, at 15:00, Ralph H Castain <<a href="mailto:rhc@open-mpi.org" class="">rhc@open-mpi.org</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space;">So let’s step back for a moment and look at the failed construct problem in more depth. There are a couple of issues that come to mind.
<div class=""><br class="">
</div>
<div class="">First, do we really want to send the NOTIFY_TERMINATION event to all participants, or just to the leader? If the group under construction is large and we see a number of failures, then we could wind up in an event “storm”. If we alert only the
leader, then it raises the question: what if the leader is the one who fails? Do we need a mechanism by which someone else can declare themselves to be the “leader”?</div>
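<div class="">A back-of-the-envelope C model of the event storm: with n participants and f failed non-leaders, broadcasting each failure costs f(n - 1) deliveries, while leader-only notification costs f, plus one n - 1 broadcast in the exceptional case that the leader itself fails. The function names are hypothetical, and the model ignores the fact that some deliveries would target processes that have already failed.</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Broadcast: each of the f failures is reported to all n-1 other members. */
size_t storm_broadcast(size_t n, size_t f)
{
    assert(n >= 1 && f < n);
    return f * (n - 1);
}

/* Leader-only: each non-leader failure goes to the leader alone; a leader
 * failure (the rare exception case) is broadcast to the other n-1 members. */
size_t storm_leader_only(size_t n, size_t f, bool leader_failed)
{
    assert(n >= 1 && f < n);
    return f + (leader_failed ? n - 1 : 0);
}
```

<div class="">For n = 1000 and f = 10 this gives 9990 deliveries for broadcast versus 10 for leader-only (1009 if the leader also fails).</div>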
<div class=""><br class="">
</div>
<div class="">It isn’t too difficult for us to examine the returned results array from an event handler, though I’d want to (a) generalize it a bit so we can (b) limit how much of that we do to avoid making the event notification code explode with special cases.
If we go that route, which seems the right thing to me, then we again have a couple of choices:</div>
<div class=""><br class="">
</div>
<div class="">* if the NOTIFY_TERMINATION event is only going to the leader, then we would provide the ability for the leader to declare “burn the world” and send a corresponding event to all participants. It does create a bit of a race condition, as a remote
participating proc may get the event prior to calling Group_join and thus (a) has no idea what the event is talking about and (b) would have to retain the cancellation notice pending the call to Group_join so it could return an error. Doable - just a tad tricky
and difficult to test that race condition.</div>
<div class=""><br class="">
</div>
<div class="">* if the event goes to all participants, then they could locally decide to abandon the group. If they have already joined, they could leave. If not, then they could simply decline the invite. Again, there are race conditions that could bite us
(particularly in multi-threaded apps), but maybe we resolve some of those by imposing requirements on the app.</div>
<div class=""><br class="">
</div>
<div class="">Now that was all based on the async construct - but what do we do about a blocking call to PMIx_Group_construct? The only thing I can think of would be to provide an attribute in the results array that tells the PMIx library to “kick me out of the
current operation” and includes some tag(s) to indicate what operation it is talking about. We actually talked about that at some length during the last in-person PMIx devel meeting and came up with a scheme to support such a request (hasn’t been implemented
yet), so this could work. However, it again creates that race condition for procs that receive the TERMINATION event prior to calling “construct” as the operation hasn’t been initiated yet. I guess we could just put the burden on the app to realize that it
got a group_construct termination event and should therefore not call “construct” on that group?</div>
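<div class="">The race above can be sketched in C, assuming the library records the IDs of groups for which it has received a cancel and checks that record when the app later calls join or construct. The fixed-size table, <code>record_cancel</code> and <code>check_before_join</code> are hypothetical; the sketch only shows that the later call can return an error immediately instead of blocking on an operation that will never complete.</div>

```c
#include <assert.h>
#include <string.h>

#define MAX_PENDING 8
#define GROUP_ID_LEN 32

/* Cancels that arrived before the matching join/construct call. */
typedef struct {
    char ids[MAX_PENDING][GROUP_ID_LEN];
    int count;
} pending_cancels_t;

void record_cancel(pending_cancels_t *p, const char *group_id)
{
    assert(p != NULL && group_id != NULL);
    assert(p->count < MAX_PENDING && strlen(group_id) < GROUP_ID_LEN);
    strcpy(p->ids[p->count++], group_id);
}

/* Returns 0 if the join/construct may proceed, -1 if the group was already
 * cancelled. A matching entry is consumed (replaced by the last entry), so
 * a later group reusing the same ID is not affected by a stale cancel. */
int check_before_join(pending_cancels_t *p, const char *group_id)
{
    assert(p != NULL && group_id != NULL);
    for (int i = 0; i < p->count; i++) {
        if (strcmp(p->ids[i], group_id) == 0) {
            p->count--;
            memmove(p->ids[i], p->ids[p->count], GROUP_ID_LEN);
            return -1;
        }
    }
    return 0;
}
```

<div class="">The same check would serve both the async Group_join path and the blocking construct path, since both begin by consulting the pending-cancel record.</div>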
<div class=""><br class="">
</div>
<div class="">My head is beginning to hurt and I’ve probably confused folks anyway, so best to stop here and wait for input.</div>
<div class="">Ralph</div>
<div class=""><br class="">
</div>
<div class="">
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">On Aug 9, 2018, at 2:26 AM, HOLMES Daniel <<a href="mailto:d.holmes@epcc.ed.ac.uk" class="">d.holmes@epcc.ed.ac.uk</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
Hi Ralph,
<div class=""><br class="">
</div>
<div class="">Thanks for the updates, I’ll take a look in a moment (after coffee).</div>
<div class=""><br class="">
</div>
<div class="">Technically, a cancel API in PMIX *could* be used with a blocking group construction by calling the cancel inside an event callback. For example, an app could count how many times the PMIX_GROUP_NOTIFY_TERMINATION event is delivered for this group
construction, invite a new process for each of the first N events, and cancel the operation in response to any subsequent event(s).</div>
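<div class="">The count-based policy described above could look roughly like this on the application side. ReplacementPolicy and on_termination are hypothetical names; the handler only returns a decision, and how that decision reaches PMIx is left open here.</div>
<pre class="">
```python
class ReplacementPolicy:
    """Hypothetical app-side policy for PMIX_GROUP_NOTIFY_TERMINATION events.

    For the first `max_replacements` terminations the app invites a new
    process; any later termination cancels the whole construction.
    """

    def __init__(self, max_replacements):
        self.max_replacements = max_replacements
        self.terminations = 0

    def on_termination(self, dead_proc):
        """Return ("invite", proc_to_replace) or ("cancel", None)."""
        self.terminations += 1
        if self.terminations > self.max_replacements:
            return ("cancel", None)
        return ("invite", dead_proc)
```
</pre>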
<div class="">
<div class=""><br class="webkit-block-placeholder">
</div>
<div class="">
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
<div class="">MPI is trying to get rid of its cancel API: we deprecated cancel for point-to-point send because it is fundamentally broken. However, cancel for point-to-point receive is still valid and useful. The problem with cancel is always the race between
the “all is OK, go ahead” and the “whoa, stop that” signals. With a receive in MPI, the choice of which will succeed (receive or cancel) is always a local decision and can therefore be made atomic and consistent. The choice between send and cancel is a distributed
decision, which cannot be atomic, and always suffers from a race. One way to avoid this race in a PMIX group cancel would be to specify that it is only valid from within an event callback, e.g. by exposing it as an in-out/by-ref parameter in the callback itself
(not as a separate function call). PMIX could then examine this parameter (set by the user during the event callback) when the callback returns. It’s a binary/boolean choice between “I handled it, carry on” and “I panicked, burn the world”.</div>
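<div class="">A minimal sketch of the in-out parameter idea, assuming Python with a mutable object standing in for the by-ref argument. CallbackDecision and deliver_termination_event are hypothetical, not PMIx API. The library reads the flag exactly once, after the handler returns, so the cancel choice is made locally rather than through a separate call that could race with the operation.</div>
<pre class="">
```python
class CallbackDecision:
    """Stands in for a hypothetical by-ref/in-out callback parameter."""

    def __init__(self):
        self.burn_the_world = False  # default: "I handled it, carry on"


def deliver_termination_event(handler, event):
    """Hypothetical library-side dispatch of one group event.

    The handler may set decision.burn_the_world; the library examines it
    only once the handler has returned.
    """
    decision = CallbackDecision()
    handler(event, decision)
    if decision.burn_the_world:
        return "construction cancelled"
    return "construction continues"
```
</pre>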
<div class=""><br class="">
</div>
<div class="">Is it useful? Well, all systems have finite resources to use as replacements, so eventually this operation must fail. This type of cancel allows the application to choose when to give up based on how many things happened rather than just how many
seconds elapsed.</div>
<div class=""><br class="">
</div>
Cheers,</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
Dan.</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
—<br class="">
Dr Daniel Holmes PhD<br class="">
Applications Consultant in HPC Research<br class="">
<a href="mailto:d.holmes@epcc.ed.ac.uk" class="">d.holmes@epcc.ed.ac.uk</a><br class="">
Phone: +44 (0) 131 651 3465<br class="">
Mobile: +44 (0) 7940 524 088<br class="">
Address: Room 3415, JCMB, The King’s Buildings, Edinburgh, EH9 3FD</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
—</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
—</div>
</div>
<br class="">
<div class="">
<blockquote type="cite" class="">
<div class="">On 9 Aug 2018, at 03:54, Ralph H Castain <<a href="mailto:rhc@open-mpi.org" class="">rhc@open-mpi.org</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space;">I have updated the web page to reflect the comments. Let me know what you think, including about the “cancel” API.<br class="">
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">On Aug 8, 2018, at 5:40 PM, Ralph H Castain <<a href="mailto:rhc@open-mpi.org" class="">rhc@open-mpi.org</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space;"><br class="">
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">On Aug 8, 2018, at 4:14 PM, HOLMES Daniel <<a href="mailto:d.holmes@epcc.ed.ac.uk" class="">d.holmes@epcc.ed.ac.uk</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
Hi Ralph,
<div class=""><br class="">
</div>
<div class="">I was nodding whilst reading,</div>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
Great!</div>
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">
<div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
<div class="">until I got to PMIX_GROUP_LEAVE_REQUEST (Destruct procedure, PMIx_Groups).</div>
<div class=""><br class="">
</div>
<div class="">There are situations where a process will leave without the luxury of requesting and being patient first, e.g. faults (handled as termination, I know, bear with me a moment). If this event were instead PMIX_GROUP_LEFT, then processes would be written
to be able to cope with sudden exits in other processes. They have to be written like that anyway because of PMIX_GROUP_NOTIFY_TERMINATION. This event simply distinguishes "the process called PMIx_GROUP_LEAVE" from "the RM figured out a process stopped executing
(normally or abnormally)". Is such a distinction useful?</div>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
It’s a good point. My thought was to provide a “clean” way of dynamically leaving a group as opposed to just pulling out. On the other hand, we do need apps to be prepared for unexpected termination, so it isn’t clear that there is any real benefit. I have
no issue with making this change.</div>
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">
<div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
<div class=""><br class="">
</div>
<div class="">
<div class="">In terms of outstanding/in-progress collective operations, just state that calling PMIX_GROUP_LEAVE is not allowed unless no such operations are in flight.</div>
</div>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
I think a requirement that no collective op can be in progress is unenforceable, especially if you take the position that leaving is the same as unexpected termination, i.e., programs need to be written in a way that can adapt to terminations or departures.
We can provide an event indicating that a departure occurred; user apps would need to register for it and decide for themselves how to respond if they are in the middle of a user-level collective. The PMIx server can adjust any ongoing PMIx collective (e.g., PMIx_Fence) without user intervention.
We currently error out from such operations, but we can provide an attribute to indicate the operation should “self-heal” and proceed to completion.</div>
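<div class="">A simplified model of that server-side adjustment, assuming a single fence tracked as a set of expected participants. FenceTracker and its self_heal flag are hypothetical; the flag plays the role of the proposed “self-heal” attribute, and without it a departure errors out the fence, matching the current behavior described above.</div>
<pre class="">
```python
class FenceTracker:
    """Hypothetical server-side view of one in-progress fence (not PMIx code)."""

    def __init__(self, participants, self_heal=False):
        self.expected = set(participants)  # procs the fence is waiting on
        self.arrived = set()               # procs that have entered the fence
        self.self_heal = self_heal
        self.status = "pending"

    def _check(self):
        # Complete once every remaining expected proc has arrived.
        if self.status == "pending" and self.expected.issubset(self.arrived):
            self.status = "complete"

    def arrive(self, proc):
        if proc in self.expected:
            self.arrived.add(proc)
            self._check()

    def depart(self, proc):
        """Called when a proc leaves the group or terminates unexpectedly."""
        if proc not in self.expected or self.status != "pending":
            return
        if not self.self_heal:
            self.status = "error"          # current behavior: error out
            return
        # Self-heal: drop the departed proc and re-check for completion.
        self.expected.discard(proc)
        self.arrived.discard(proc)
        self._check()
```
</pre>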
<div class=""><br class="">
</div>
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">
<div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
<div class="">
<div class="">The potential race between a process calling PMIX_GROUP_LEAVE and other process(es) in the group starting a collective operation should not happen in a well-defined program. Also, if PMIX_GROUP_NOTIFY_TERMINATION can state "collective operations
will be adjusted appropriately" then why can’t PMIX_GROUP_LEAVE say that too?</div>
</div>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
No problem - it can certainly do so.</div>
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">
<div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
<div class="">
<div class=""><br class="webkit-block-placeholder">
</div>
<div class="">
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
—</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
<br class="">
</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
For PMIX_GROUP_JOIN, can the leader process give up on creating the group and somehow tell PMIX to stop trying?</div>
</div>
</div>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
Sure, it can do so by setting the PMIX_TIMEOUT attribute. We could provide a “cancel” API as well, but that would require using the non-blocking form of PMIx_Group_construct, as otherwise there would be no way to call it. Would a “cancel” API be of
benefit?</div>
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">
<div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
<div class="">
<div class="">
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
If so, then processes that accepted a join request should be informed that the group is never going to be constructed, i.e. they should stop waiting for the callback/return of the blocking function. Thus, "once the group has been completely constructed” could
be tempered with “or the group construction fails”.</div>
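<div class="">A sketch of the completion semantics suggested here, assuming a non-blocking construct in which every proc that accepted the invite registers a callback. GroupConstruction, join and cancel are hypothetical names, not PMIx API; the point is that each callback fires exactly once, with either a success status or a failure status (leader cancel or timeout).</div>
<pre class="">
```python
class GroupConstruction:
    """Hypothetical non-blocking group construction with a cancel hook."""

    def __init__(self, members):
        self.pending = set(members)  # invited procs that have not yet joined
        self.callbacks = []          # completion callbacks of procs that joined
        self.status = None           # None while construction is in progress

    def join(self, proc, callback):
        if self.status is not None:
            callback(self.status)    # late joiner learns the outcome at once
            return
        self.pending.discard(proc)
        self.callbacks.append(callback)
        if not self.pending:
            self._finish("constructed")

    def cancel(self, reason="cancelled"):
        """Leader gives up, e.g. after a timeout or an explicit cancel."""
        if self.status is None:
            self._finish(reason)

    def _finish(self, status):
        # Notify every waiting participant exactly once, then forget them.
        self.status = status
        for cb in self.callbacks:
            cb(status)
        self.callbacks.clear()
```
</pre>
<div class="">A proc that joins after the construction has already failed gets the failure status immediately, which is one possible answer to the late-join race discussed elsewhere in this thread.</div>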
</div>
</div>
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
Agreed - will update.</div>
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">
<div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
<div class="">
<div class="">
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
<br class="Apple-interchange-newline">
Cheers,</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
Dan.</div>
</div>
<br class="">
<div class="">
<blockquote type="cite" class="">
<div class="">On 8 Aug 2018, at 23:04, Ralph H Castain <<a href="mailto:rhc@open-mpi.org" class="">rhc@open-mpi.org</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div class="" style="word-wrap: break-word; -webkit-nbsp-mode: space;">Hi folks
<div class=""><br class="">
</div>
<div class="">I have updated the PMIx Group web page to capture the discussion of the prior meeting plus some subsequent thoughts:</div>
<div class=""><br class="">
</div>
<div class=""><a href="https://pmix.org/pmix-standard/pmix-groups/" class="">https://pmix.org/pmix-standard/pmix-groups/</a></div>
<div class=""><br class="">
</div>
<div class="">I’ll try to put some initial implementation behind it before the meeting, so please feel free to chime in with any thoughts.</div>
<div class="">Ralph</div>
<div class=""><br class="">
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">On Aug 6, 2018, at 11:38 AM, Ralph H Castain <<a href="mailto:rhc@open-mpi.org" class="">rhc@open-mpi.org</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div dir="auto" class="">Looks like I can free some time up this week for groups - will try to update later this week
<div class=""><br class="">
<div class=""><br class="">
On Aug 6, 2018, at 11:05 AM, HOLMES Daniel <<a href="mailto:d.holmes@epcc.ed.ac.uk" class="">d.holmes@epcc.ed.ac.uk</a>> wrote:<br class="">
<br class="">
</div>
<blockquote type="cite" class="">
<div class="">Hi all,
<div class=""><br class="">
</div>
<div class="">The next meeting for the Sessions WG will be *Tuesday 14th Aug 2018* at 12pm Eastern US time.</div>
<div class=""><br class="">
</div>
<div class="">Note the change of day and time. This is a one-off change due to vacation time.</div>
<div class=""><br class="">
</div>
<div class="">The connection details for the call will be sent out on this list nearer the time.</div>
<div class=""></div>
<div class="">
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
<br class="">
</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
Cheers,</div>
<div class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space;">
Dan.</div>
</div>
<br class="">
</div>
</blockquote>
<blockquote type="cite" class="">
<div class=""><span class="">_______________________________________________</span><br class="">
<span class="">mpiwg-sessions mailing list</span><br class="">
<span class=""><a href="mailto:mpiwg-sessions@lists.mpi-forum.org" class="">mpiwg-sessions@lists.mpi-forum.org</a></span><br class="">
<span class=""><a href="https://lists.mpi-forum.org/mailman/listinfo/mpiwg-sessions" class="">https://lists.mpi-forum.org/mailman/listinfo/mpiwg-sessions</a></span><br class="">
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</div>
</blockquote>
</div>
<br class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</div>
</blockquote>
</div>
<br class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</body>
</html>