[mpiwg-sessions] Next meeting - *Tuesday* next week - midday Eastern US time

HOLMES Daniel d.holmes at epcc.ed.ac.uk
Wed Aug 8 18:14:59 CDT 2018


Hi Ralph,

I was nodding whilst reading, until I got to PMIX_GROUP_LEAVE_REQUEST (Destruct procedure, PMIx_Groups).

Some processes will leave without the luxury of requesting first and then waiting patiently, e.g. because of faults (handled as termination, I know; bear with me a moment). If this event were instead PMIX_GROUP_LEFT, then processes would be written to cope with sudden exits in other processes. They have to be written like that anyway because of PMIX_GROUP_NOTIFY_TERMINATION. This event simply distinguishes "the process called PMIx_GROUP_LEAVE" from "the RM figured out a process stopped executing (normally or abnormally)". Is such a distinction useful?
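
To make the question concrete, here is a toy Python model of one process's view of group membership. It is only a sketch: `GroupEvent` and `GroupView` are hypothetical names that mirror the events discussed above, and nothing here calls the PMIx library. It shows that the handler does the same membership update for both events and that only the recorded reason differs.

```python
from enum import Enum, auto


class GroupEvent(Enum):
    # Hypothetical stand-ins for the events named in this thread (not the PMIx API).
    GROUP_LEFT = auto()          # the process itself announced that it left
    NOTIFY_TERMINATION = auto()  # the RM observed that the process stopped executing


class GroupView:
    """One process's local view of the group membership (illustrative only)."""

    def __init__(self, members):
        self.members = set(members)
        self.departures = []  # (rank, event) pairs, kept only to show the distinction

    def on_event(self, event, rank):
        # Both events shrink the membership in exactly the same way;
        # the only difference is the reason that gets recorded.
        self.members.discard(rank)
        self.departures.append((rank, event))


view = GroupView(range(4))
view.on_event(GroupEvent.GROUP_LEFT, 2)
view.on_event(GroupEvent.NOTIFY_TERMINATION, 3)
print(sorted(view.members))
```

If the recorded reason is never consulted by the application, the two events are interchangeable from its point of view, which is the crux of the question.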

As for outstanding/in-progress collective operations, just state that calling PMIX_GROUP_LEAVE is allowed only when no such operations are in flight. The potential race between a process calling PMIX_GROUP_LEAVE and other process(es) in the group starting a collective operation should not happen in a well-defined program. Also, if PMIX_GROUP_NOTIFY_TERMINATION can state "collective operations will be adjusted appropriately", then why can’t PMIX_GROUP_LEAVE say that too?
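
The "no leave while a collective is in flight" rule can be sketched as a simple precondition. This is a single-process toy model under my own assumptions: the `Group` class, its `in_flight` counter, and the choice to raise an error are all hypothetical, and a real implementation might leave a violation undefined rather than detect it.

```python
class Group:
    """Toy model of a group that tracks in-flight collectives (illustrative only)."""

    def __init__(self, members):
        self.members = set(members)
        self.in_flight = 0  # number of collective operations currently in progress

    def start_collective(self):
        self.in_flight += 1

    def finish_collective(self):
        self.in_flight -= 1

    def leave(self, rank):
        # The proposed rule: leaving is erroneous while any collective is in flight.
        if self.in_flight > 0:
            raise RuntimeError("leave called while a collective is in flight")
        self.members.remove(rank)


group = Group(range(3))
group.start_collective()
try:
    group.leave(1)
    rejected = False
except RuntimeError:
    rejected = True
group.finish_collective()
group.leave(1)  # allowed now that nothing is in flight
```

In this model the burden is on the program to order its leave after its collectives, which matches the "well-defined program" argument above.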

—

For PMIX_GROUP_JOIN, can the leader process abandon creation of the group and somehow tell PMIx to stop trying? If so, then processes that accepted a join request should be informed that the group is never going to be constructed, i.e. they should stop waiting for the callback/return of the blocking function. Thus, "once the group has been completely constructed" could be tempered with "or the group construction fails".
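
A minimal sketch of the "constructed or construction fails" wording, using Python's standard `concurrent.futures.Future` as a stand-in for the pending join. The `GroupConstructionFailed` exception and the `wait_for_group` helper are hypothetical names of mine, not PMIx names. The point is that the waiter is released in both outcomes.

```python
from concurrent.futures import Future


class GroupConstructionFailed(Exception):
    """Hypothetical signal that the leader gave up on constructing the group."""


def wait_for_group(pending, timeout=None):
    # Blocks until the join completes one way or the other, so a process
    # that accepted the join request never waits forever.
    try:
        return ("constructed", pending.result(timeout=timeout))
    except GroupConstructionFailed as exc:
        return ("failed", str(exc))


# Outcome 1: the group is completely constructed.
ok = Future()
ok.set_result(["rank0", "rank1"])

# Outcome 2: the leader abandons construction.
abandoned = Future()
abandoned.set_exception(GroupConstructionFailed("leader abandoned construction"))

print(wait_for_group(ok))
print(wait_for_group(abandoned))
```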

Cheers,
Dan.
—
Dr Daniel Holmes PhD
Applications Consultant in HPC Research
d.holmes at epcc.ed.ac.uk<mailto:d.holmes at epcc.ed.ac.uk>
Phone: +44 (0) 131 651 3465
Mobile: +44 (0) 7940 524 088
Address: Room 3415, JCMB, The King’s Buildings, Edinburgh, EH9 3FD
—
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
—

On 8 Aug 2018, at 23:04, Ralph H Castain <rhc at open-mpi.org<mailto:rhc at open-mpi.org>> wrote:

Hi folks

I have updated the PMIx Group web page to capture the discussion of the prior meeting plus some subsequent thoughts:

https://pmix.org/pmix-standard/pmix-groups/

I’ll try to put some initial implementation behind it before the meeting, so please feel free to chime in with any thoughts.
Ralph


On Aug 6, 2018, at 11:38 AM, Ralph H Castain <rhc at open-mpi.org<mailto:rhc at open-mpi.org>> wrote:

Looks like I can free some time up this week for groups - will try to update later this week

Sent from my iPhone

On Aug 6, 2018, at 11:05 AM, HOLMES Daniel <d.holmes at epcc.ed.ac.uk<mailto:d.holmes at epcc.ed.ac.uk>> wrote:

Hi all,

The next meeting for the Sessions WG will be *Tuesday 14th Aug 2018* at 12pm Eastern US time.

Note the change of day and time. This is a one-off change due to vacation time.

The connection details for the call will be sent out on this list nearer the time.

Cheers,
Dan.
_______________________________________________
mpiwg-sessions mailing list
mpiwg-sessions at lists.mpi-forum.org<mailto:mpiwg-sessions at lists.mpi-forum.org>
https://lists.mpi-forum.org/mailman/listinfo/mpiwg-sessions

