[mpiwg-sessions] [EXTERNAL] Better handling of dynamic mixed with sessions without world

Pritchard Jr., Howard howardp at lanl.gov
Fri Jan 22 10:01:32 CST 2021


Hi Dan,

Ah yes, now I recall better. Let’s discuss on Monday.

Howard

From: HOLMES Daniel <d.holmes at epcc.ed.ac.uk>
Date: Friday, January 22, 2021 at 8:56 AM
To: "Pritchard Jr., Howard" <howardp at lanl.gov>
Cc: "schulzm at in.tum.de" <schulzm at in.tum.de>, MPI Sessions working group <mpiwg-sessions at lists.mpi-forum.org>
Subject: Re: [EXTERNAL] Better handling of dynamic mixed with sessions without world

Hi Howard,

I (half-)remember the discussions you mention - they mostly focused on the spawning side, rather than the being-spawned side, but there was discussion of which process set(s) would need to exist for the parents and children to find each other. The waters were muddier because we were considering a replacement for MPI_Comm_spawn (called MPI_Exec) that did not return/output an inter-communicator but expected the user to use the process set->group->comm method to find the processes that it had just created. Thus, we thought we would need multiple standardised names, whereas this simpler approach needs only one: its meaning is unique regardless of how many times spawn is called, because only the spawned processes see it, not the processes that executed spawn.

This proposal would not exclude or prevent a DPM-2.0 approach that followed up on the ideas in the old slide deck.

There is no chance of getting this into MPI-4.0, although I do wish I’d thought of this earlier, i.e. early enough to have included it at the same time as the rest of SPM-1.0. We should push on this for the MPI-4.1 timeframe.

Cheers,
Dan.
—
Dr Daniel Holmes PhD
Architect (HPC Research)
d.holmes at epcc.ed.ac.uk
Phone: +44 (0) 131 651 3465
Mobile: +44 (0) 7940 524 088
Address: Room 2.09, Bayes Centre, 47 Potterrow, Central Area, Edinburgh, EH8 9BT
—
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
—


On 22 Jan 2021, at 15:38, Pritchard Jr., Howard <howardp at lanl.gov> wrote:

Hi Dan,

I think this is a good idea. I’m not sure we entirely missed it for Sessions 1.0, though. I vaguely recall discussions about something like this in the context of MPI_Exec, but it was quite a while ago. Maybe some problems/limitations were found with this process set approach to capturing info to support spawn functionality. We should probably double-check the old slide deck.

This would be a 4.1 proposal, right?

Howard

From: HOLMES Daniel <d.holmes at epcc.ed.ac.uk>
Date: Friday, January 22, 2021 at 5:06 AM
To: "Pritchard Jr., Howard" <howardp at lanl.gov>, Martin Schulz <schulzm at in.tum.de>
Cc: MPI Sessions working group <mpiwg-sessions at lists.mpi-forum.org>
Subject: [EXTERNAL] Better handling of dynamic mixed with sessions without world

Hi all,

I just reviewed PR 506 from Rolf. It adds predefined communicator names to Appendix A. One of those is “MPI_COMM_PARENT”. It is a correct change, and it is only indirectly relevant here: it set off an interesting train of thought, which I summarise below.

This suggests that we missed a trick when proposing Sessions v1.0 - we should have mandated the existence and meaning of an additional process set name: “mpi://MPI_COMM_PARENT” (note the absence of “GET” in that name, to match the default string name assigned by MPI to the communicator returned by the MPI_COMM_GET_PARENT procedure).

This process set shall contain all the processes that would be in the communicator returned by MPI_COMM_GET_PARENT, either zero processes (if you would get MPI_COMM_NULL) or the union of the local and remote groups (if you would get an inter-communicator).

With this addition, MPI processes that are spawned can regain all of the functionality of the Dynamic Model without calling MPI_INIT[_THREAD], i.e. they can create the inter-communicator that they would have got from MPI_COMM_GET_PARENT (which requires a prior call to MPI_INIT because it belongs to the World Model [ED: please check]).

```pseudo-code
Create a session
Create an MPI_Group, groupParent, from the process set named "mpi://MPI_COMM_PARENT"
Create an MPI_Group, groupWorld, from the process set named "mpi://WORLD"
If (groupParent is MPI_GROUP_NULL) then
  This process was not spawned by other MPI processes
Else
  Create groupLocal from the intersection of groupWorld and groupParent
  Create groupRemote from the subtraction of groupWorld from groupParent
  Create commParent using MPI_INTERCOMM_CREATE_FROM_GROUPS with groupLocal and groupRemote
```

No usage of the World Model, but we now have (a duplicate of) the inter-communicator that would have been returned by MPI_COMM_GET_PARENT if we had been permitted to call it.
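
The group arithmetic in the pseudo-code above can be checked with a small model. This is only a sketch in plain C, not MPI code: a "group" is modelled as an ordered array of made-up process ids, and `group_intersection` and `group_difference` are hypothetical helpers that mimic the set semantics of MPI_Group_intersection and MPI_Group_difference (result kept in the order of the first argument). It assumes, as proposed above, that the parent process set is the union of the spawned processes and the processes that spawned them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an MPI group: an ordered array of process ids. */

/* Returns true if id occurs in set[0..n). */
static bool contains(const int *set, size_t n, int id)
{
    for (size_t i = 0; i < n; ++i) {
        if (set[i] == id) {
            return true;
        }
    }
    return false;
}

/* Models MPI_Group_intersection(a, b): members of a that are also in b,
 * kept in a's order. out must have room for na entries. Returns the count. */
size_t group_intersection(const int *a, size_t na,
                          const int *b, size_t nb, int *out)
{
    assert(out != NULL);
    size_t count = 0;
    for (size_t i = 0; i < na; ++i) {
        if (contains(b, nb, a[i])) {
            out[count++] = a[i];
        }
    }
    return count;
}

/* Models MPI_Group_difference(a, b): members of a that are not in b,
 * kept in a's order. out must have room for na entries. Returns the count. */
size_t group_difference(const int *a, size_t na,
                        const int *b, size_t nb, int *out)
{
    assert(out != NULL);
    size_t count = 0;
    for (size_t i = 0; i < na; ++i) {
        if (!contains(b, nb, a[i])) {
            out[count++] = a[i];
        }
    }
    return count;
}
```

For example, if a spawned process sees a world set of {10, 11, 12} and a parent set of {0, 1, 10, 11, 12}, then intersecting world with parent gives the local group {10, 11, 12}, and subtracting world from parent gives the remote group {0, 1}, i.e. the processes that called spawn.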

For each MPI process, there is a unique “parent” process set by this definition. Multiple components doing this pseudo-code would get duplicate communicators with their own life-cycle. This contrasts with the current way whereby freeing or disconnecting any communicator handle provided by MPI_COMM_GET_PARENT makes all other such handles stale, which is another instance of global state in MPI baked into the standardised interface and another anathema to any attempt at isolation.

Of course, there is already no reason why the parent processes could not have used the Sessions Model to call MPI_COMM_SPAWN (passing in a `comm` that was derived from the Sessions Model), so this completes the picture of how the Dynamic Model can be bolted on successfully to either the World Model (refer to MPI-3.1 and prior) or the Sessions Model (refer to MPI-4.1, which will include this addition) or a mixture (refer to MPI-4.0, without this addition).

[EDIT: if we decide that MPI_COMM_GET_PARENT can already be called without/before/after the World Model, then we should add it to Table 11.1 and write some text about it in §11.8.2. In that scenario, the above provides an implementation route for that procedure using only the Sessions Model underneath - in a similar way to the observation that it is possible to implement MPI_INIT[_THREAD] using only the Sessions Model underneath.]

Cheers,
Dan.
