[mpiwg-sessions] Better handling of dynamic mixed with sessions without world
d.holmes at epcc.ed.ac.uk
Fri Jan 22 06:06:27 CST 2021
I just reviewed PR 506 from Rolf. It adds predefined communicator names to Appendix A. One of those is “MPI_COMM_PARENT”. It is a correct change and only indirectly relevant because it is what set off an interesting train of thought, which I summarise below.
This suggests that we missed a trick when proposing Sessions v1.0: we should have mandated the existence and meaning of an additional process set name, "mpi://MPI_COMM_PARENT" (note the absence of "GET" in that name, which matches the default string name assigned by MPI to the communicator returned by the MPI_COMM_GET_PARENT procedure).
This process set shall contain all the processes that would be in the communicator returned by MPI_COMM_GET_PARENT, either zero processes (if you would get MPI_COMM_NULL) or the union of the local and remote groups (if you would get an inter-communicator).
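To make that definition concrete, here is a small model in plain C with no MPI calls. It is only an illustration: process sets are modelled as bitmasks over made-up process numbers, and the names pset_t, parent_pset, pset_intersection and pset_difference are hypothetical helpers, not MPI procedures. It shows the proposed membership rule (empty, or the union of the two groups) and that a spawned process can recover both groups from that union once it knows its own world group.

```c
#include <assert.h>
#include <stdint.h>

/* A process set modelled as a bitmask: bit i set means "process i is a
 * member". Illustration only; real MPI groups are opaque handles. */
typedef uint64_t pset_t;

/* Membership of the proposed parent process set: empty when the process
 * was not spawned, otherwise the union of the local group (the spawned
 * processes) and the remote group (the spawning processes). */
pset_t parent_pset(int was_spawned, pset_t local, pset_t remote)
{
    if (!was_spawned)
        return 0;                     /* zero processes */
    assert((local & remote) == 0);    /* the two groups are disjoint */
    return local | remote;
}

/* Members of both a and b. */
pset_t pset_intersection(pset_t a, pset_t b)
{
    return a & b;
}

/* Members of a that are not in b. */
pset_t pset_difference(pset_t a, pset_t b)
{
    return a & ~b;
}
```

For example, with parents {0, 1} and children {2, 3, 4}, a child sees a parent set of {0, 1, 2, 3, 4}; intersecting it with the child's world {2, 3, 4} gives the local group, and subtracting that world from it gives the remote group {0, 1}.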
With this addition, MPI processes that are spawned can regain all of the functionality of the Dynamic Model without calling MPI_INIT[_THREAD], i.e. they can create the inter-communicator that they would have got from MPI_COMM_GET_PARENT (which requires a prior call to MPI_INIT because it belongs to the World Model [ED: please check]). The steps would be:
Create a session
Create an MPI_Group, groupParent, from the process set named "mpi://MPI_COMM_PARENT"
Create an MPI_Group, groupWorld, from the process set named "mpi://WORLD"
If (groupParent is empty, i.e. has size zero) then
    This process was not spawned by other MPI processes
Else
    Create groupLocal from the intersection of groupWorld and groupParent
    Create groupRemote from the subtraction of groupWorld from groupParent
    Create commParent using MPI_INTERCOMM_CREATE_FROM_GROUPS with groupLocal and groupRemote
No usage of the World Model, but we now have (a duplicate of) the inter-communicator that would have been returned by MPI_COMM_GET_PARENT if we had been permitted to call it.
For each MPI process, this definition gives a unique "parent" process set. Multiple components executing this pseudo-code would each get a duplicate communicator with its own life-cycle. This contrasts with the current behaviour, whereby freeing or disconnecting any communicator handle provided by MPI_COMM_GET_PARENT makes all other such handles stale; that is another instance of global state baked into the standardised MPI interface, and it is anathema to any attempt at isolation.
Of course, there is already no reason why the parent processes could not have used the Sessions Model to call MPI_COMM_SPAWN (passing in a `comm` that was derived from the Sessions Model). So this completes the picture of how the Dynamic Model can be bolted on successfully to either the World Model (refer to MPI-3.1 and prior), the Sessions Model (refer to MPI-4.1, which will include this addition), or a mixture of the two (refer to MPI-4.0, without this addition).
[EDIT: if we decide that MPI_COMM_GET_PARENT can already be called without/before/after the World Model, then we should add it to Table 11.1 and write some text about it in §11.8.2. In that scenario, the above provides an implementation route for that procedure using only the Sessions Model underneath - in a similar way to the observation that it is possible to implement MPI_INIT[_THREAD] using only the Sessions Model underneath.]
Dr Daniel Holmes PhD
Architect (HPC Research)
d.holmes at epcc.ed.ac.uk
Phone: +44 (0) 131 651 3465
Mobile: +44 (0) 7940 524 088
Address: Room 2.09, Bayes Centre, 47 Potterrow, Central Area, Edinburgh, EH8 9BT