[mpiwg-sessions] [EXTERNAL] RE: MPI_Session_init semantics question/poll

Holmes, Daniel John daniel.john.holmes at intel.com
Wed Jan 4 14:17:15 CST 2023


Hi Martin,

MPI is not PVM. We do not wait to see which/how many processes start and join the group/process set before deciding on the membership of the group/process set. The names and the membership of all (built-in/predefined) process sets are known a priori without coordination during the initialisation procedure call(s). Deviation from that membership (e.g. a process fails to start or fails to join up with the other processes) is a fault, which will cause a failure (e.g. a collective operation cannot complete), which will manifest as an error. The process set still exists and a group can still be formed from it; the communicator creation procedure that uses that group will raise an error.

For scenarios/implementations where additional process sets “appear” during execution, a new process set might not become visible until all involved processes can see the same new set name (depending on what the implementation can support). That might mean every involved process has to make some progress after the process set is created internally before any process exposes it to the user via MPI calls. That delay must never happen for the built-in/predefined process sets, so we have no conflict or difficulty.

Best wishes,
Dan.

From: Martin Schulz <schulzm at in.tum.de>
Sent: 04 January 2023 19:43
To: MPI Sessions working group <mpiwg-sessions at lists.mpi-forum.org>; Holmes, Daniel John <daniel.john.holmes at intel.com>
Cc: Pritchard Jr., Howard <howardp at lanl.gov>
Subject: Re: [mpiwg-sessions] [EXTERNAL] RE: MPI_Session_init semantics question/poll

Hi all,

I agree with this interpretation – I always thought that was the original intent; it should be possible to push non-local work off to the first communicator creation.

The question about it being an operation and/or a local call is interesting, though – I tend to see it the same way as Dan, but is there a scenario in implementations that may require some kind of progress in other MPI processes (e.g., to internally synchronize on process sets)? If so, would we have to classify at least some calls (perhaps only the query of the process sets) as (local) operations so we can mandate progress? Or maybe “have to” is too harsh, but it would allow implementations to be more efficient?

Martin


--
Prof. Dr. Martin Schulz, Chair of Computer Architecture and Parallel Systems
Department of Informatics, TU-Munich, Boltzmannstraße 3, D-85748 Garching
Member of the Board of Directors at the Leibniz Supercomputing Centre (LRZ)
Email: schulzm at in.tum.de


From: mpiwg-sessions <mpiwg-sessions-bounces at lists.mpi-forum.org> on behalf of "Pritchard Jr., Howard via mpiwg-sessions" <mpiwg-sessions at lists.mpi-forum.org>
Reply to: MPI Sessions working group <mpiwg-sessions at lists.mpi-forum.org>
Date: Wednesday, 4. January 2023 at 09:30
To: "Holmes, Daniel John" <daniel.john.holmes at intel.com>, MPI Sessions working group <mpiwg-sessions at lists.mpi-forum.org>
Cc: "Pritchard Jr., Howard" <howardp at lanl.gov>
Subject: Re: [mpiwg-sessions] [EXTERNAL] RE: MPI_Session_init semantics question/poll

Hi Dan,

Yes, that was my interpretation as well.

We can discuss at our next meeting 1/9/23 if there’s time.

Howard


From: "Holmes, Daniel John" <daniel.john.holmes at intel.com>
Date: Wednesday, January 4, 2023 at 12:05 PM
To: MPI Sessions working group <mpiwg-sessions at lists.mpi-forum.org>
Cc: "Pritchard Jr., Howard" <howardp at lanl.gov>
Subject: [EXTERNAL] RE: MPI_Session_init semantics question/poll

Hi Howard,

It was always intended that MPI_Session_init was a local procedure. In fact, “initialise a session” is not even an MPI operation, so it doesn’t make sense for it to be expressed via a nonlocal procedure.

Further, it was intended that the nonlocal portion of the work done by MPI_Init that is eventually needed in the pure sessions pattern would be done during the first nonlocal procedure call in that pattern, as follows:

MPI_Session_init // local – PMIx fence prohibited
MPI_Group_from_session_pset // local – PMIx fence prohibited
MPI_Comm_create_from_group // nonlocal – PMIx fence permitted, if needed

The nonlocal work should be unnecessary until the first nonlocal procedure call, so this should all work out fine (modulo some refactoring/debugging).

Best wishes,
Dan.

From: mpiwg-sessions <mpiwg-sessions-bounces at lists.mpi-forum.org> On Behalf Of Pritchard Jr., Howard via mpiwg-sessions
Sent: 04 January 2023 18:32
To: MPI Sessions working group <mpiwg-sessions at lists.mpi-forum.org>
Cc: Pritchard Jr., Howard <howardp at lanl.gov>
Subject: [mpiwg-sessions] MPI_Session_init semantics question/poll

Hi All,

First, Happy New Year!

I’ve got a question about the semantics of MPI_Session_init. In particular, I’d be interested in knowing people’s opinions on whether this function is nonlocal or local.
We don’t have any text in the current version of the standard that states whether or not MPI_Session_init is a nonlocal operation.

I’m considering options for handling this issue: https://github.com/open-mpi/ompi/issues/11166. It turns out that the proper way to resolve this issue depends on whether MPI_Session_init has local or nonlocal semantics.

I had been working under the assumption that we had intended session initialization to be a local function, but considering how to resolve issue 11166 made me begin to question this assumption.

Thanks for any ideas,

Howard


—

Howard Pritchard
Research Scientist
HPC-ENV

Los Alamos National Laboratory
howardp at lanl.gov


