[mpiwg-sessions] Sessions WG - meet 1/29/24

Holmes, Daniel John daniel.john.holmes at intel.com
Mon Jan 29 10:12:36 CST 2024


Hi Howard/all,

Here is the simple code I was talking about in the meeting today:

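// general high-level optimistic application
// (do_stuff_with_comm() and panic() are application placeholders)
#include <mpi.h>

int main(void) {

     MPI_Session session;
     MPI_Session_init(MPI_INFO_NULL, MPI_ERRORS_ARE_FATAL, &session);
     MPI_Group group;
     MPI_Group_from_session_pset(session, "mpi://world", &group);
     MPI_Comm comm;
     // "example.tag" is a placeholder string tag
     MPI_Comm_create_from_group(group, "example.tag", MPI_INFO_NULL, MPI_ERRORS_ARE_FATAL, &comm);
     MPI_Group_free(&group);

     int ret = do_stuff_with_comm(comm);

     if (MPI_SUCCESS == ret) {
           MPI_Comm_disconnect(&comm);
           MPI_Session_finalize(&session);

     } else {
           panic();

     }
     return 0;
}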

// general high-level pragmatic application
// (do_stuff_with_comm() is an application placeholder)
#include <mpi.h>

int main(void) {

     // additional code
     while (1) {

     MPI_Session session;
     MPI_Session_init(MPI_INFO_NULL, MPI_ERRORS_RETURN, &session);
     MPI_Group world, failed, group;
     MPI_Group_from_session_pset(session, "mpi://world", &world);

     // additional code
     MPI_Session_get_proc_failed(session, &failed); // new API, seems easy to do
     MPI_Group_difference(world, failed, &group);
     MPI_Group_free(&world);
     MPI_Group_free(&failed);

     MPI_Comm comm;
     // <-- the detail-devils live here ("example.tag" is a placeholder string tag)
     MPI_Comm_create_from_group(group, "example.tag", MPI_INFO_NULL, MPI_ERRORS_RETURN, &comm);
     MPI_Group_free(&group);

     int ret = do_stuff_with_comm(comm);

     MPI_Comm_disconnect(&comm);
     MPI_Session_finalize(&session);

     if (MPI_SUCCESS == ret) {
           break; // all done!

     } else if (MPI_ERR_PROC_FAILED == ret) { // error class from the ULFM proposal
           continue; // no more panic

     }

     // additional code
     } // end while
     return 0;
}
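
To make the MPI_Group_difference step in the pragmatic example concrete without needing an MPI runtime, here is a minimal sketch that models a group as a plain array of process ids. The helper name rank_set_difference is hypothetical and is not an MPI call; it only mimics the ordering rule of MPI_Group_difference (members of the first group that are not in the second, in the order of the first group).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an MPI call: models a group as an array of
 * process ids and keeps the members of `world` that are not in `failed`,
 * preserving the order of `world`. Survivors are written to `out`, which
 * must have room for n_world entries. Returns the number of survivors. */
size_t rank_set_difference(const int *world, size_t n_world,
                           const int *failed, size_t n_failed,
                           int *out)
{
    assert(world != NULL && out != NULL);
    assert(failed != NULL || n_failed == 0);

    size_t n_out = 0;
    for (size_t i = 0; i < n_world; i++) {
        int is_failed = 0;
        for (size_t j = 0; j < n_failed; j++) {
            if (world[i] == failed[j]) {
                is_failed = 1;
                break;
            }
        }
        if (!is_failed) {
            out[n_out++] = world[i];
        }
    }
    return n_out;
}
```

For example, with world = {0, 1, 2, 3, 4} and failed = {3, 1}, the survivors are {0, 2, 4}. In a real MPI group the survivors would then hold ranks 0 to 2 in the new group, since ranks in a group are always contiguous.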


Best wishes,
Dan.


From: mpiwg-sessions <mpiwg-sessions-bounces at lists.mpi-forum.org> On Behalf Of Pritchard Jr., Howard via mpiwg-sessions
Sent: Thursday, January 25, 2024 6:07 PM
To: MPI Sessions working group <mpiwg-sessions at lists.mpi-forum.org>
Cc: Pritchard Jr., Howard <howardp at lanl.gov>
Subject: [mpiwg-sessions] Sessions WG - meet 1/29/24

Hi Folks,

Let's meet on 1/29 to continue discussions related to sessions and FT.

I think what will help is to consider several use cases and implications.

Here are some I have:

  *   An app uses sessions to init/finalize and creates at least one initial communicator with MPI_Comm_create_from_group, but also wants to use methods available in slice 1 of the ULFM proposal to shrink/repair communicators. Are there any problems?
  *   An app uses sessions to init/finalize and creates at least one initial communicator with MPI_Comm_create_from_group, and wants to use methods available in slice 1 of the ULFM proposal to create a new group from a pset and create a new communicator.
  *   An app uses sessions to init/finalize, etc., and when a fail-stop error is detected, destroys the session, creates a new session, queries for process sets, etc., and starts all over.
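
The control flow of the third use case can be sketched without MPI as a bounded restart loop. Everything here is a mock: MOCK_SUCCESS and MOCK_ERR_PROC_FAILED stand in for MPI_SUCCESS and the ULFM error class, mock_attempt stands in for one full init/work/finalize cycle, and the max_attempts bound is an added assumption that is not part of the use case as stated.

```c
#include <assert.h>

/* Stand-ins for MPI_SUCCESS and the ULFM process-failure error class;
 * the numeric values are arbitrary and used only in this mock. */
enum { MOCK_SUCCESS = 0, MOCK_ERR_PROC_FAILED = 1 };

/* Hypothetical stand-in for one full attempt: create a session, build a
 * communicator, do the work, tear everything down. It reports a process
 * failure while *failures_left > 0, then succeeds. */
static int mock_attempt(int *failures_left)
{
    if (*failures_left > 0) {
        (*failures_left)--;
        return MOCK_ERR_PROC_FAILED;
    }
    return MOCK_SUCCESS;
}

/* Retries from scratch after each simulated failure. Returns the number
 * of attempts (sessions) used, or -1 if max_attempts was exhausted. */
int run_with_restart(int simulated_failures, int max_attempts)
{
    assert(simulated_failures >= 0 && max_attempts > 0);

    int failures_left = simulated_failures;
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        int ret = mock_attempt(&failures_left);
        if (ret == MOCK_SUCCESS) {
            return attempt; /* all done */
        }
        /* ret == MOCK_ERR_PROC_FAILED: discard everything, start over */
    }
    return -1;
}
```

With two simulated failures and a budget of five attempts, the loop succeeds on the third session. With five failures and a budget of three, it gives up and returns -1.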

We should also consider the behavior of MPI_Comm_create_from_group if a process failure occurs while creating a new communicator. The ULFM slice 1 proposal discusses the behavior of MPI_COMM_DUP under process failure. We'd probably want similar behavior for MPI_Comm_create_from_group.

For those with access, the ULFM slice 1 PR is at https://github.com/mpi-forum/mpi-standard/pull/947

Thanks,

Howard

-------

Howard Pritchard
Research Scientist
HPC-ENV

Los Alamos National Laboratory
howardp at lanl.gov