[mpiwg-sessions] Sessions WG - meet 1/29/24
Holmes, Daniel John
daniel.john.holmes at intel.com
Mon Jan 29 10:12:36 CST 2024
Hi Howard/all,
Here is the simple code I was talking about in the meeting today:
// general high-level optimistic application
#include <mpi.h>
int main(void) {
    MPI_Session session;
    MPI_Session_init(MPI_INFO_NULL, MPI_ERRORS_ARE_FATAL, &session);
    MPI_Group group;
    MPI_Group_from_session_pset(session, "mpi://world", &group);
    MPI_Comm comm;
    // "org.example.app" is a placeholder string tag
    MPI_Comm_create_from_group(group, "org.example.app", MPI_INFO_NULL, MPI_ERRORS_ARE_FATAL, &comm);
    MPI_Group_free(&group);
    int ret = do_stuff_with_comm(comm);
    if (MPI_SUCCESS == ret) {
        MPI_Comm_disconnect(&comm);
        MPI_Session_finalize(&session);
        return 0;
    } else {
        panic();
    }
}
// general high-level pragmatic application
int main(void) {
    // additional code
    while (1) {
        MPI_Session session;
        MPI_Session_init(MPI_INFO_NULL, MPI_ERRORS_RETURN, &session);
        MPI_Group world, failed, group;
        MPI_Group_from_session_pset(session, "mpi://world", &world);
        // additional code
        MPI_Session_get_proc_failed(session, &failed); // new API, seems easy to do
        MPI_Group_difference(world, failed, &group);
        MPI_Group_free(&world);
        MPI_Group_free(&failed);
        MPI_Comm comm;
        // <-- the detail-devils live here; "org.example.app" is a placeholder string tag
        MPI_Comm_create_from_group(group, "org.example.app", MPI_INFO_NULL, MPI_ERRORS_RETURN, &comm);
        MPI_Group_free(&group);
        int ret = do_stuff_with_comm(comm);
        MPI_Comm_disconnect(&comm);
        MPI_Session_finalize(&session);
        if (MPI_SUCCESS == ret) {
            break; // all done!
        } else if (MPI_ERR_PROC_FAILED == ret) {
            continue; // no more panic
        }
        // additional code
    } // end while
    return 0;
}
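To make the "world minus failed" step concrete without needing an MPI installation, here is a minimal plain-C sketch that models what MPI_Group_difference produces. It works on arrays of rank ids rather than real MPI_Group handles, and the function name surviving_ranks is hypothetical, chosen only for this illustration. It assumes the failed group is a subset of the world group, as in the loop above.

```c
#include <assert.h>
#include <stddef.h>

/* Plain-C model of MPI_Group_difference on arrays of rank ids (no MPI needed).
 * Writes to `out` every rank in `world` that is not in `failed`, preserving
 * the order of `world`, and returns the number of ranks written.
 * `out` must have room for `nworld` entries. `failed` may be NULL when
 * `nfailed` is 0. The name `surviving_ranks` is hypothetical. */
size_t surviving_ranks(const int *world, size_t nworld,
                       const int *failed, size_t nfailed,
                       int *out)
{
    assert(world != NULL && out != NULL);
    size_t n = 0;
    for (size_t i = 0; i < nworld; i++) {
        int is_failed = 0;
        for (size_t j = 0; j < nfailed; j++) {
            if (world[i] == failed[j]) {
                is_failed = 1;
                break;
            }
        }
        if (!is_failed) {
            out[n++] = world[i];
        }
    }
    return n;
}
```

With a world of ranks 0..4 and failed ranks 1 and 3, the survivors are 0, 2, 4 in that order. The survivors keep their relative order from the world group, which matches how MPI_Group_difference orders its result.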
Best wishes,
Dan.
From: mpiwg-sessions <mpiwg-sessions-bounces at lists.mpi-forum.org> On Behalf Of Pritchard Jr., Howard via mpiwg-sessions
Sent: Thursday, January 25, 2024 6:07 PM
To: MPI Sessions working group <mpiwg-sessions at lists.mpi-forum.org>
Cc: Pritchard Jr., Howard <howardp at lanl.gov>
Subject: [mpiwg-sessions] Sessions WG - meet 1/29/24
Hi Folks,
Let's meet on 1/29 to continue discussions related to sessions and FT.
I think what will help is to consider several use cases and implications.
Here are some I have:
* App using sessions to init/finalize and create at least one initial communicator with MPI_Comm_create_from_group, but also wants to use methods available in slice 1 of the ULFM proposal to shrink/repair communicators. Are there any problems?
* App using sessions to init/finalize and create at least one initial communicator with MPI_Comm_create_from_group, and wants to use methods available in slice 1 of the ULFM proposal to create a new group from a pset and create a new communicator.
* App using sessions to init/finalize, etc., and when a fail-stop error is detected, destroy the session, create a new session, query for process sets, etc., and start all over.
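The control flow of the last use case (tear everything down on a process failure and start over) can be sketched in plain C without MPI. Everything named here is hypothetical and exists only for illustration: the attempt_result codes stand in for MPI_SUCCESS and a process-failure error class, the callback stands in for one full "create session, build communicator, do work, finalize" cycle, and the max_attempts bound is an added assumption, not something discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in result codes for this sketch only; real codes come from <mpi.h>. */
enum attempt_result {
    ATTEMPT_OK = 0,
    ATTEMPT_PROC_FAILED = 1,
    ATTEMPT_OTHER_ERROR = 2
};

/* One attempt = create session, build communicator, do the work, tear down. */
typedef enum attempt_result (*attempt_fn)(void *state);

/* Runs attempts until one succeeds, a non-recoverable error occurs, or
 * max_attempts is reached. Stores the number of attempts made in
 * *attempts_made (if non-NULL) and returns the last result. If
 * max_attempts <= 0, no attempt is made and ATTEMPT_OTHER_ERROR is returned. */
enum attempt_result run_with_restart(attempt_fn attempt, void *state,
                                     int max_attempts, int *attempts_made)
{
    assert(attempt != NULL);
    enum attempt_result r = ATTEMPT_OTHER_ERROR;
    int n = 0;
    while (n < max_attempts) {
        r = attempt(state);
        n++;
        if (r != ATTEMPT_PROC_FAILED) {
            break; /* success, or an error that restarting will not fix */
        }
    }
    if (attempts_made != NULL) {
        *attempts_made = n;
    }
    return r;
}

/* Example attempt: reports a process failure until the counter reaches zero. */
enum attempt_result fail_n_times(void *state)
{
    int *remaining = state;
    if (*remaining > 0) {
        (*remaining)--;
        return ATTEMPT_PROC_FAILED;
    }
    return ATTEMPT_OK;
}
```

Only a process failure triggers a restart in this sketch; any other error ends the loop, so the caller can decide what to do with it.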
We should also consider the behavior of MPI_Comm_create_from_group if a process failure occurs while creating a new communicator. The ULFM slice 1 proposal discusses the behavior of MPI_COMM_DUP under process failure. We'd probably want similar behavior for MPI_Comm_create_from_group.
For those with access, the ULFM slice 1 PR is at https://github.com/mpi-forum/mpi-standard/pull/947
Thanks,
Howard
-------
Howard Pritchard
Research Scientist
HPC-ENV
Los Alamos National Laboratory
howardp at lanl.gov