[mpiwg-sessions] MPI_Intercomm_from_group follow-up

Thu Jan 10 05:11:41 CST 2019

Hi Tony,

I think the result of your weaker semantics is PVM or some other type of formalised chaos :)

Handling group members that fail to take part could be useful for the process-fail-stop type of fault tolerance, but how does a partial group reach consensus that the other members are never going to join in, rather than just assuming they are being a bit slow? There are ways, but let us first define what keep-alives/timeouts/RAS look like in MPI and then look at the implications of that for the whole interface. The nearest MPI semantic is soft spawn, I think. We could look at what might be permitted if the user supplied an info key “soft = true” to any communicator/window/file creation routine. This is, perhaps, a topic for the FT WG.
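
To make the "slow versus never joining" problem concrete, here is a toy model of a hypothetical "soft" group creation in which every participating process applies its own local timeout. It is not MPI or PMIx code; `soft_view`, `views_agree` and `NMEMBERS` are invented for illustration, and the arrival times are assumed inputs. Because news of a late member reaches different observers at different times, two observers can settle on different membership sets for the same creation call, which is the disagreement a real protocol would have to resolve.

```c
#include <stdbool.h>

/* Toy model only: hypothetical names, not part of MPI or PMIx.
 * arrival[i][j] is the time at which observer i learns that member j
 * has joined the creation call; a negative value means "never". */
#define NMEMBERS 4

/* Membership bitmask that observer `obs` settles on if it stops
 * waiting at local time `deadline`. */
unsigned soft_view(const double arrival[NMEMBERS][NMEMBERS], int obs,
                   double deadline)
{
    unsigned view = 0;
    for (int j = 0; j < NMEMBERS; j++) {
        double t = arrival[obs][j];
        if (t >= 0.0 && t <= deadline)
            view |= 1u << j;
    }
    return view;
}

/* True only if every observer that appears in its own view
 * computed exactly the same membership set. */
bool views_agree(const double arrival[NMEMBERS][NMEMBERS], double deadline)
{
    unsigned first = 0;
    bool have_first = false;
    for (int i = 0; i < NMEMBERS; i++) {
        unsigned v = soft_view(arrival, i, deadline);
        if (!(v & (1u << i)))
            continue; /* observer i did not join by the deadline */
        if (!have_first) {
            first = v;
            have_first = true;
        } else if (v != first) {
            return false;
        }
    }
    return true;
}
```

With a slow fourth member whose arrival is seen at 0.9 by one observer and at 1.1 by another, a deadline of 1.0 yields two different groups. A longer deadline fixes this particular case, but no finite deadline excludes a member that is slower still, which is why the question looks like one for the FT WG.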

How many non-members are permitted to join in ad hoc with no warning? When is the communicator creation finished, if ever? Can new members apply to join even after some (sub)group has reached consensus and begun using the communicator for communication? The nearest MPI semantic is connect/accept, I think, but that permits exactly one connection between (the root processes of) two arbitrarily sized pre-existing groups, where all members of each group have already reached consensus with all other members of their group. By induction, that relies on pairwise connections: a fixed size of 2 (i.e. exactly one non-member), FCFS, and blocking indefinitely if unmatched.
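
The connect/accept behaviour described above can be modelled as a first-come-first-served queue at a port: each accept consumes exactly one pending connect, and an accept with nothing queued would block. The sketch below is a hypothetical single-threaded model, not MPI code (`port_t`, `port_connect`, `port_accept` and `pairwise_joins_needed` are invented names; real MPI uses `MPI_Open_port`, `MPI_Comm_accept` and `MPI_Comm_connect`). It also counts, under the assumption that groups are joined one at a time, how many pairwise connections are needed to bring k pre-existing groups together.

```c
#include <stdbool.h>

/* Hypothetical model of one port: pending connect requests are queued
 * in arrival order (FCFS). Not MPI code. */
#define MAX_PENDING 16

typedef struct {
    int pending[MAX_PENDING]; /* ids of groups whose root called connect */
    int head, count;
} port_t;

void port_init(port_t *p) { p->head = 0; p->count = 0; }

/* A group's root calls connect: enqueue it.
 * Returns false if the model's queue is full. */
bool port_connect(port_t *p, int group_id)
{
    if (p->count == MAX_PENDING)
        return false;
    p->pending[(p->head + p->count) % MAX_PENDING] = group_id;
    p->count++;
    return true;
}

/* A group's root calls accept: pair it with the oldest pending connect.
 * Returns false in the case where a real accept would block. */
bool port_accept(port_t *p, int *peer_group)
{
    if (p->count == 0)
        return false;
    *peer_group = p->pending[p->head];
    p->head = (p->head + 1) % MAX_PENDING;
    p->count--;
    return true;
}

/* Joining k groups by repeated pairwise connect/accept (each followed by
 * a merge) takes k - 1 connections, each involving exactly two groups. */
int pairwise_joins_needed(int k) { return k > 1 ? k - 1 : 0; }
```

In this model, joining five pre-existing groups takes four connect/accept rounds, each admitting exactly one new group; nothing allows a third group to join a connection that is already in progress.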

There is an event/exception mechanism in the PMIx functionality that we are currently using for the prototype implementation: if PMIx discovers an actual process failure, it can call back into all other processes to tell them about it, and they can all react in one of several appropriate ways (fail the operation, form a smaller group, invite spare processes to act as replacements, or request more resources to act as replacements). Some of those reactions might lead to resilience rather than fault tolerance, but it is all hidden behind (or forbidden by) the stronger semantics of MPI, which assume reliability and a priori coordination (information supplied at X is always consistent with information supplied at Y). Again, this is probably a topic for the FT WG.

A question that has always bothered me about intercomms - why only two groups? Why not permit a topology of leaders, each of which coordinates a local group, possibly with its own local topology? Does anyone have a compelling use-case for such generality?
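
One way to picture the "topology of leaders" question is as a graph whose vertices are the leaders of local groups. The sketch below is purely hypothetical (no such MPI object exists; `leader_topology_t`, `leader_hops` and `NGROUPS` are invented for illustration): a classic intercommunicator is the two-vertex, one-edge case, and the function reports how many leader-to-leader hops a message between two groups would need.

```c
#include <stdbool.h>

/* Hypothetical generalisation of an intercommunicator: up to NGROUPS
 * local groups, each represented by its leader, with leaders connected
 * by an arbitrary graph. A classic MPI intercommunicator is the case
 * ngroups == 2 with a single link. Not an MPI API. */
#define NGROUPS 8

typedef struct {
    int ngroups;
    bool link[NGROUPS][NGROUPS]; /* link[a][b]: leaders a and b connected */
} leader_topology_t;

/* Leader-to-leader hops from group src to group dst, found by
 * breadth-first search; returns -1 if dst is unreachable. */
int leader_hops(const leader_topology_t *t, int src, int dst)
{
    int dist[NGROUPS], queue[NGROUPS], head = 0, tail = 0;
    for (int g = 0; g < t->ngroups; g++)
        dist[g] = -1;
    dist[src] = 0;
    queue[tail++] = src;
    while (head < tail) {
        int g = queue[head++];
        if (g == dst)
            return dist[g];
        for (int h = 0; h < t->ngroups; h++) {
            if (t->link[g][h] && dist[h] < 0) {
                dist[h] = dist[g] + 1; /* each group is enqueued at most once */
                queue[tail++] = h;
            }
        }
    }
    return -1;
}
```

With two groups and one link this reduces to today's intercommunicator (one hop); a star or tree of leaders is the kind of wider topology the question is asking about.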

Summary: these are interesting research questions but not Sessions WG topics.

Cheers,
Dan.
—
Dr Daniel Holmes PhD
Applications Consultant in HPC Research
d.holmes at epcc.ed.ac.uk
Phone: +44 (0) 131 651 3465
Mobile: +44 (0) 7940 524 088
Address: Room 2.09, Bayes Centre, 47 Potterrow, Central Area, Edinburgh, EH8 9BT
—
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
—

On 9 Jan 2019, at 18:14, Pritchard Jr., Howard via mpiwg-sessions <mpiwg-sessions at lists.mpi-forum.org> wrote:

--
Howard Pritchard
B Schedule
HPC-ENV
Office 9, 2nd floor Research Park
TA-03, Building 4200, Room 203
Los Alamos National Laboratory

From: "Skjellum, Tony" <Tony-Skjellum at utc.edu>
Date: Wednesday, January 9, 2019 at 10:47 AM
To: Howard Pritchard <howardp at lanl.gov>
Cc: MPI Sessions working group <mpiwg-sessions at lists.mpi-forum.org>
Subject: Re: MPI_Intercomm_from_group follow-up

Howard, OK

1) What happens when non-group members call the intercomm function?
2) What happens when not all group members call the intercomm function?
--> Dan convinced me that this is disallowed by the semantics as written
--> I want to explore weakening those semantics to see where that leads.

My further thought is we should explore

  *   Arm's-length versions of these functions, in which non-members are intentionally allowed to call the functions and not all members are required to call them, for FT reasons, maybe in cases where we just need pt2pt communication
  *   When we introduce topology to groups (rather than only to comms), intercomms with a topology of groups on each side also pose an interesting new graph-to-graph connectivity idea that we might exploit for scalability and maybe for FT

Thanks,
Tony

Anthony Skjellum, PhD
Professor of Computer Science and Chair of Excellence
Director, SimCenter
University of Tennessee at Chattanooga (UTC)
tony-skjellum at utc.edu  [or skjellum at gmail.com]
cell: 205-807-4968

________________________________
From: Pritchard Jr., Howard <howardp at lanl.gov>
Sent: Wednesday, January 9, 2019 12:21:03 PM
To: Skjellum, Anthony
Cc: MPI Sessions working group
Subject: MPI_Intercomm_from_group follow-up

Hi Tony,

The audio was really bad a few minutes ago and I couldn’t understand what you were saying.
Could you summarize on the WG list what you were suggesting?

Thanks,

Howard


_______________________________________________
mpiwg-sessions mailing list
mpiwg-sessions at lists.mpi-forum.org
https://lists.mpi-forum.org/mailman/listinfo/mpiwg-sessions
