[mpiwg-sessions] Cross-session progress
Rolf Rabenseifner
rabenseifner at hlrs.de
Sun Oct 31 05:47:00 CDT 2021
Dear Martin,
> However, I do wonder if this is the right thing to do - cross-session progress
> does imply that there is a quite close connection and sharing of resources
> between sessions, which is exactly what we wanted to avoid.
Please read my example very carefully.
> Process 0:
>
> MPI_Bsend(…, dest=1, comm_A); // Call 0-A
>
> MPI_Recv(…, source=1, comm_B); // Call 0-B
>
> Process 1:
>
> MPI_Recv(…, source=0, comm_A); // Call 1-A
>
> MPI_Send(…, dest=0, comm_B); // Call 1-B
The communication in Session A / comm_A and the communication in Session B / comm_B
are completely unconnected, i.e., this example
is definitely the contrary of "a quite close connection".
The problem arises because all local routines or locally acting calls,
like the MPI_Recv in 1-A after the MPI_Bsend in 0-A has been called
(which may be some time before the MPI_Recv is called),
can be implemented as "weak local", a term that is not defined in MPI,
but should mean that the call may not return until a specific other process
(here Process 0) invokes an unspecific (i.e., not semantically related)
MPI procedure.
If an MPI library does not use this freedom, i.e., it makes progress
with one asynchronous thread for all sessions (a) or a separate such thread
for each session (b), then the problem does not exist.
Then no such info key is needed, and in case (b), a very clean
separation of sessions is achieved (whether several such threads are efficient
is another question, but not one for the MPI standard, only about the
quality of an MPI library implementation).
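For readers who want to try this themselves, the quoted pattern can be written as a
small test program. This is only a sketch under stated assumptions: it assumes an
MPI-4.0 library that supports the Sessions API, uses the "mpi://WORLD" process set
for both sessions, and must be compiled with mpicc and launched with 2 processes.

```c
/* Sketch of the cross-session progress test; run with 2 processes.
 * comm_A and comm_B are derived from two independent sessions, so no
 * communication context is shared between the two message exchanges. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Create one communicator from a freshly initialized session. */
static MPI_Comm comm_from_new_session(MPI_Session *session, const char *tag)
{
    MPI_Group group;
    MPI_Comm comm;
    MPI_Session_init(MPI_INFO_NULL, MPI_ERRORS_ARE_FATAL, session);
    MPI_Group_from_session_pset(*session, "mpi://WORLD", &group);
    MPI_Comm_create_from_group(group, tag, MPI_INFO_NULL,
                               MPI_ERRORS_ARE_FATAL, &comm);
    MPI_Group_free(&group);
    return comm;
}

int main(void)
{
    MPI_Session session_A, session_B;
    MPI_Comm comm_A = comm_from_new_session(&session_A, "example.session_A");
    MPI_Comm comm_B = comm_from_new_session(&session_B, "example.session_B");

    int rank, msg = 42, recvd;
    MPI_Comm_rank(comm_A, &rank);  /* same rank order in comm_A and comm_B */

    if (rank == 0) {
        int bufsize = MPI_BSEND_OVERHEAD + (int)sizeof(int);
        void *buffer = malloc(bufsize);
        MPI_Buffer_attach(buffer, bufsize);
        MPI_Bsend(&msg, 1, MPI_INT, 1, 0, comm_A);             /* Call 0-A */
        MPI_Recv(&recvd, 1, MPI_INT, 1, 0, comm_B,
                 MPI_STATUS_IGNORE);                            /* Call 0-B */
        MPI_Buffer_detach(&buffer, &bufsize);
        free(buffer);
        printf("Process 0 done, received %d\n", recvd);
    } else if (rank == 1) {
        MPI_Recv(&recvd, 1, MPI_INT, 0, 0, comm_A,
                 MPI_STATUS_IGNORE);                            /* Call 1-A */
        MPI_Send(&recvd, 1, MPI_INT, 0, 0, comm_B);             /* Call 1-B */
    }

    MPI_Comm_free(&comm_A);
    MPI_Comm_free(&comm_B);
    MPI_Session_finalize(&session_A);
    MPI_Session_finalize(&session_B);
    return 0;
}
```

With a single-threaded library that makes progress only inside MPI procedures,
Call 0-B is the semantically unrelated call that must provide the progress for
Call 1-A; if progress were confined to one session, this program could deadlock.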
In my opinion, there is no reason to change the progress rule,
which would make MPI more complicated for the user, because there
are perfectly good options for implementors, and all of them are
(and should be) invisible to the question of whether a given
MPI application is correct.
Kind regards
Rolf
----- Original Message -----
> From: "Martin Schulz" <schulzm at in.tum.de>
> To: "Rolf Rabenseifner" <rabenseifner at hlrs.de>, "mpiwg-sessions" <mpiwg-sessions at lists.mpi-forum.org>
> Sent: Sunday, October 31, 2021 10:49:18 AM
> Subject: Re: [mpiwg-sessions] Cross-session progress
> Hi Rolf, all,
>
> I agree that under the current rules, cross-session progress is required and
> there is probably very little room to maneuver.
>
> However, I do wonder if this is the right thing to do - cross-session progress
> does imply that there is a quite close connection and sharing of resources
> between sessions, which is exactly what we wanted to avoid. Also, one could
> claim that if you have a program that has such a close connection that blocking
> in one session can harm progress in another, then you have written a bad
> program, as clearly the two communication operations are logically connected
> and hence should not be in two sessions.
>
> We also had the idea once that it may be useful that different MPI
> implementations back different sessions, which would then mean that they cannot
> be connected and also would not be able to know about the progress in the other
> session.
>
> This would, of course, require new text that significantly changes progress
> rules in MPI (not in the WPM, but in the Sessions Model) with a whole bunch of
> consequences, but it would be matching the original idea of Sessions as
> independent access points into the MPI library.
>
> Martin
>
>
> --
> Prof. Dr. Martin Schulz, Chair of Computer Architecture and Parallel Systems
> Department of Informatics, TU-Munich, Boltzmannstraße 3, D-85748 Garching
> Member of the Board of Directors at the Leibniz Supercomputing Centre (LRZ)
> Email: schulzm at in.tum.de
>
>
>
> On 31.10.21, 10:00, "mpiwg-sessions on behalf of Rolf Rabenseifner via
> mpiwg-sessions" <mpiwg-sessions-bounces at lists.mpi-forum.org on behalf of
> mpiwg-sessions at lists.mpi-forum.org> wrote:
>
> Dear Dan and Joseph,
>
> I expect that such an info key makes no sense, because
> the following example and related statement shows that
> the rule is that we always have to require cross-session progress:
>
> _______________
> The definition of "local" in MPI and the related progress rules
> always allow that a local MPI routine, or an MPI call that must
> behave as local, still does not return until, in another
> process, an unspecific, i.e., semantically not related, MPI
> call happens (which is always guaranteed because, at the latest,
> an MPI finalizing call must be invoked, and this call is allowed
> to block until all necessary progress has happened).
>
> Let comm_A and comm_B be two communicators derived from
> two different sessions or one of them being part of the
> world model.
> They may be used in two different software layers which are
> independently programmed.
> The following program would cause a deadlock if an MPI_RECV
> that matches an MPI_BSEND is allowed not to return until
> such an unspecific MPI call happens in the process that
> called MPI_BSEND, and we required that this unspecific
> MPI call be made in the same session as the MPI_BSEND.
>
> Process 0:
>
> MPI_Bsend(…, dest=1, comm_A); // Call 0-A
>
> MPI_Recv(…, source=1, comm_B); // Call 0-B
>
> Process 1:
>
> MPI_Recv(…, source=0, comm_A); // Call 1-A
>
> MPI_Send(…, dest=0, comm_B); // Call 1-B
>
> As long as Call 1-A does not return, Call 1-B is not executed
> and therefore Call 0-B cannot return, and therefore Process 0
> cannot issue any further MPI call. This implies that
> Call 0-B must be the semantically unrelated MPI call
> in Process 0 that provides the progress for Call 1-A.
> This very simple example shows that cross-session progress is needed.
> ___________________________
>
> For all readers of this text who are not familiar with the
> behavior of MPI_Bsend + MPI_Recv and the progress rule of MPI,
> I recommend looking at Slide 589 of my MPI course.
> You may also download the zip or tar file and test
> MPI/tasks/C/Ch18/progress-test-bsend.c
> by using
> - a single threaded MPI library (i.e., with providing progress only
> inside of MPI routines)
> - and an MPI library that provides asynchronous progress.
>
> For the slides and examples (in C, Fortran and Python) please look at
>
> https://www.hlrs.de/training/par-prog-ws/MPI-course-material
>
> Kind regards
> Rolf
>
> ----- Original Message -----
> > From: "mpiwg-sessions" <mpiwg-sessions at lists.mpi-forum.org>
> > To: "mpiwg-sessions" <mpiwg-sessions at lists.mpi-forum.org>
> > Cc: "Joseph Schuchart" <schuchart at icl.utk.edu>
> > Sent: Wednesday, October 27, 2021 6:37:27 PM
> > Subject: Re: [mpiwg-sessions] Cross-session progress
>
> > Dan,
> >
> > I guess this info key would apply to a Session? I can imagine an
> > assertion saying that you'll never block on communication, i.e., no
> > blocking send/recv and no wait, unless you are sure it completes, to
> > make sure you're not creating a blocking dependency. That is the scope
> > you have control over.
> >
> > This would allow an implementation to take this particular session out
> > of the global progress scope (progress on the WPM or other sessions). A
> > wait test with requests from that session would still require global
> > progress though to resolve any dependencies from sessions that do not
> > carry this assert or from the WPM. If all sessions carry this assert
> > then of course it's only WPM communication that has to be progressed (if
> > any). Would that be useful?
> >
> > Thanks
> > Joseph
> >
> > On 10/27/21 12:16 PM, Dan Holmes via mpiwg-sessions wrote:
> >> Hi all,
> >>
> >> During the HACC WG call today, we discussed whether progress can be
> >> isolated by session. We devised this simple pseudo-code example
> >> (below) that shows the answer is “no”. With current progress rules in
> >> MPI-4.0 (unchanged from previous versions of MPI), the code must not
> >> deadlock at the place(s) indicated by the comments, even with one
> >> thread of execution, because the MPI_Recv procedure at process 0 must
> >> progress the send operation from process 0, which means the MPI_Recv
> >> procedure at process 1 is required to complete.
> >>
> >> If MPI is permitted to limit the scope of progress during the MPI_Recv
> >> procedure to just the operations within a particular session, then it
> >> is permitted to refuse to progress the send operation from process 0
> >> and deadlock inevitably ensues, unless the two libraries use different
> >> threads or MPI supports strong progress (both of which are optional).
> >>
> >> We suggested an INFO assertion that would give the user the
> >> opportunity to assert that they would not code the application in a
> >> way that resulted in this kind of deadlock. It might be hard for the
> >> user to know for sure when it is safe to use such an INFO assertion,
> >> especially in the general case and with opaque/closed-source
> >> libraries. However, if the INFO assertion was supplied, MPI could be
> >> implemented with separated/isolated progress. The scope of progress is
> >> global (whole MPI process) at the moment — and that would have to be
> >> the default scope/value for the INFO assertion. Smaller scopes could
> >> be session, communicator/window/file, and even operation.
> >>
> >> Process 0:
> >>
> >> library_A.begin_call -> {MPI_Issend(…, comm_A); }
> >>
> >> library_B.begin_call -> {MPI_Recv(…, comm_B); } // deadlock ?
> >>
> >> library_A.end_call -> {MPI_Wait(…, comm_A); }
> >>
> >> library_B.end_call -> { }
> >>
> >> Process 1:
> >>
> >> library_A.begin_call -> {MPI_Recv(…, comm_A); } // deadlock ?
> >>
> >> library_B.begin_call -> {MPI_Issend(…, comm_B); }
> >>
> >> library_A.end_call -> { }
> >>
> >> library_B.end_call -> {MPI_Wait(…, comm_B); }
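[Editor's note: the pseudo-code above can be turned into a compilable test along
these lines. This is only a sketch: the library_A/library_B wrappers are inlined,
comm_A and comm_B are plain duplicates of MPI_COMM_WORLD rather than
session-derived communicators, and it needs mpicc plus a 2-process launch.]

```c
/* Sketch of the library_A / library_B interleaving; run with 2 processes.
 * Under the global progress rule, the blocking MPI_Recv at process 0 must
 * progress process 0's pending synchronous send, so the MPI_Recv at
 * process 1 can complete. With per-session progress it could deadlock. */
#include <mpi.h>

int main(void)
{
    MPI_Comm comm_A, comm_B;  /* stand-ins for the two libraries' comms */
    MPI_Request req;
    int rank, out = 1, in;

    MPI_Init(NULL, NULL);
    MPI_Comm_dup(MPI_COMM_WORLD, &comm_A);
    MPI_Comm_dup(MPI_COMM_WORLD, &comm_B);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Issend(&out, 1, MPI_INT, 1, 0, comm_A, &req); /* library_A.begin */
        MPI_Recv(&in, 1, MPI_INT, 1, 0, comm_B,
                 MPI_STATUS_IGNORE);               /* library_B.begin: deadlock? */
        MPI_Wait(&req, MPI_STATUS_IGNORE);                /* library_A.end   */
    } else if (rank == 1) {
        MPI_Recv(&in, 1, MPI_INT, 0, 0, comm_A,
                 MPI_STATUS_IGNORE);               /* library_A.begin: deadlock? */
        MPI_Issend(&out, 1, MPI_INT, 0, 0, comm_B, &req); /* library_B.begin */
        MPI_Wait(&req, MPI_STATUS_IGNORE);                /* library_B.end   */
    }

    MPI_Comm_free(&comm_A);
    MPI_Comm_free(&comm_B);
    MPI_Finalize();
    return 0;
}
```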
> >>
> >>
> >> Cheers,
> >> Dan.
> >> —
> >> Dr Daniel Holmes PhD
> >> Executive Director
> >> Chief Technology Officer
> >> CHI Ltd
> >> danholmes at chi.scot <mailto:danholmes at chi.scot>
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> mpiwg-sessions mailing list
> >> mpiwg-sessions at lists.mpi-forum.org
> >> https://lists.mpi-forum.org/mailman/listinfo/mpiwg-sessions
> >
> > _______________________________________________
> > mpiwg-sessions mailing list
> > mpiwg-sessions at lists.mpi-forum.org
> > https://lists.mpi-forum.org/mailman/listinfo/mpiwg-sessions
>
> --
> Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de .
> High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530 .
> University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832 .
> Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner .
> Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307) .
> _______________________________________________
> mpiwg-sessions mailing list
> mpiwg-sessions at lists.mpi-forum.org
> https://lists.mpi-forum.org/mailman/listinfo/mpiwg-sessions
--
Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de .
High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530 .
University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832 .
Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner .
Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307) .