[mpiwg-sessions] Cross-session progress
Rolf Rabenseifner
rabenseifner at hlrs.de
Sun Oct 31 04:00:38 CDT 2021
Dear Dan and Joseph,
I expect that such an info key makes no sense, because
the following example and the related statement show that
we must always require cross-session progress:
_______________
The definition of "local" in MPI and the related progress rules
always allow that a local MPI routine, or an MPI call that must
behave as local, does not return until some unspecific, i.e.,
semantically unrelated, MPI call happens in another process.
Such a call is always guaranteed to happen, because at the latest
an MPI finalizing call must be invoked, and that call is allowed
to block until all necessary progress has happened.
Let comm_A and comm_B be two communicators derived from
two different sessions, or with one of them belonging to the
world model.
They may be used in two different software layers that are
programmed independently.
The following program would deadlock if the MPI_RECV that
matches an MPI_BSEND is allowed not to return until such an
unspecific MPI call happens in the process that called
MPI_BSEND, and if we additionally required that this unspecific
MPI call be made in the same session as the MPI_BSEND.
Process 0:
MPI_Bsend(…, dest=1, comm_A); // Call 0-A
MPI_Recv(…, source=1, comm_B); // Call 0-B
Process 1:
MPI_Recv(…, source=0, comm_A); // Call 1-A
MPI_Send(…, dest=0, comm_B); // Call 1-B
As long as Call 1-A does not return, Call 1-B is not executed,
and therefore Call 0-B cannot return, and therefore Process 0
cannot issue any further MPI call. This implies that Call 0-B
must be the semantically unrelated MPI call in Process 0 that
provides the progress for Call 1-A.
This very simple example shows that cross-session progress is needed.
___________________________
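For concreteness, the pattern above can be written out as a
compilable sketch, assuming an MPI-4.0 library that implements
the Sessions API; the helper comm_from_new_session and the string
tags are illustrative choices of mine, not prescribed names:

#include <mpi.h>
#include <stdlib.h>

/* Illustrative helper: create a communicator over the whole job
   from a freshly initialized session. */
static MPI_Comm comm_from_new_session(const char *tag, MPI_Session *s)
{
  MPI_Group group;
  MPI_Comm comm;
  MPI_Session_init(MPI_INFO_NULL, MPI_ERRORS_ARE_FATAL, s);
  MPI_Group_from_session_pset(*s, "mpi://WORLD", &group);
  MPI_Comm_create_from_group(group, tag, MPI_INFO_NULL,
                             MPI_ERRORS_ARE_FATAL, &comm);
  MPI_Group_free(&group);
  return comm;
}

int main(void)
{
  MPI_Session session_A, session_B;
  /* comm_A and comm_B come from two independent sessions */
  MPI_Comm comm_A = comm_from_new_session("example.tag_A", &session_A);
  MPI_Comm comm_B = comm_from_new_session("example.tag_B", &session_B);
  int rank, msg = 0;
  MPI_Comm_rank(comm_A, &rank);

  if (rank == 0) {
    int size = sizeof(int) + MPI_BSEND_OVERHEAD;
    char *buf = malloc(size);
    MPI_Buffer_attach(buf, size);
    MPI_Bsend(&msg, 1, MPI_INT, 1, 0, comm_A);        /* Call 0-A */
    MPI_Recv(&msg, 1, MPI_INT, 1, 0, comm_B,
             MPI_STATUS_IGNORE);                      /* Call 0-B */
    MPI_Buffer_detach(&buf, &size);
    free(buf);
  } else if (rank == 1) {
    MPI_Recv(&msg, 1, MPI_INT, 0, 0, comm_A,
             MPI_STATUS_IGNORE);                      /* Call 1-A */
    MPI_Send(&msg, 1, MPI_INT, 0, 0, comm_B);         /* Call 1-B */
  }

  MPI_Comm_free(&comm_A);
  MPI_Comm_free(&comm_B);
  MPI_Session_finalize(&session_A);
  MPI_Session_finalize(&session_B);
  return 0;
}

Note that Call 0-B belongs to session B, but it is exactly the
semantically unrelated call that must drive progress for Call 1-A
in session A; if progress were isolated per session, this program
would be allowed to deadlock.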
For all readers of this text who are not familiar with the
behavior of MPI_Bsend + MPI_Recv and the progress rule of MPI,
I recommend looking at Slide 589 in my MPI course.
You may also download the zip or tar file and test
MPI/tasks/C/Ch18/progress-test-bsend.c
by using
- a single-threaded MPI library (i.e., one that provides progress
only inside of MPI routines)
- and an MPI library that provides asynchronous progress.
For the slides and examples (in C, Fortran and Python) please look at
https://www.hlrs.de/training/par-prog-ws/MPI-course-material
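If the download is not at hand, the idea of such a progress test
can be sketched as follows; the message size, sleep time, and
output here are my own choices, not the contents of the course
file:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define N (10*1024*1024)  /* large enough to avoid eager delivery */

int main(int argc, char **argv)
{
  int rank, ack = 0;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  char *data = malloc(N);

  if (rank == 0) {
    int size = N + MPI_BSEND_OVERHEAD;
    char *buf = malloc(size);
    MPI_Buffer_attach(buf, size);
    MPI_Bsend(data, N, MPI_CHAR, 1, 0, MPI_COMM_WORLD); /* local */
    sleep(5);  /* no MPI call here: a single-threaded library
                  cannot progress the buffered transfer now */
    MPI_Recv(&ack, 1, MPI_INT, 1, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);  /* first MPI call after Bsend */
    MPI_Buffer_detach(&buf, &size);
    free(buf);
  } else if (rank == 1) {
    double t0 = MPI_Wtime();
    MPI_Recv(data, N, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
    printf("receive took %.2f s\n", MPI_Wtime() - t0);
    /* roughly 5 s if progress happens only inside MPI routines,
       close to 0 s with asynchronous progress */
    MPI_Send(&ack, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
  }
  free(data);
  MPI_Finalize();
  return 0;
}

Running this with both kinds of libraries makes the difference
between the two progress models directly visible in the printed
receive time.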
Kind regards
Rolf
----- Original Message -----
> From: "mpiwg-sessions" <mpiwg-sessions at lists.mpi-forum.org>
> To: "mpiwg-sessions" <mpiwg-sessions at lists.mpi-forum.org>
> Cc: "Joseph Schuchart" <schuchart at icl.utk.edu>
> Sent: Wednesday, October 27, 2021 6:37:27 PM
> Subject: Re: [mpiwg-sessions] Cross-session progress
> Dan,
>
> I guess this info key would apply to a Session? I can imagine an
> assertion saying that you'll never block on communication, i.e., no
> blocking send/recv and no wait, unless you are sure it completes, to
> make sure you're not creating a blocking dependency. That is the scope
> you have control over.
>
> This would allow an implementation to take this particular session out
> of the global progress scope (progress on the WPM or other sessions). A
> wait/test with requests from that session would still require global
> progress, though, to resolve any dependencies from sessions that do not
> carry this assert or from the WPM. If all sessions carry this assert,
> then of course it's only WPM communication that has to be progressed (if
> any). Would that be useful?
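>
> For concreteness, such a per-session assertion might be supplied
> at MPI_Session_init time; the key name below is purely
> hypothetical and exists in no standard or implementation:
>
>   MPI_Info info;
>   MPI_Info_create(&info);
>   /* hypothetical key: this session never creates a blocking
>      dependency on progress by other sessions or the WPM */
>   MPI_Info_set(info, "mpi_assert_progress_isolated", "true");
>   MPI_Session_init(info, MPI_ERRORS_ARE_FATAL, &session);
>   MPI_Info_free(&info);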
>
> Thanks
> Joseph
>
> On 10/27/21 12:16 PM, Dan Holmes via mpiwg-sessions wrote:
>> Hi all,
>>
>> During the HACC WG call today, we discussed whether progress can be
>> isolated by session. We devised this simple pseudo-code example
>> (below) that shows the answer is “no”. With current progress rules in
>> MPI-4.0 (unchanged from previous versions of MPI), the code must not
>> deadlock at the place(s) indicated by the comments, even with one
>> thread of execution, because the MPI_Recv procedure at process 0 must
>> progress the send operation from process 0, which means the MPI_Recv
>> procedure at process 1 is required to complete.
>>
>> If MPI is permitted to limit the scope of progress during the MPI_Recv
>> procedure to just the operations within a particular session, then it
>> is permitted to refuse to progress the send operation from process 0
>> and deadlock inevitably ensues, unless the two libraries use different
>> threads or MPI supports strong progress (both of which are optional).
>>
>> We suggested an INFO assertion that would give the user the
>> opportunity to assert that they would not code the application in a
>> way that resulted in this kind of deadlock. It might be hard for the
>> user to know for sure when it is safe to use such an INFO assertion,
>> especially in the general case and with opaque/closed-source
>> libraries. However, if the INFO assertion was supplied, MPI could be
>> implemented with separated/isolated progress. The scope of progress is
>> global (whole MPI process) at the moment — and that would have to be
>> the default scope/value for the INFO assertion. Smaller scopes could
>> be session, communicator/window/file, and even operation.
>>
>> Process 0:
>>
>> library_A.begin_call -> { MPI_Issend(…, comm_A); }
>> library_B.begin_call -> { MPI_Recv(…, comm_B); }   // deadlock ?
>> library_A.end_call   -> { MPI_Wait(…, comm_A); }
>> library_B.end_call   -> { }
>>
>> Process 1:
>>
>> library_A.begin_call -> { MPI_Recv(…, comm_A); }   // deadlock ?
>> library_B.begin_call -> { MPI_Issend(…, comm_B); }
>> library_A.end_call   -> { }
>> library_B.end_call   -> { MPI_Wait(…, comm_B); }
>>
>>
>> Cheers,
>> Dan.
>> —
>> Dr Daniel Holmes PhD
>> Executive Director
>> Chief Technology Officer
>> CHI Ltd
>> danholmes at chi.scot
--
Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de .
High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530 .
University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832 .
Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner .
Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307) .