[mpiwg-sessions] Cross-session progress

Martin Schulz schulzm at in.tum.de
Sun Oct 31 04:49:18 CDT 2021


Hi Rolf, all,

I agree that under the current rules, cross-session progress is required and there is probably very little room to maneuver.

However, I do wonder if this is the right thing to do. Cross-session progress implies a rather close connection and sharing of resources between sessions, which is exactly what we wanted to avoid. One could also argue that if a program is coupled so closely that blocking in one session can harm progress in another, then it is a badly written program: the two communication operations are clearly logically connected and hence should not be in two separate sessions.

We also once had the idea that it may be useful for different MPI implementations to back different sessions; such sessions could not be connected and would have no way of knowing about progress in each other.

This would, of course, require new text that significantly changes the progress rules in MPI (not in the World Process Model, but in the Sessions Model), with a whole range of consequences, but it would match the original idea of sessions as independent access points into the MPI library.

Martin


-- 
Prof. Dr. Martin Schulz, Chair of Computer Architecture and Parallel Systems
Department of Informatics, TU-Munich, Boltzmannstraße 3, D-85748 Garching
Member of the Board of Directors at the Leibniz Supercomputing Centre (LRZ)
Email: schulzm at in.tum.de
 
 

On 31.10.21, 10:00, "mpiwg-sessions on behalf of Rolf Rabenseifner via mpiwg-sessions" <mpiwg-sessions-bounces at lists.mpi-forum.org on behalf of mpiwg-sessions at lists.mpi-forum.org> wrote:

    Dear Dan and Joseph,

    I expect that such an info key makes no sense, because
    the following example and the related statement show that
    we must always require cross-session progress:

    _______________
    The definition of "local" in MPI and the related progress rules
    always allow that a local MPI routine, or an MPI call that must
    behave as if local, does not return until a semantically
    unrelated ("unspecific") MPI call happens in another process.
    (Such a call is always guaranteed to occur, because at the
    latest an MPI finalizing call must be invoked, and that call is
    allowed to block until all necessary progress has happened.)

    Let comm_A and comm_B be two communicators derived from
    two different sessions, or with one of them belonging to the
    world model.
    They may be used in two different, independently programmed
    software layers.
    The following program would deadlock if the MPI_RECV matching
    an MPI_BSEND may not return until such an unspecific MPI call
    happens in the process that called MPI_BSEND, and we required
    that this unspecific MPI call be made in the same session as
    the MPI_BSEND.

    Process 0:

      MPI_Bsend(…, dest=1, comm_A); // Call 0-A

      MPI_Recv(…, source=1, comm_B); // Call 0-B

    Process 1:

      MPI_Recv(…, source=0, comm_A); // Call 1-A 

      MPI_Send(…, dest=0, comm_B); // Call 1-B

    As long as Call 1-A does not return, Call 1-B is not executed,
    and therefore Call 0-B cannot return, and therefore Process 0
    cannot issue any further MPI call. This implies that Call 0-B
    must be the semantically unrelated MPI call in Process 0 that
    provides the progress for Call 1-A.
    This very simple example shows that cross-session progress is needed.
    ___________________________
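    The dependency chain above can be sketched as a toy model in plain
    Python (threads stand in for processes; this is not MPI, and the
    comm/session labels are illustrative assumptions). With cross-session
    progress, the blocked receive flushes the buffered send from the other
    session and the exchange completes; with session-isolated progress,
    both ranks time out, i.e., deadlock:

```python
import queue
import threading
import time

class ToyMPI:
    """Toy model of MPI weak progress: a buffered send completes locally,
    but the message is only delivered when its sender later enters some
    other call that drives progress for it."""

    def __init__(self, cross_session_progress):
        self.cross = cross_session_progress
        self.pending = {0: [], 1: []}   # per-rank buffered sends awaiting delivery
        self.inboxes = {}               # (rank, comm) -> thread-safe message queue
        self.lock = threading.Lock()

    def _inbox(self, rank, comm):
        with self.lock:
            return self.inboxes.setdefault((rank, comm), queue.Queue())

    def _progress(self, rank, session):
        # Deliver this rank's buffered sends. With session-isolated progress,
        # only sends belonging to the session of the current call are flushed.
        kept = []
        for dest, comm, ses, msg in self.pending[rank]:
            if self.cross or ses == session:
                self._inbox(dest, comm).put(msg)
            else:
                kept.append((dest, comm, ses, msg))
        self.pending[rank] = kept

    def bsend(self, rank, dest, comm, session, msg):
        # Completes locally at once; actual delivery is deferred (weak progress).
        self.pending[rank].append((dest, comm, session, msg))

    def send(self, rank, dest, comm, session, msg):
        self._progress(rank, session)
        self._inbox(dest, comm).put(msg)

    def recv(self, rank, comm, session, timeout=1.0):
        # A blocking call drives progress for its own process while waiting.
        box = self._inbox(rank, comm)
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            self._progress(rank, session)
            try:
                return box.get(timeout=0.05)
            except queue.Empty:
                pass
        raise TimeoutError

def run(cross):
    mpi = ToyMPI(cross_session_progress=cross)
    result = {}

    def process0():
        mpi.bsend(0, dest=1, comm="A", session="A", msg="hello")   # Call 0-A
        try:
            result[0] = mpi.recv(0, comm="B", session="B")         # Call 0-B
        except TimeoutError:
            result[0] = "DEADLOCK"

    def process1():
        try:
            msg = mpi.recv(1, comm="A", session="A")               # Call 1-A
            mpi.send(1, dest=0, comm="B", session="B", msg="ack")  # Call 1-B
            result[1] = msg
        except TimeoutError:
            result[1] = "DEADLOCK"

    threads = [threading.Thread(target=process0), threading.Thread(target=process1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result

print(run(cross=True))    # cross-session progress: exchange completes
print(run(cross=False))   # isolated progress: both ranks time out
```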

    For all readers of this text who are not familiar with the
    behavior of MPI_Bsend + MPI_Recv and the progress rule of MPI,
    I recommend looking at Slide 589 in my MPI course.
    You may also download the zip or tar file and test
      MPI/tasks/C/Ch18/progress-test-bsend.c
    using
    - a single-threaded MPI library (i.e., one that provides
      progress only inside MPI routines)
    - and an MPI library that provides asynchronous progress.

    For the slides and examples (in C, Fortran and Python) please look at

      https://www.hlrs.de/training/par-prog-ws/MPI-course-material 

    Kind regards
    Rolf

    ----- Original Message -----
    > From: "mpiwg-sessions" <mpiwg-sessions at lists.mpi-forum.org>
    > To: "mpiwg-sessions" <mpiwg-sessions at lists.mpi-forum.org>
    > Cc: "Joseph Schuchart" <schuchart at icl.utk.edu>
    > Sent: Wednesday, October 27, 2021 6:37:27 PM
    > Subject: Re: [mpiwg-sessions] Cross-session progress

    > Dan,
    > 
    > I guess this info key would apply to a session? I can imagine an
    > assertion saying that you will never block on communication, i.e., no
    > blocking send/recv and no wait unless you are sure it completes, to
    > make sure you are not creating a blocking dependency. That is the
    > scope you have control over.
    > 
    > This would allow an implementation to take this particular session out
    > of the global progress scope (progress on the WPM or other sessions). A
    > wait or test with requests from that session would still require global
    > progress, though, to resolve any dependencies from sessions that do not
    > carry this assertion or from the WPM. If all sessions carry this
    > assertion, then of course only WPM communication has to be progressed
    > (if any). Would that be useful?
    > 
    > Thanks
    > Joseph
    > 
    > On 10/27/21 12:16 PM, Dan Holmes via mpiwg-sessions wrote:
    >> Hi all,
    >>
    >> During the HACC WG call today, we discussed whether progress can be
    >> isolated by session. We devised this simple pseudo-code example
    >> (below) that shows the answer is “no”. With current progress rules in
    >> MPI-4.0 (unchanged from previous versions of MPI), the code must not
    >> deadlock at the place(s) indicated by the comments, even with one
    >> thread of execution, because the MPI_Recv procedure at process 0 must
    >> progress the send operation from process 0, which means the MPI_Recv
    >> procedure at process 1 is required to complete.
    >>
    >> If MPI is permitted to limit the scope of progress during the MPI_Recv
    >> procedure to just the operations within a particular session, then it
    >> is permitted to refuse to progress the send operation from process 0
    >> and deadlock inevitably ensues, unless the two libraries use different
    >> threads or MPI supports strong progress (both of which are optional).
    >>
    >> We suggested an INFO assertion that would give the user the
    >> opportunity to assert that they would not code the application in a
    >> way that resulted in this kind of deadlock. It might be hard for the
    >> user to know for sure when it is safe to use such an INFO assertion,
    >> especially in the general case and with opaque/closed-source
    >> libraries. However, if the INFO assertion was supplied, MPI could be
    >> implemented with separated/isolated progress. The scope of progress is
    >> global (whole MPI process) at the moment — and that would have to be
    >> the default scope/value for the INFO assertion. Smaller scopes could
    >> be session, communicator/window/file, and even operation.
    >>
    >> Process 0:
    >>
    >> library_A.begin_call -> {MPI_Issend(…, comm_A); }
    >>
    >> library_B.begin_call -> {MPI_Recv(…, comm_B); } // deadlock ?
    >>
    >> library_A.end_call -> {MPI_Wait(…, comm_A); }
    >>
    >> library_B.end_call -> { }
    >>
    >> Process 1:
    >>
    >> library_A.begin_call -> {MPI_Recv(…, comm_A); } // deadlock ?
    >>
    >> library_B.begin_call -> {MPI_Issend(…, comm_B); }
    >>
    >> library_A.end_call -> { }
    >>
    >> library_B.end_call -> {MPI_Wait(…, comm_B); }
    >>
    >>
    >> Cheers,
    >> Dan.
    >> —
    >> Dr Daniel Holmes PhD
    >> Executive Director
    >> Chief Technology Officer
    >> CHI Ltd
    >> danholmes at chi.scot <mailto:danholmes at chi.scot>
    >>
    >>
    >>
    >>
    >> _______________________________________________
    >> mpiwg-sessions mailing list
    >> mpiwg-sessions at lists.mpi-forum.org
    >> https://lists.mpi-forum.org/mailman/listinfo/mpiwg-sessions
    > 

    -- 
    Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de .
    High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530 .
    University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832 .
    Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner .
    Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307) .



