[mpiwg-sessions] [EXTERNAL] Re: more excitement - more nuanced response to issue 435

Rolf Rabenseifner rabenseifner at hlrs.de
Mon Feb 22 03:32:54 CST 2021


Dear all,

>> https://github.com/mpiwg-sessions/mpi-standard/pull/48
> 
> is not open to my github account RolfRabenseifner .

Can someone fix this problem, or at least send me the PDF
so that I can look at the proposed solution?

>> I'd like to get agreement on wording before
>> adding one or more examples.

Your examples were great. You should definitely add them.

> ... agreement ...
When is the meeting, and how can I participate?

Best regards
Rolf


----- Original Message -----
> From: "Rolf Rabenseifner" <rabenseifner at hlrs.de>
> To: "Pritchard" <howardp at lanl.gov>, "Wesley Bland" <wesley.bland at intel.com>
> Sent: Sunday, February 21, 2021 9:47:35 AM
> Subject: Re: [mpiwg-sessions] [EXTERNAL] Re: more excitement - more nuanced response to issue 435

> Dear Howard and Wesley,
> 
>> https://github.com/mpiwg-sessions/mpi-standard/pull/48
> 
> is not open to my github account RolfRabenseifner .
> 
> Can one of you two fix this?
> 
> Best regards
> Rolf
> 
> ----- Original Message -----
>> From: "mpiwg-sessions" <mpiwg-sessions at lists.mpi-forum.org>
>> To: "mpiwg-sessions" <mpiwg-sessions at lists.mpi-forum.org>
>> Cc: "Pritchard" <howardp at lanl.gov>
>> Sent: Saturday, February 20, 2021 8:59:59 PM
>> Subject: Re: [mpiwg-sessions] [EXTERNAL] Re: more excitement - more nuanced
>> response to issue 435
> 
>> Hi All,
>> 
>> 
>> https://github.com/mpiwg-sessions/mpi-standard/pull/48
>> 
>> I did not include the new example yet.  I'd like to get agreement on wording
>> before adding one or more examples.
>> 
>> Some of Rolf's wording was unclear, so I tried to wordsmith it.
>> 
>> Howard
>> 
>> 
>>On 2/20/21, 6:31 AM, "mpiwg-sessions on behalf of Daniel Holmes via
>>mpiwg-sessions" <mpiwg-sessions-bounces at lists.mpi-forum.org on behalf of
>>mpiwg-sessions at lists.mpi-forum.org> wrote:
>> 
>>    Hi Martin,
>>    
>>    Personally, I think the "may be synchronising" semantic from collective is more
>>    than enough and Rolf's "must be synchronising, like a bunch of barriers" is
>>    over-specifying.
>>    
>>    Also, I liked Rolf's suggestion of "may perform collective operations" on all
>>    communicators, windows, and files derived from the session and not yet freed by
>>    the user.
>>    
>>    Generic collective operations, not over-specifying barrier or all-to-all.
>>    
>>    Operations, not procedures.
>>    
>>    May perform, to permit fully local implementation, if that is possible for some
>>    library. May do something that may be synchronising, double may, implies
>>    synchronising is an edge case.
>>    
>>    Question: is freed the right word? Communicators: no (needs to say
>>    disconnected), windows: yes, files: no (needs to say closed). MPI_COMM_FREE
>>    leaves work still to be done.
>>    
>>    Would benefit from the "outcome as if it forks threads, executes one blocking
>>    operation per thread, and joins threads before returning" implementation
>>    sketch. Note this is different from and superior to "initiates nonblocking
>>    operations and executes wait-all" because wait-all is equivalent to many
>>    waits in arbitrary order.
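>>    
>>    A minimal C sketch of that outcome, with a barrier per communicator standing
>>    in for the blocking clean-up operation (assumes MPI_THREAD_MULTIPLE; the
>>    array comms[] and count n are invented for illustration):
>>    
>>        /* one blocking operation per thread */
>>        #pragma omp parallel for
>>        for (int i = 0; i < n; i++) {
>>            MPI_Barrier(comms[i]);
>>        }
>>        /* implicit join at the end of the parallel region, then return */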
>>    
>>    Cheers,
>>    Dan.
>>    
>>    20 Feb 2021 13:07:23 Martin Schulz via mpiwg-sessions
>>    <mpiwg-sessions at lists.mpi-forum.org>:
>>    
>>    > Hi all,
>>    > 
>>    > Do we really want MPI_Session_finalize to be guaranteed synchronizing? I fully
>>    > understand that it could be, and a user must be aware of that, but the text
>>    > below sounds as if the user can rely on the synchronizing properties of
>>    > session_finalize.
>>    > 
>>    > Thanks,
>>    > 
>>    > Martin
>>    > 
>>    > 
>>    > --
>>    > Prof. Dr. Martin Schulz, Chair of Computer Architecture and Parallel Systems
>>    > Department of Informatics, TU-Munich, Boltzmannstraße 3, D-85748 Garching
>>    > Member of the Board of Directors at the Leibniz Supercomputing Centre (LRZ)
>>    > Email: schulzm at in.tum.de
>>    > 
>>    > 
>>    > 
>>    > On 20.02.21, 13:17, "mpiwg-sessions on behalf of Rolf Rabenseifner via
>>    > mpiwg-sessions" <mpiwg-sessions-bounces at lists.mpi-forum.org on behalf of
>>    > mpiwg-sessions at lists.mpi-forum.org> wrote:
>>    > 
>>    >     Dear Howard, Dan, Martin and all,
>>    > 
>>    >     My apologies that I wasn't yet on mpiwg-sessions at lists.mpi-forum.org
>>    > 
>>    >     I really like your proposal in
>>    >     http://lists.mpi-forum.org/pipermail/mpiwg-sessions/attachments/20210219/c8e38d93/attachment-0001.pdf
>>    > 
>>    >     Your text addresses all of the outstanding problems that I listed
>>    >     in my email below and that you already mentioned in an earlier email in this
>>    >     thread.
>>    > 
>>    > 
>>    >     I would substitute your
>>    > 
>>    >       a series of MPI_IALLTOALL calls
>>    >       over all communicators
>>    >       still associated with the session
>>    > 
>>    >     by
>>    > 
>>    >       a series of nonblocking synchronizing calls (like MPI_IBARRIER,
>>    >       or internal nonblocking versions of MPI_WIN_FENCE and MPI_FILE_SYNC)
>>    >       over all communicators, windows and file handles
>>    >       still associated with the session
>>    > 
>>    >     A probably better alternative would be
>>    > 
>>    >       a series of nonblocking synchronizing calls (e.g., MPI_IBARRIER)
>>    >       over all communicators, and the process groups of windows and file handles
>>    >       still associated with the session
>>    > 
>>    >     That this is needed can be seen in the advice to users as part of
>>    >     the definition of MPI_COMM_DISCONNECT.
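>>    > 
>>    >     As an illustrative sketch only (the array comms[] and count n are
>>    >     invented names for the communicators still associated with the session),
>>    >     this "as if" behavior could be:
>>    > 
>>    >       MPI_Request reqs[n];
>>    >       for (int i = 0; i < n; i++)
>>    >           MPI_Ibarrier(comms[i], &reqs[i]);      /* nonblocking synchronizing call */
>>    >       MPI_Waitall(n, reqs, MPI_STATUSES_IGNORE); /* complete all of them */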
>>    > 
>>    > 
>>    >     I also prefer your examples.
>>    > 
>>    > 
>>    >     Typo (2x): when generating cz in process 1, it should read foobar3
>>    >     (instead of foobar2) on page 504, lines 11 and 31.
>>    > 
>>    > 
>>    >     And your sentence
>>    > 
>>    >       The semantics of MPI_SESSION_FINALIZE is what would be obtained
>>    >       if the callers initiated
>>    > 
>>    >     may be substituted by
>>    > 
>>    >       MPI_SESSION_FINALIZE may synchronize as
>>    >       if it internally initiates
>>    > 
>>    > 
>>    >     Best regards
>>    >     Rolf
>>    > 
>>    > 
>>    >     ----- Forwarded Message -----
>>    >     From: "Rolf Rabenseifner" <rabenseifner at hlrs.de>
>>    >     To: "Pritchard" <howardp at lanl.gov>
>>    >     Cc: "Martin Schulz" <schulzm at in.tum.de>, "Dan Holmes, MPI" <danholmes at chi.scot>
>>    >     Sent: Friday, February 19, 2021 10:35:58 PM
>>    >     Subject: Re: [EXTERNAL] [mpi-forum/mpi-standard] seesions: add verbiage
>>    >     concerning dynamic process model and sessions model limitations (#521)
>>    > 
>>    >     Hi Howard and all,
>>    > 
>>    >     > Thanks very much.  I cooked up some similar wording and a model for users to
>>    >     > use.  I want feedback from the WG before opening a PR.
>>    > 
>>    >     I haven't seen your wording.
>>    > 
>>    >     But in addition to the text I proposed: technically, an MPI library only
>>    >     has to check (i.e., finish ongoing internal, e.g., weakly local,
>>    >     communication) the communicators that are directly derived from a session
>>    >     handle, via MPI_Group_from_session_pset to a process group handle (and then
>>    >     possibly to sub-group handles), and then via MPI_Comm_create_from_group to
>>    >     the communicator.
>>    > 
>>    >     All subcommunicators derived from such communicators can be ignored
>>    >     by an MPI library implementing MPI_SESSION_FINALIZE.
>>    >     This need not be mentioned, but it can be mentioned,
>>    >     and this internal optimization opportunity is the main reason
>>    >     why we should never require that the application disconnect
>>    >     all its communicators, because this is never needed.
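>>    > 
>>    >     For illustration only (the pset name and string tag are just examples),
>>    >     the derivation chain in question is:
>>    > 
>>    >       MPI_Group grp;
>>    >       MPI_Comm  comm;
>>    >       /* directly derived from the session handle: */
>>    >       MPI_Group_from_session_pset(session, "mpi://WORLD", &grp);
>>    >       MPI_Comm_create_from_group(grp, "org.example.tag", MPI_INFO_NULL,
>>    >                                  MPI_ERRORS_RETURN, &comm);
>>    >       MPI_Group_free(&grp);
>>    >       /* subcommunicators created later, e.g. via MPI_Comm_split(comm, ...),
>>    >          can be ignored by MPI_SESSION_FINALIZE */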
>>    > 
>>    >     Best regards
>>    >     Rolf
>>    > 
>>    >     ----- Original Message -----
>>    >     > From: "Pritchard" <howardp at lanl.gov>
>>    >     > To: "Rolf Rabenseifner" <rabenseifner at hlrs.de>, "Martin Schulz"
>>    >     > <schulzm at in.tum.de>
>>    >     > Cc: "Dan Holmes, MPI" <danholmes at chi.scot>
>>    >     > Sent: Friday, February 19, 2021 7:15:41 PM
>>    >     > Subject: Re: [EXTERNAL] [mpi-forum/mpi-standard] seesions: add verbiage
>>    >     > concerning dynamic process model and sessions
>>    >     > model limitations (#521)
>>    > 
>>    >     > Hi Rolf,
>>    >     >
>>    >     > Thanks very much.  I cooked up some similar wording and a model for users to
>>    >     > use.  I want feedback from the WG before opening a PR.
>>    >     >
>>    >     > Howard
>>    >     >
>>    >     >On 2/19/21, 11:05 AM, "Rolf Rabenseifner" <rabenseifner at hlrs.de> wrote:
>>    >     >
>>    >     >    Dear all,
>>    >     >   
>>    >     >    based on my previous email, I recommend the following (small) changes:
>>    >     >   
>>    >     >    MPI-4.0 page 502 lines 30-32 read
>>    >     >   
>>    >     >     "MPI_SESSION_FINALIZE is collective over all MPI processes that
>>    >     >      are connected via MPI Communicators, Windows, or Files that
>>    >     >      were created as part of the Session and still exist."
>>    >     >   
>>    >     >    but should read
>>    >     >   
>>    >     >      \mpifunc{MPI\_SESSION\_FINALIZE} may internally and in parallel execute
>>    >     >      nonblocking collective operations on each existing communicator derived
>>    >     >      from the \mpiarg{session}.
>>    >     >   
>>    >     >      \begin{rationale}
>>    >     >      This rule is similar to the rule that \mpifunc{MPI\_FINALIZE} is collective,
>>    >     >      but avoids having to define to which processes the calling process is
>>    >     >      connected.
>>    >     >      It also allows that some processes may derive a set of communicators
>>    >     >      from a different number of session handles, see Example~\ref{XXX}.
>>    >     >      \end{rationale}
>>    >     >   
>>    >     >      \begin{implementors}
>>    >     >      This rule also requires the completion of communications the process is
>>    >     >      involved in that may not yet be completed from the viewpoint of the
>>    >     >      underlying MPI system; see the advice to implementors after Example 11.6.
>>    >     >      \end{implementors}
>>    >     >   
>>    >     >      \begin{example}
>>    >     >      \label{XXX}
>>    >     >      Three processes are connected by 2 communicators,
>>    >     >      derived from 1 session handle in process rank 0 and from 2 session handles
>>    >     >      in each of process ranks 1 and 2.
>>    >     >      \begin{verbatim}
>>    >     >        process      process       process       Remarks
>>    >     >        rank 0       rank 1        rank 2        ses, ses_A, ses_B: session handles
>>    >     >         (ses)=======(ses_A)=======(ses_A)       communicator_1 and
>>    >     >         (ses)=======(ses_B)=======(ses_B)       communicator_2 are derived from them.
>>    >     >        SF(ses)     SF(ses_A)     SF(ses_A)      SF = MPI_SESSION_FINALIZE
>>    >     >                    SF(ses_B)     SF(ses_B)
>>    >     >      \end{verbatim}
>>    >     >      Process rank 0 only has to finalize its one session handle,
>>    >     >      whereas the other two processes have to call
>>    >     >      \mpifunc{MPI\_SESSION\_FINALIZE} twice in the same sequence with respect to
>>    >     >      the underlying communicators and the session handles they are derived from.
>>    >     >      The call \code{SF(ses)} in process rank 0 may be blocked until
>>    >     >      both \code{SF(ses\_A)} and \code{SF(ses\_B)} are called in process ranks 1
>>    >     >      and 2.
>>    >     >      \end{example}
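>>    >     >   
>>    >     >    A hedged C sketch of what process ranks 1 and 2 might do in this example
>>    >     >    (pset name and string tags are invented; the communicators are deliberately
>>    >     >    not freed, so the two MPI_SESSION_FINALIZE calls may synchronize over them):
>>    >     >   
>>    >     >      MPI_Session ses_A, ses_B;
>>    >     >      MPI_Group   grp;
>>    >     >      MPI_Comm    comm1, comm2;
>>    >     >      MPI_Session_init(MPI_INFO_NULL, MPI_ERRORS_RETURN, &ses_A);
>>    >     >      MPI_Session_init(MPI_INFO_NULL, MPI_ERRORS_RETURN, &ses_B);
>>    >     >      MPI_Group_from_session_pset(ses_A, "mpi://WORLD", &grp);
>>    >     >      MPI_Comm_create_from_group(grp, "example.comm1", MPI_INFO_NULL,
>>    >     >                                 MPI_ERRORS_RETURN, &comm1);
>>    >     >      MPI_Group_free(&grp);
>>    >     >      MPI_Group_from_session_pset(ses_B, "mpi://WORLD", &grp);
>>    >     >      MPI_Comm_create_from_group(grp, "example.comm2", MPI_INFO_NULL,
>>    >     >                                 MPI_ERRORS_RETURN, &comm2);
>>    >     >      MPI_Group_free(&grp);
>>    >     >      /* ... communication on comm1 and comm2 ... */
>>    >     >      MPI_Session_finalize(&ses_A);   /* SF(ses_A) */
>>    >     >      MPI_Session_finalize(&ses_B);   /* SF(ses_B) */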
>>    >     >   
>>    >     >   
>>    >     >    This is an elegant solution that is consistent with the existing approach and
>>    >     >    resolves
>>    >     >    the problem with "collective".
>>    >     >   
>>    >     >    Best regards
>>    >     >    Rolf
>>    >     >   
>>    >     >   
>>    >     >    ----- Original Message -----
>>    >     >    > From: "Rolf Rabenseifner" <rabenseifner at hlrs.de>
>>    >     >    > To: "Martin Schulz" <schulzm at in.tum.de>
>>    >     >    > Cc: "Dan Holmes, MPI" <danholmes at chi.scot>, "Pritchard" <howardp at lanl.gov>
>>    >     >    > Sent: Friday, February 19, 2021 3:24:38 PM
>>    >     >    > Subject: Re: [EXTERNAL] [mpi-forum/mpi-standard] seesions: add verbiage
>>    >     >    > concerning dynamic process model and sessions
>>    >     >    > model limitations (#521)
>>    >     >   
>>    >     >    Dear all,
>>    >     >   
>>    >     >    for me, it looks like we risk completely losing sessions.
>>    >     >   
>>    >     >    In MPI-3.1 we had the clear rules:
>>    >     >   
>>    >     >    1. Every one of us and the MPI Forum must clearly understand that
>>    >     >       the business of MPI_Finalize and MPI_Session_finalize is to
>>    >     >       guarantee that all communication (including weakly local communication)
>>    >     >       is finished.
>>    >     >   
>>    >     >       rank=0     rank=1
>>    >     >       bsend
>>    >     >       finalize
>>    >     >       10 seconds later
>>    >     >                  recv
>>    >     >                  finalize
>>    >     >   
>>    >     >       must work.
>>    >     >       Because of the weakly local character of Bsend (see attached test and
>>    >     >       protocol), there must be some communication between rank=0 and rank=1,
>>    >     >       typically in the rank=0 finalize, which has to wait until all other
>>    >     >       processes have joined the collective finalize.
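>>    >     >   
>>    >     >       A minimal sketch of this pattern in the World Model (buffer size
>>    >     >       and sleep are illustrative only):
>>    >     >   
>>    >     >         #include <mpi.h>
>>    >     >         #include <unistd.h>
>>    >     >         int main(int argc, char **argv) {
>>    >     >             int rank, data = 42;
>>    >     >             static char buf[MPI_BSEND_OVERHEAD + sizeof(int)];
>>    >     >             MPI_Init(&argc, &argv);
>>    >     >             MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>    >     >             if (rank == 0) {
>>    >     >                 MPI_Buffer_attach(buf, sizeof(buf));
>>    >     >                 MPI_Bsend(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
>>    >     >                 /* MPI_Finalize must deliver the buffered message, although
>>    >     >                    rank 1 posts its receive only 10 seconds later */
>>    >     >             } else if (rank == 1) {
>>    >     >                 sleep(10);
>>    >     >                 MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
>>    >     >                          MPI_STATUS_IGNORE);
>>    >     >             }
>>    >     >             MPI_Finalize();
>>    >     >             return 0;
>>    >     >         }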
>>    >     >   
>>    >     >    2. After MPI_Finalize, the use of MPI_COMM_WORLD, MPI_COMM_SELF and any
>>    >     >       derived communicators, window handles or files is erroneous.
>>    >     >   
>>    >     >    3. MPI_Finalize does not disconnect or free any communicator.
>>    >     >   
>>    >     >    4. Item 3. has one exception: MPI_COMM_SELF is freed, with the implication
>>    >     >       that the callback functions comm_delete_attr_fn are called
>>    >     >       if attributes are set for MPI_COMM_SELF.
>>    >     >   
>>    >     >    With MPI-4.0, are these four basic rules still true,
>>    >     >    or was the World Model changed without strong notice to the MPI Forum?
>>    >     >   
>>    >     >    About 1. In MPI-3.1 page 357 line 4
>>    >     >             and in MPI-4.0 page 495 line 27:
>>    >     >             "MPI_FINALIZE is collective over all connected processes."
>>    >     >   
>>    >     >             This sentence is the basis for the following Advice to implementors:
>>    >     >             MPI-3.1 Sect.8.7, MPI_Finalize, after Example 8.9, page 359, lines 8-18.
>>    >     >             MPI-4.0 Sect.11.2.2, MPI_Finalize, after Exa. 11.6, page 496, lines 38-48.
>>    >     >         Okay.
>>    >     >   
>>    >     >    About 2. MPI-3.1 page 359, lines 19-22 says:
>>    >     >             "Once MPI_FINALIZE returns, no MPI routine (not even MPI_INIT) may be called,
>>    >     >              except for MPI_GET_VERSION, MPI_GET_LIBRARY_VERSION, MPI_INITIALIZED,
>>    >     >              MPI_FINALIZED, and any function with the prefix MPI_T_ (within the constraints
>>    >     >              for
>>    >     >              functions with this prefix listed in Section 14.3.4)."
>>    >     >             This text implies that handles like MPI_COMM_WORLD cannot be further used.
>>    >     >   
>>    >     >             MPI-4.0 page 487, lines 36-38 say
>>    >     >             "MPI_COMM_WORLD is only valid for use as a communicator in the World Model,
>>    >     >              i.e., after a successful call to MPI_INIT or MPI_INIT_THREAD
>>    >     >              and before a call to MPI_FINALIZE."
>>    >     >             MPI-4.0 page 497 line 41 only says:
>>    >     >             "In the World Model, once MPI has been finalized it cannot be restarted."
>>    >     >         Okay.
>>    >     >   
>>    >     >    About 3. MPI-3.1 page 357, lines 42-43, and
>>    >     >             MPI-4.0 page 495, lines 25-26:
>>    >     >             "The call to MPI_FINALIZE does not free objects created by
>>    >     >              MPI calls; these objects are freed using MPI_XXX_FREE calls."
>>    >     >         Okay.
>>    >     >   
>>    >     >    About 4. It is described in MPI-3.1 Section 8.7.1 and
>>    >     >             in MPI-4.0 Sect. 11.2.4
>>    >     >         Okay.
>>    >     >   
>>    >     >   
>>    >     >    And now about MPI-4.0 MPI_Session_finalize:
>>    >     >   
>>    >     >    The wording is a copy of the wording of MPI_Finalize.
>>    >     >   
>>    >     >    About 1. MPI-4.0 page 502 line 30-32:
>>    >     >             "MPI_SESSION_FINALIZE is collective over all MPI processes that
>>    >     >              are connected via MPI Communicators, Windows, or Files that
>>    >     >              were created as part of the Session and still exist."
>>    >     >   
>>    >     >             But the same important "Advice to implementors" is missing.
>>    >     >   
>>    >     >             Not problematic, because the statement about being collective is
>>    >     >             enough: Example 11.6 has to work in both the World and the Sessions Model.
>>    >     >   
>>    >     >    About 2. MPI-4.0 page 502 lines 24-27 say:
>>    >     >             "Before an MPI process invokes MPI_SESSION_FINALIZE, the process
>>    >     >              must perform all MPI calls needed to complete its involvement
>>    >     >              in MPI communications: it must locally complete all MPI operations
>>    >     >              that it initiated and it must execute matching calls needed to
>>    >     >              complete MPI communications initiated by other processes."
>>    >     >   
>>    >     >             This sentence implies that after MPI_Session_finalize the use of
>>    >     >             derived communicators is erroneous.
>>    >     >   
>>    >     >    About 3. MPI-4.0 page 502 lines 28-29 say:
>>    >     >             "The call to MPI_SESSION_FINALIZE does not free objects created by
>>    >     >              MPI calls; these objects are freed using MPI_XXX_FREE calls."
>>    >     >   
>>    >     >             Same sentence as for MPI_Finalize.
>>    >     >   
>>    >     >    About 4. There is no such rule.
>>    >     >             Okay, because there is no such MPI_COMM_SELF.
>>    >     >             If a library creates a communicator session_comm_self1 derived from
>>    >     >             a session handle session1, then it must call
>>    >     >             MPI_Comm_free(&session_comm_self1)
>>    >     >             before calling MPI_Session_finalize(&session1).
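>>    >     >   
>>    >     >             A hedged sketch of that pattern, using the predefined
>>    >     >             "mpi://SELF" process set (the string tag is invented):
>>    >     >   
>>    >     >               MPI_Session session1;
>>    >     >               MPI_Group   grp;
>>    >     >               MPI_Comm    session_comm_self1;
>>    >     >               MPI_Session_init(MPI_INFO_NULL, MPI_ERRORS_RETURN, &session1);
>>    >     >               MPI_Group_from_session_pset(session1, "mpi://SELF", &grp);
>>    >     >               MPI_Comm_create_from_group(grp, "example.self", MPI_INFO_NULL,
>>    >     >                                          MPI_ERRORS_RETURN,
>>    >     >                                          &session_comm_self1);
>>    >     >               MPI_Group_free(&grp);
>>    >     >               /* ... library uses session_comm_self1 ... */
>>    >     >               MPI_Comm_free(&session_comm_self1);  /* before finalize */
>>    >     >               MPI_Session_finalize(&session1);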
>>    >     >   
>>    >     >   
>>    >     >    Result:
>>    >     >     I.  All looks consistent.
>>    >     >     II. The small sentence about the collectiveness of MPI_Session_finalize
>>    >     >         is a bit broken.
>>    >     >   
>>    >     >    Consequence:
>>    >     >     Item II. should be repaired without destroying the consistency
>>    >     >     of the whole chapter.
>>    >     >   
>>    >     >    Best regards
>>    >     >    Rolf
>>    >     >   
>>    >     >   
>>    >     >    ----- Original Message -----
>>    >     >    > From: "Martin Schulz" <schulzm at in.tum.de>
>>    >     >    > To: "Rolf Rabenseifner" <rabenseifner at hlrs.de>
>>    >     >    > Cc: "Dan Holmes, MPI" <danholmes at chi.scot>, "Pritchard" <howardp at lanl.gov>
>>    >     >    > Sent: Thursday, February 18, 2021 11:18:39 PM
>>    >     >    > Subject: Re: [EXTERNAL] [mpi-forum/mpi-standard] seesions: add verbiage
>>    >     >    > concerning dynamic process model and sessions
>>    >     >    > model limitations (#521)
>>    >     >   
>>    >     >    > Hi Rolf,
>>    >     >    >
>>    >     >    > Well, technically, doesn't any of the changes we are discussing require a
>>    >     >    > two-vote process, even though we are trying to wrap this into the RC process?
>>    >     >    > I was just trying to propose the least impactful solution that allows us to
>>    >     >    > move forward with 4.0 - I think we could all agree on "user has to free
>>    >     >    > everything" because it is easiest to see that it is correct.
>>    >     >    >
>>    >     >    > In general, I agree with you - this is not what the user wants or expects, and
>>    >     >    > we should think about this in 4.1 - I just have concerns that we won't agree on
>>    >     >    > the text in short order. The general idea sounds good, but how do we write up
>>    >     >    > the details? In the end, this would again be a collective and possibly
>>    >     >    > synchronizing operation - and, if so, collective over what group? That's where
>>    >     >    > our opinions diverged. I would also say that this turns into a collective
>>    >     >    > operation over the bubble in all processes, but I think Dan disagrees here.
>>    >     >    >
>>    >     >    > My second comment, though, is: what does this solution actually mean for the
>>    >     >    > user? We still have the sentence "Session_finalize does not free the objects".
>>    >     >    > Do we want to change that? In contrast to MPI_Finalize, we expect programs to
>>    >     >    > continue after Session_finalize, so someone has to free the objects, which
>>    >     >    > brings us back to the point that a user must free all objects anyway - so why
>>    >     >    > even try to make Session_finalize disconnect items, if one can only write a
>>    >     >    > correct (memory-clean) program by freeing all objects manually?
>>    >     >    >
>>    >     >    > The actual alternative that a user would expect is that Session_finalize
>>    >     >    > actually frees all objects. This would be, IMHO, a larger change - but we could
>>    >     >    > decide that in 4.1.
>>    >     >    >
>>    >     >    > Cheers,
>>    >     >    >
>>    >     >    > Martin
>>    >     >    >
>>    >     >    >
>>    >     >    >
>>    >     >    >
>>    >     >    > --
>>    >     >    > Prof. Dr. Martin Schulz, Chair of Computer Architecture and Parallel Systems
>>    >     >    > Department of Informatics, TU-Munich, Boltzmannstraße 3, D-85748 Garching
>>    >     >    > Member of the Board of Directors at the Leibniz Supercomputing Centre (LRZ)
>>    >     >    > Email: schulzm at in.tum.de
>>    >     >    >
>>    >     >    >
>>    >     >    >
>>    >     >    >On 18.02.21, 22:53, "Rolf Rabenseifner" <rabenseifner at hlrs.de> wrote:
>>    >     >    >
>>    >     >    >    Dear Martin and all,
>>    >     >    >
>>    >     >    >    To require that all communicators are disconnected by the user
>>    >     >    >     - is a two-vote change,
>>    >     >    >     - is a catastrophic service for normal users,
>>    >     >    >     - and I thought that sessions are not only for library writers?
>>    >     >    >     - And there is no need for this drastic change,
>>    >     >    >       because we have two solutions for a Session_finalize that behaves
>>    >     >    >       like the normal Finalize:
>>    >     >    >        - it behaves as if a set of nonblocking barriers may be executed
>>    >     >    >          for each derived communicator
>>    >     >    >        - it is based on the bubbles
>>    >     >    >
>>    >     >    >    Best regards
>>    >     >    >    Rolf
>>    >     >    >
>>    >     >    >    ----- Original Message -----
>>    >     >    >    > From: "Martin Schulz" <schulzm at in.tum.de>
>>    >     >    >    > To: "Dan Holmes, MPI" <danholmes at chi.scot>, "Pritchard" <howardp at lanl.gov>
>>    >     >    >    > Cc: "Rolf Rabenseifner" <rabenseifner at hlrs.de>
>>    >     >    >    > Sent: Thursday, February 18, 2021 10:38:15 PM
>>    >     >    >    > Subject: Re: [EXTERNAL] [mpi-forum/mpi-standard] seesions: add verbiage
>>    >     >    >    > concerning dynamic process model and sessions
>>    >     >    >    > model limitations (#521)
>>    >     >    >
>>    >     >    >    > Hi Dan, all,
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > I personally like the idea of forcing the user to free all elements and then
>>    >     >    >    > declaring MPI_Session_finalize a local operation. This would make the init and
>>    >     >    >    > the finalize symmetric and avoid all issues. Further, if we do want a more
>>    >     >    >    > “collective” behavior later on, it could easily be added.
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > As for changes – I have the feeling that this is the easiest to get accepted for
>>    >     >    >    > now, as it is the most restrictive. All other solutions open the debate about
>>    >     >    >    > what exactly the meaning is – I think that is the more dangerous route for 4.0.
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > Just my 2c,
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > Martin
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > --
>>    >     >    >    > Prof. Dr. Martin Schulz, Chair of Computer Architecture and Parallel Systems
>>    >     >    >    > Department of Informatics, TU-Munich, Boltzmannstraße 3, D-85748 Garching
>>    >     >    >    > Member of the Board of Directors at the Leibniz Supercomputing Centre (LRZ)
>>    >     >    >    > Email: schulzm at in.tum.de
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > From: Dan Holmes <danholmes at chi.scot>
>>    >     >    >    > Date: Thursday, 18. February 2021 at 21:51
>>    >     >    >    > To: "Pritchard Jr., Howard" <howardp at lanl.gov>
>>    >     >    >    > Cc: Rolf Rabenseifner <rabenseifner at hlrs.de>, "schulzm at in.tum.de"
>>    >     >    >    > <schulzm at in.tum.de>
>>    >     >    >    > Subject: Re: [EXTERNAL] [mpi-forum/mpi-standard] seesions: add verbiage
>>    >     >    >    > concerning dynamic process model and sessions model limitations (#521)
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > Hi Howard,
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > We can argue (I don’t know how successfully, but we can try) that the user was
>>    >     >    >    > already required to do any clean-up they wanted to be done of the state
>>    >     >    >    > associated with session-derived objects - because MPI_SESSION_FINALIZE
>>    >     >    >    > explicitly disclaims any responsibility for doing it and sloping-shoulders it
>>    >     >    >    > onto the existing MPI_XXX_FREE procedures, which are in the user-facing API,
>>    >     >    >    > strongly suggesting that the user must call them if they want that work to be
>>    >     >    >    > done.
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > The current text leaves open the loophole that the user could just leave those
>>    >     >    >    > objects dangling (definitely not cleaned up but also, perhaps, no longer
>>    >     >    >    > functional?) and just carry on regardless until the process ends and it all
>>    >     >    >    > gets cleaned up by the OS/job scheduler/runtime/reboot by an annoyed sys admin.
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > Note that initialising a persistent operation, then freeing the communicator it
>>    >     >    >    > uses, then starting and completing that operation works in most MPI libraries
>>    >     >    >    > because of internal reference counting. Verdict: yuk! This is the reason behind
>>    >     >    >    > the discussion of deprecating MPI_COMM_FREE (in favour of MPI_COMM_DISCONNECT
>>    >     >    >    > and, eventually, MPI_COMM_IDISCONNECT, which is a more direct replacement, even
>>    >     >    >    > though it requires a subsequent MPI_WAIT, the functionality of which is
>>    >     >    >    > currently done by MPI_FINALIZE).
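>>    >     >    >    >
>>    >     >    >    > A minimal sketch of that reference-counting pattern (fragment only;
>>    >     >    >    > assumes rank 1 posts a matching receive):
>>    >     >    >    >
>>    >     >    >    >     MPI_Comm    dup;
>>    >     >    >    >     MPI_Request req;
>>    >     >    >    >     int data = 0;
>>    >     >    >    >     MPI_Comm_dup(MPI_COMM_WORLD, &dup);
>>    >     >    >    >     MPI_Send_init(&data, 1, MPI_INT, 1, 0, dup, &req);
>>    >     >    >    >     MPI_Comm_free(&dup);  /* handle becomes MPI_COMM_NULL, but the
>>    >     >    >    >                              pending request keeps the internal object
>>    >     >    >    >                              alive in most implementations */
>>    >     >    >    >     MPI_Start(&req);
>>    >     >    >    >     MPI_Wait(&req, MPI_STATUS_IGNORE);
>>    >     >    >    >     MPI_Request_free(&req);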
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > Does this mean we should expect a communicator derived from a session that has
>>    >     >    >    > not been freed/disconnected to continue working normally even after
>>    >     >    >    > MPI_SESSION_FINALIZE? If so, yuk! Let’s head off this question before a user
>>    >     >    >    > asks it!
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > Cheers,
>>    >     >    >    >
>>    >     >    >    > Dan.
>>    >     >    >    >
>>    >     >    >    > —
>>    >     >    >    >
>>    >     >    >    > Dr Daniel Holmes PhD
>>    >     >    >    >
>>    >     >    >    > Executive Director
>>    >     >    >    > Chief Technology Officer
>>    >     >    >    >
>>    >     >    >    > CHI Ltd
>>    >     >    >    >
>>    >     >    >    > danholmes at chi.scot
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > On 18 Feb 2021, at 20:14, Pritchard Jr., Howard <howardp at lanl.gov> wrote:
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > Hi Dan,
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > Short answer to your first question: I was commencing on something this
>>    >     >    >    > morning.
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > I’m in a meeting but will get out to return to this later.  I’ll check the
>>    >     >    >    > comments.  My only concern is that declaring mpi_session_finalize as local,
>>    >     >    >    > with a user requirement to clean up, might be taken as a big change from what
>>    >     >    >    > was voted on for sessions.
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > Howard
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > From: Dan Holmes <danholmes at chi.scot>
>>    >     >    >    > Date: Thursday, February 18, 2021 at 12:08 PM
>>    >     >    >    > To: "Pritchard Jr., Howard" <howardp at lanl.gov>
>>    >     >    >    > Cc: Rolf Rabenseifner <rabenseifner at hlrs.de>, Martin Schulz <schulzm at in.tum.de>
>>    >     >    >    > Subject: [EXTERNAL] Fwd: [mpi-forum/mpi-standard] seesions: add verbiage
>>    >     >    >    > concerning dynamic process model and sessions model limitations (#521)
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > Hi Howard (cc'd Rolf & Martin),
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > I see you are progressing through the extensive to-do list for the Sessions
>>    >     >    >    > WG/Dynamic Chapter Committee. Thanks - all good work, as far as I can see.
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > Are you currently writing text for the Rolf “is session finalise broken” issue?
>>    >     >    >    > I don’t want to duplicate effort.
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > I saw Rolf added a comment onto issue 435 trying to summarise the outcome and
>>    >     >    >    > effects of the meeting yesterday. I added my own attempt to capture the bits of
>>    >     >    >    > the discussion that I thought were worth capturing.
>>    >     >    >    > 
>>    >     >    >    > We both end up in the same place: what I call option 2b - we need new text about
>>    >     >    >    > “fork lotsa threads, execute all clean-up actions, join threads”.
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > I opened the MPI-4.0-RC-Feb21 document to begin figuring out what is needed and
>>    >     >    >    > what hits me is this:
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > §11.3.1 lines 28-29 on page 502:
>>    >     >    >    >
>>    >     >    >    > "The call to MPI_SESSION_FINALIZE does not free objects created by MPI calls;
>>    >     >    >    > these objects are freed using MPI_XXX_FREE calls.”
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > Doh!
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > My immediate question in response to this is: WHY IS MPI_SESSION_FINALIZE
>>    >     >    >    > NON-LOCAL AT ALL?
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > It does not clean up distributed objects (communicators, windows, files) so does
>>    >     >    >    > it do anything non-local? If so, what is that thing? It seems to specifically
>>    >     >    >    > exclude from its to-do list all of the actions that might have required
>>    >     >    >    > non-local semantics.
>>    >     >    >    >
>>    >     >    >    > Our arguments in the meeting yesterday centred around session_finalize doing the
>>    >     >    >    > job of comm_disconnect (probably my fault, but Rolf’s ticket assumes “may
>>    >     >    >    > synchronise” because of the word "collective") for all still existing
>>    >     >    >    > communicators (windows and files) derived from the session. This is
>>    >     >    >    > understandable because MPI_FINALIZE states “cleans up all MPI state associated
>>    >     >    >    > with the World Model” (§11.2, line 11, page 495). So, this procedure is already
>>    >     >    >    > very different to that existing one.
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > Is a better resolution to this whole mess to say “MPI_SESSION_FINALIZE is a
>>    >     >    >    > \mpiterm{local} MPI procedure” instead of lines 30-34 (because we have no good
>>    >     >    >    > reason for it to be collective or even non-local) and add to line 27 “ and free
>>    >     >    >    > all objects created or derived from this session” (if session_finalize does not
>>    >     >    >    > do this, but it must be done [ED: please check, must this be done?], then the
>>    >     >    >    > user must be responsible for doing it)?
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > That is, we should be choosing OPTION (1) in my summary!?!
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > Alternatively, should MPI_SESSION_FINALIZE say something like “cleans up all MPI
>>    >     >    >    > state associated with the specified session” - then we can *remove lines 28-29*
>>    >     >    >    > (or remove the word “not”) and replace lines 30-34 with OPTION 2b?
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > Cheers,
>>    >     >    >    >
>>    >     >    >    > Dan.
>>    >     >    >    >
>>    >     >    >    > —
>>    >     >    >    >
>>    >     >    >    > Dr Daniel Holmes PhD
>>    >     >    >    >
>>    >     >    >    > Executive Director
>>    >     >    >    > Chief Technology Officer
>>    >     >    >    >
>>    >     >    >    > CHI Ltd
>>    >     >    >    >
>>    >     >    >    > danholmes at chi.scot
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > Begin forwarded message:
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > From: Howard Pritchard <notifications at github.com>
>>    >     >    >    >
>>    >     >    >    > Subject: Re: [mpi-forum/mpi-standard] seesions: add verbiage concerning dynamic
>>    >     >    >    > process model and sessions model limitations (#521)
>>    >     >    >    >
>>    >     >    >    > Date: 18 February 2021 at 17:54:18 GMT
>>    >     >    >    >
>>    >     >    >    > To: mpi-forum/mpi-standard <mpi-standard at noreply.github.com>
>>    >     >    >    >
>>    >     >    >    > Cc: Dan Holmes <danholmes at compudev.co.uk>, Review requested
>>    >     >    >    > <review_requested at noreply.github.com>
>>    >     >    >    >
>>    >     >    >    > Reply-To: mpi-forum/mpi-standard
>>    >     >    >    > <reply+ADD7YSWYFBWNSOBGN7KNE5V6HKFMVEVBNHHDAW7GIY at reply.github.com>
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    >
>>    >     >    >    > @hppritcha requested your review on: #521 seesions: add verbiage concerning
>>    >     >    >    > dynamic process model and sessions model limitations.
>>    >     >    >    >
>>    >     >    >    > —
>>    >     >    >    > You are receiving this because your review was requested.
>>    >     >    >    > Reply to this email directly, view it on GitHub, or unsubscribe.
>>    >     >   
>>    >     >   
>>    >     >    --
>>    >     >    Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de .
>>    >     >    High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530 .
>>    >     >    University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832 .
>>    >     >    Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner .
>>    >     >     Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307) .
>>    > 
>>    >     --
>>    >     Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de .
>>    >     High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530 .
>>    >     University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832 .
>>    >     Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner .
>>    >     Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307) .
>>    > 
>> 
>> _______________________________________________
>> mpiwg-sessions mailing list
>> mpiwg-sessions at lists.mpi-forum.org
>> https://lists.mpi-forum.org/mailman/listinfo/mpiwg-sessions
> 
> --
> Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de .
> High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530 .
> University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832 .
> Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner .
> Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307) .

-- 
Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de .
High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530 .
University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832 .
Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner .
Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307) .



