[mpiwg-sessions] FW: Virtual MPI forum meeting June 7, 2023 / Vote announcement for the July 2023 meeting of the MPI Forum

Holmes, Daniel John daniel.john.holmes at intel.com
Thu Jun 8 09:15:44 CDT 2023


Hi Howard (& all),

In the MPI Virtual meeting yesterday, Rolf presented a PR that tries to define the word "pending" in MPI. The discussion raised a whole bunch of issues and concerns. I took an action to raise some of them in the Sessions WG because they are introduced by (or made worse by) the sessions model.

Quick summary of options in Rolf's proposal:
1) Remove the word "pending" from the MPI Standard -- replace it with "active" where that makes sense, or something else when appropriate.
2) Define "pending" to be a strict alias of "active" -- it can only be applied to MPI operations and it means "starting stage has been done but completion stage has not yet been done"
3) Define "pending" to be any task/activity on MPI's to-do list -- encompassing all active operations but also all decoupled activities (i.e. progress).

We discovered in the discussion of these options, in the context of MPI_FINALIZE and MPI_COMM_DISCONNECT (which must delay their return until "pending" things have been done), that an important restriction was unintentionally removed by the introduction of sessions, and we probably need to re-introduce parts of that restriction.

When there was only the World Model, it was erroneous to call things like MPI_REQUEST_FREE after MPI_FINALIZE. Questions like "can I free a request after finalize?" -- derived from questions like "what happens to inactive persistent requests when I finalize MPI?" -- made no sense because you could look at the code with MPI_REQUEST_FREE after MPI_FINALIZE and immediately declare it an erroneous program.

Now that sessions exist, though, we can call MPI_REQUEST_FREE after MPI_FINALIZE, as long as there is a session:

```
MPI_SESSION_INIT(&sh)
MPI_INIT()
MPI_RECV_INIT(..., MPI_COMM_WORLD, &req)
MPI_FINALIZE()
MPI_REQUEST_FREE(&req)
MPI_SESSION_FINALIZE()
```

This is clearly A Bad Idea (tm), but which MPI rule has been broken in the above pseudo-code? Without the two sessions lines of code, this is clearly erroneous, but with them it is not. Do we want this pattern to be erroneous or well-defined?

There is no semantic reason why this cannot be deemed legal -- MPI_REQUEST_FREE is a local procedure, so the fact MPI_COMM_WORLD is no longer functional should be irrelevant -- it could just clean up locally allocated resources (the connections to other processes were cleaned up during MPI_FINALIZE).

One technical reason to define this pattern to be erroneous is that the request probably has a ref-counted reference to the communicator and MPI_FINALIZE will hang waiting for that ref-count to reach zero, which it cannot do because the request will not be freed until after it returns. This is an implementation detail, but it is how at least one major MPI library is currently implemented.

An implementation could instead choose to store a weak_ptr (https://en.cppreference.com/w/cpp/memory/weak_ptr) to the communicator in the request when the operation is inactive, attempt to get a shared_ptr from the weak_ptr (https://en.cppreference.com/w/cpp/memory/weak_ptr/lock) during the starting stage, and give up the shared_ptr during the completion stage. In such an implementation, an inactive request does not own the communicator associated with the operation, so finalize can destroy the communicator, and freeing an inactive request never needs to dereference to get the communicator, so it doesn't care whether the communicator still exists or not.

Much more important than technical feasibility is the question "why should we enable this pattern?" mumble mumble C++ destructors mumble garbage collected languages mumble.

Taking a step back, there is a whole category of programs that used to be erroneous by definition (cannot make MPI calls after MPI_FINALIZE) that are no longer obviously erroneous. What do we do about them all?

There are related issues/concerns here: if a rule is added to MPI that forces MPI_FINALIZE to free all requests associated with the World Model, should sessions follow suit and have MPI_SESSION_FINALIZE free requests from that session? There is text exhorting the user to enable and complete all MPI operations (in the World Model) before calling MPI_FINALIZE, but there is no such exhortation for the sessions model; should there be one? Does finalizing a session destroy all the communicators from that session? Can a communicator handle be used after the session is finalized (e.g. in MPI_COMM_FREE)?

Best wishes,
Dan.

-----Original Message-----
From: Rolf Rabenseifner <rabenseifner at hlrs.de> 
Sent: Wednesday, June 7, 2023 6:10 PM
To: Holmes, Daniel John <daniel.john.holmes at intel.com>
Subject: Fwd: Virtual MPI forum meeting June 7, 2023 / Vote announcement for the July 2023 meeting of the MPI Forum

The pdf is
 https://github.com/mpi-forum/mpi-standard/files/11676448/mpi41-report_Issue710_PR823_update_for.2023-6-7-meeting.pdf
especially the questions on pages 482 and 524.

----- Forwarded Message -----
From: "Rolf Rabenseifner" <rabenseifner at hlrs.de>
To: "Main MPI Forum mailing list" <mpi-forum at lists.mpi-forum.org>
Sent: Wednesday, June 7, 2023 1:51:04 PM
Subject: Re: Virtual MPI forum meeting June 7, 2023 / Vote announcement for the July 2023 meeting of the MPI Forum

Dear MPI forum members,

 last update for #710/PR823: second option (alternative B) and open questions for MPI_Session_finalize, see  https://github.com/mpi-forum/mpi-standard/files/11676448/mpi41-report_Issue710_PR823_update_for.2023-6-7-meeting.pdf
 for the virtual meeting today.

Kind regards
Rolf
 
________ email from yesterday: _________

for the virtual meeting this week (June 7, 2023) the following readings and discussions are scheduled.
Some technical decisions are needed and it may be best to make these decisions as institutional straw votes in an additional virtual meeting (e.g. next week), rather than making them ad hoc this week.

The topics are #705/PR822,  #710/PR823 with two options,  #676/PR824 or PR825.

Details:
__________________________
#705 Errata: Fortran has only compile-time constants    Rolf,Joseph
     Issue https://github.com/mpi-forum/mpi-issues/issues/705
           (together with https://github.com/mpi-forum/mpi-issues/issues/657   Rolf, Jeff H.)
     PR    https://github.com/mpi-forum/mpi-standard/pull/822
     PDF
     https://github.com/mpi-forum/mpi-standard/files/11450065/mpi41-report_Issue705%2B657_PR822.pdf

This proposal seems to be already stable.

__________________________
#710 Errata: 'Pending communication' not defined in MPI_Comm_disconnect    Rolf
     Issue https://github.com/mpi-forum/mpi-issues/issues/710
     PR    https://github.com/mpi-forum/mpi-standard/pull/823
updated: 
     PDF   https://github.com/mpi-forum/mpi-standard/files/11676448/mpi41-report_Issue710_PR823_update_for.2023-6-7-meeting.pdf


There is severe criticism of the current proposal in PR 823:

  "In this PR here, as far as I can see, the implementations are mainly affected if they
   incorrectly require that there must not be any inactive request handle using the given comm.
   Whether an MPI lib really internally frees the inactive handles or not,
   is mainly a question of having dangling handles or not."
  "Is there broad agreement that this is incorrect implementation behavior? I'm skeptical."

And therefore, there are two different solutions:
  - An advice to users, telling the consequences if they do not free inactive
    handles before calling MPI_Comm_disconnect or MPI_Session_finalize.
or
  - Automatically freeing them.

And furthermore:

  "Whatever we decide, I imagine this needs to extend to MPI_SESSION_FINALIZE as well."

I'll try to update the proposal before the meeting to have text for both possible solutions.

__________________________
#676 Errata: 'Pending operation' not defined, pending proper definition  Rolf,Joseph
     Issue https://github.com/mpi-forum/mpi-issues/issues/676
     PR    https://github.com/mpi-forum/mpi-standard/pull/824
     PDF   https://github.com/mpi-forum/mpi-standard/files/11668075/mpi41-report_Issue676_PR824.pdf

       Substituting "pending operation" by "active operation"

  or PR  https://github.com/mpi-forum/mpi-standard/pull/825
     PDF https://github.com/mpi-forum/mpi-standard/files/11669354/mpi41-report_Issue676_PR825.pdf

       Defining "pending operation" as "active operation" (see page 12)

Best regards
Rolf


----- Original Message -----
> From: "Rolf Rabenseifner" <rabenseifner at hlrs.de>
> To: "Main MPI Forum mailing list" <mpi-forum at lists.mpi-forum.org>
> Cc: "Christoph Niethammer" <niethammer at hlrs.de>, "Puri Bangalore" <pvbangalore at ua.edu>, "Joseph Schuchart"
> <schuchart at icl.utk.edu>
> Sent: Wednesday, May 24, 2023 1:36:42 PM
> Subject: Re: Update: Vote announcement for the July 2023 meeting of 
> the MPI Forum

> Martin, Wes, and all,
>
> I expect that we should reserve also June 14, 2023 for a possible 
> continuation of open questions resulting from the discussions of
>
>> #705/PR822, #710/PR823, and #676/PR824 are now finalized at least for 
>> the virtual forum meeting, June 7, 2023
>
> and other pending items for MPI-4.1 from the June 7 virtual forum meeting.
>
> Best regards
> Rolf
>
>
>
>
> ----- Original Message -----
>> From: "Rolf Rabenseifner" <rabenseifner at hlrs.de>
>> To: "Main MPI Forum mailing list" <mpi-forum at lists.mpi-forum.org>
>> Cc: "Christoph Niethammer" <niethammer at hlrs.de>, "Puri Bangalore"
>> <pvbangalore at ua.edu>, "Joseph Schuchart"
>> <schuchart at icl.utk.edu>
>> Sent: Thursday, May 11, 2023 8:05:07 PM
>> Subject: Update: Vote announcement for the July 2023 meeting of the 
>> MPI Forum
>
>> Dear forum members,
>>
>> #705/PR822, #710/PR823, and #676/PR824 are now finalized at least for 
>> the virtual forum meeting, June 7, 2023, details below:
>>
>>    #705 Errata: Fortran has only compile-time constants    Rolf,Joseph
>>      Issue https://github.com/mpi-forum/mpi-issues/issues/705
>>            (together with https://github.com/mpi-forum/mpi-issues/issues/657   Rolf, Jeff H.)
>>      PR    https://github.com/mpi-forum/mpi-standard/pull/822
>>      PDF   https://github.com/mpi-forum/mpi-standard/files/11450065/mpi41-report_Issue705%2B657_PR822.pdf
>>
>>    #710 Errata: 'Pending communication' not defined in MPI_Comm_disconnect    Rolf
>>      Issue https://github.com/mpi-forum/mpi-issues/issues/710
>>      PR    https://github.com/mpi-forum/mpi-standard/pull/823
>>      PDF   https://github.com/mpi-forum/mpi-standard/files/11454896/mpi41-report_Issue710_PR823.pdf
>>
>>    #676 Errata: 'Pending operation' not defined, pending proper definition
>>    Rolf,Joseph
>>      Issue https://github.com/mpi-forum/mpi-issues/issues/676
>>      PR    https://github.com/mpi-forum/mpi-standard/pull/824
>>      PDF   https://github.com/mpi-forum/mpi-standard/files/11443976/mpi41-report_Issue676_PR824.pdf
>>
>> The agenda of the virtual meeting should be reading and discussions if needed.
>> I did the proposals based on the discussion and the major goals:
>> - backward compatible,
>> - no performance drawbacks on critical paths,
>> - full consistency of the solution with all related parts of the MPI standard.
>>
>> If I have overlooked something, then my apologies.
>>
>> If somebody wants to provide a completely different solution to one 
>> of these issues, please feel free to do it.
>> - Thanks to Joseph who did PR 822, which was such a counter proposal for #705.
>> - This method is often better than doing many change requests for the given PR,
>>   and it helps to completely check whether it is really consistent over
>>   all related parts of the MPI standard.
>> - And thanks for all the comments so far. They really helped me a lot 
>> for preparing the three PRs.
>>
>> I'll be on vacation for the next three weeks.
>>
>> Best regards
>> Rolf Rabenseifner
>>
>>
>> ----- Original Message -----
>>> From: "Rolf Rabenseifner" <rabenseifner at hlrs.de>
>>> To: "Main MPI Forum mailing list" <mpi-forum at lists.mpi-forum.org>
>>> Cc: "Christoph Niethammer" <niethammer at hlrs.de>, "Puri Bangalore"
>>> <pvbangalore at ua.edu>, "Joseph Schuchart"
>>> <schuchart at icl.utk.edu>
>>> Sent: Tuesday, May 9, 2023 6:07:04 PM
>>> Subject: Vote announcement for the July 2023 meeting of the MPI 
>>> Forum
>>
>>> Dear forum members,
>>>                       (correction: it is for the July 2023 meeting)
>>>
>>> I would like to make the following announcements for the next MPI 
>>> Forum Meeting (July 10-13, 2023):
>>>
>>> - 2nd votes on
>>>    #669 Add operation state 'enabled' and 'local calls' into Terms    Rolf
>>>      Issue https://github.com/mpi-forum/mpi-issues/issues/669
>>>      PR    https://github.com/mpi-forum/mpi-standard/pull/788
>>>
>>>    #457 Improvements around the word "rank" in the Process Topologies chapter
>>>    Christoph,Rolf
>>>      Issue https://github.com/mpi-forum/mpi-issues/issues/457
>>>      PR    https://github.com/mpi-forum/mpi-standard/pull/804
>>>      
>>>    #485 Fix Incorrect Usage of Rank/Task/etc. in Language Bindings Chapter
>>>    Rolf,Puri
>>>      Issue https://github.com/mpi-forum/mpi-issues/issues/485
>>>      PR     https://github.com/mpi-forum/mpi-standard/pull/803
>>>      PDF   https://github.com/mpi-forum/mpi-standard/files/10831598/mpi41-report_Issue485_PR803.pdf
>>>
>>> - errata reading and vote on
>>>
>>>    #679 Errata: Noncollective (for procedure) and nonpersistent are not defined
>>>    Rolf
>>>      Issue https://github.com/mpi-forum/mpi-issues/issues/679
>>>      PR    https://github.com/mpi-forum/mpi-standard/pull/820
>>>      PDF   https://github.com/mpi-forum/mpi-standard/files/11432286/mpi41-report_Issue679_PR820.pdf
>>>
>>> - pre-announcement for errata reading and vote on  --> See also 
>>> virtual meeting, June 7, 2023 !!!
>>>
>>>    #705 Errata: Fortran has only compile-time constants    Rolf,Joseph
>>>      Issue https://github.com/mpi-forum/mpi-issues/issues/705
>>>            (together with https://github.com/mpi-forum/mpi-issues/issues/657   Rolf, Jeff H.)
>>>      PR    https://github.com/mpi-forum/mpi-standard/pull/822
>>>
>>>    #710 Errata: 'Pending communication' not defined in MPI_Comm_disconnect    Rolf
>>>      Issue https://github.com/mpi-forum/mpi-issues/issues/710
>>>      PR    https://github.com/mpi-forum/mpi-standard/pull/823
>>>
>>>    #676 Errata: 'Pending operation' not defined, pending proper definition
>>>    Rolf,Joseph
>>>      Issue https://github.com/mpi-forum/mpi-issues/issues/676
>>>      PR    https://github.com/mpi-forum/mpi-standard/pull/824
>>>
>>> Best regards
>>> Rolf Rabenseifner


--
Dr. Rolf Rabenseifner . . . . . . . . . .. . . rabenseifner at hlrs.de .
High Performance Computing Center (HLRS) . . . ++49(0)711/685-65530 .
University of Stuttgart . . . . . . www.hlrs.de/people/rabenseifner .
Nobelstr. 19, 70569 Stuttgart, Germany

