[Mpi-forum] MPI_Request_free restrictions

Jim Dinan james.dinan at gmail.com
Fri Aug 14 12:28:48 CDT 2020


Sorry, we seem to have lost the mailing list for the last couple of messages
below (my fault).

The text on MPI_FINALIZE does not mandate “no pending communication”; it
requires “all MPI calls needed to complete its involvement …”:
"Before an MPI process invokes MPI_FINALIZE, the process must perform
all MPI calls needed to complete its involvement in MPI communications
associated with the World Model. It must locally complete
all MPI operations that it initiated and must execute matching calls needed
to complete MPI communications initiated by other processes. For example,
if the process executed a nonblocking send, it must eventually
call MPI_WAIT, MPI_TEST, MPI_REQUEST_FREE, or any derived function” §10.2.2
in MPI-4.0


Consider a simple test that does an MPI_Isend that has no matching recv,
frees the request, and then calls MPI_Finalize.
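
Concretely, a minimal sketch of that test (assuming at least two ranks; the
buffer contents and tag are arbitrary):

#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, buf = 42;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0 && size > 1) {
        /* Nonblocking send for which no process ever posts a matching
           receive. */
        MPI_Isend(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        /* Give up the handle while the operation is still active. */
        MPI_Request_free(&req);
    }
    /* No MPI_Wait/MPI_Test, and no matching MPI_Recv anywhere. */
    MPI_Finalize();
    return 0;
}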

Does the above text say this should work? Or not?

 ~Jim.

On Fri, Aug 14, 2020 at 9:28 AM HOLMES Daniel <d.holmes at epcc.ed.ac.uk>
wrote:

> Hi Jim,
>
> If the user releases their reference, the MPI library will need to add
> this handle to some internal data structure. IIRC, never requiring MPI to
> do this was a design guideline for MPI 3.0.
>
>
> This is, I guess, the design choice that supports the current prohibition
> in the RMA chapter, i.e. calling MPI_REQUEST_FREE for a request-based RMA
> operation is erroneous. It’s a small overhead, but there is no trade-off
> (AFAIK) that could mitigate/outweigh it.
>
> Freeing an active request seems like it would leak application memory. For
> example, if you free an active send/recv request, how can the user safely
> access the send/recv buffer?
>
>
> This is the reason that freeing an active point-to-point request is
> discouraged in the MPI Standard (and should, IMHO, be prohibited).
> “It is preferable, in general, to free requests when they are inactive.”
> §3.9
>
> Arguments like “but I can discover remote completion of the operation” do
> not provide a guarantee of local completion and/or freeing of local
> resources. That issue is mentioned in the MPI Standard to justify the
> discouragement, but it could equally well justify a strict prohibition.
> “Active receive requests should not be freed. Otherwise, it will not be
> possible to check that the receive has completed.” §3.9
>
> The MPI Forum is unlikely to vote for upgrading the discouragement to a
> prohibition for point-to-point (because back-compat, sigh).
>
> Is it effectively leaked (i.e. never returned back to the user by the MPI
> library)?
>
>
> It is effectively leaked until MPI_FINALIZE returns.
>
> And how will the user meet the no-pending-communication requirement of
> MPI_Finalize?
>
>
> The text on MPI_FINALIZE does not mandate “no pending communication”; it
> requires “all MPI calls needed to complete its involvement …”:
> "Before an MPI process invokes MPI_FINALIZE, the process must perform
> all MPI calls needed to complete its involvement in MPI communications
> associated with the World Model. It must locally complete
> all MPI operations that it initiated and must execute matching calls needed
> to complete MPI communications initiated by other processes. For example,
> if the process executed a nonblocking send, it must eventually
> call MPI_WAIT, MPI_TEST, MPI_REQUEST_FREE, or any derived function” §10.2.2
> in MPI-4.0
>
> The "execute matching calls needed to complete MPI communications
> initiated by other processes” bit is easy - just initiate (meaning
> MPI_Isend/MPI_Irecv or MPI_START) the matching point-to-point MPI procedure
> at the other MPI process. The progress rule in §3.5 guarantees that “If a
> pair of matching send and receives have been initiated then at least one of
> these two operations will complete, independently of other actions in the
> system” and “[each] will complete, unless the [other] is satisfied by
> another message.” So, in a correct MPI program, where all sends have a
> matching receive and vice versa, all those point-to-point communication
> operations will complete (eventually, possibly during MPI_FINALIZE).
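>
> As a sketch (assuming two ranks; the buffer, count, and tag are arbitrary),
> this is the sort of pattern I mean: rank 1 executes the matching call, and
> rank 0’s freed send completes at the latest inside MPI_FINALIZE:
>
> int rank, x = 0;
> MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* after MPI_Init */
> if (rank == 0) {
>     MPI_Request req;
>     MPI_Isend(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
>     MPI_Request_free(&req);  /* freeing an active send request: allowed,
>                                 though discouraged */
> } else if (rank == 1) {
>     /* the "matching call needed to complete" rank 0's communication */
>     MPI_Recv(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
> }
> MPI_Finalize();  /* rank 0's send completes at the latest in here */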
>
> The “locally complete” bit is what you’re really asking about. Of course,
> strictly, MPI_REQUEST_FREE does not “locally complete” and so it should not
> be relevant in this pre-finalise instruction; it is listed here precisely
> because of the historical exception permitting freeing of active
> point-to-point requests. Thus, “MPI_ISEND, MPI_REQUEST_FREE, MPI_FINALIZE”
> is an explicitly allowed exception, even though it would otherwise breach
> the rule.
>
> Cheers,
> Dan.
> Dr Daniel Holmes PhD
> Architect (HPC Research)
> d.holmes at epcc.ed.ac.uk
> Phone: +44 (0) 131 651 3465
> Mobile: +44 (0) 7940 524 088
> Address: Room 2.09, Bayes Centre, 47 Potterrow, Central Area, Edinburgh,
> EH8 9BT
> The University of Edinburgh is a charitable body, registered in Scotland,
> with registration number SC005336.
>
> On 13 Aug 2020, at 20:17, Jim Dinan <james.dinan at gmail.com> wrote:
>
> Sorry, I got my wires crossed there. Apply what I wrote to
> MPI_Request_free on an active request.
>
> Assume that the MPI library allocates space on the heap for the internal
> request object and returns a handle (e.g. pointer) through the MPI_Request
> object. The user is required to hang onto this handle and wait/test on it
> in the future, so MPI doesn't need to hold a reference. If the user
> releases their reference, the MPI library will need to add this handle to
> some internal data structure. IIRC, never requiring MPI to do this was a
> design guideline for MPI 3.0.
>
> But also, freeing an active request seems like it would leak application
> memory. For example, if you free an active send/recv request, how can the
> user safely access the send/recv buffer? Is it effectively leaked (i.e.
> never returned back to the user by the MPI library)? And how will the user
> meet the no-pending-communication requirement of MPI_Finalize?
>
>  ~Jim.
>
> On Thu, Aug 13, 2020 at 10:07 AM HOLMES Daniel <d.holmes at epcc.ed.ac.uk>
> wrote:
>
>> Hi Jim,
>>
>> To be clear, I think that MPI_CANCEL is evil and should be removed from
>> the MPI Standard entirely at the earliest convenience.
>>
>> I am certainly not arguing that it be permitted for more MPI operations.
>>
>> I thought the discussion was focused on MPI_REQUEST_FREE and whether or
>> not it can/should be used on an active request.
>>
>> If a particular MPI implementation does not keep a reference to the
>> request between MPI_RPUT and MPI_REQUEST_FREE, but needs that reference to
>> process the completion event, then that MPI implementation would be
>> required to keep a reference to the request from MPI_REQUEST_FREE until
>> that important task had been done, perhaps until the close epoch call. This
>> requires no new memory because the user is giving up their reference to the
>> request, so MPI can safely use the request it is passed in MPI_REQUEST_FREE
>> without copying it. As you say, MPI takes over the responsibility for
>> processing the completion event.
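>>
>> Purely as a hypothetical sketch of what "keep a reference" could mean
>> (none of these names are real; every implementation does this
>> differently), MPI could simply move the freed-but-active request onto a
>> per-window list and drain it in the close epoch call:
>>
>> /* hypothetical internal structures, not any real MPI implementation */
>> struct impl_request {
>>     struct impl_request *next;
>>     /* ... operation state, completion flag, ... */
>> };
>>
>> struct impl_window {
>>     struct impl_request *orphaned;   /* freed by the user, still active */
>>     /* ... */
>> };
>>
>> void impl_wait_for_completion_event(struct impl_request *req); /* hypothetical */
>> void impl_release(struct impl_request *req);                   /* hypothetical */
>>
>> void impl_request_free(struct impl_window *win, struct impl_request *req)
>> {
>>     /* The user gave up their reference, so reuse the object itself:
>>        no copy and no new allocation are needed. */
>>     req->next = win->orphaned;
>>     win->orphaned = req;
>> }
>>
>> void impl_close_epoch(struct impl_window *win)
>> {
>>     /* The close epoch procedure may be non-local, so it can wait for,
>>        process, and release every orphaned request here. */
>>     while (win->orphaned != NULL) {
>>         struct impl_request *req = win->orphaned;
>>         win->orphaned = req->next;
>>         impl_wait_for_completion_event(req);
>>         impl_release(req);
>>     }
>> }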
>>
>> Your question about why the implementation should be required to take on
>> this complexity is a good one. That, I guess, is why freeing any active
>> request is a bad idea. MPI is required to differentiate completion of
>> individual operations (so it can implement MPI_WAIT) but that means
>> something must process completion at some point for each individual
>> operation. In RMA, that responsibility can be discharged earlier than in
>> other parts of the MPI interface, but the real question is “why should MPI
>> offer to take on this responsibility in the first place?”
>>
>> Thanks, that helps (me at least).
>>
>> Cheers,
>> Dan.
>> Dr Daniel Holmes PhD
>> Architect (HPC Research)
>> d.holmes at epcc.ed.ac.uk
>> Phone: +44 (0) 131 651 3465
>> Mobile: +44 (0) 7940 524 088
>> Address: Room 2.09, Bayes Centre, 47 Potterrow, Central Area, Edinburgh,
>> EH8 9BT
>> The University of Edinburgh is a charitable body, registered in Scotland,
>> with registration number SC005336.
>>
>> On 13 Aug 2020, at 14:43, Jim Dinan <james.dinan at gmail.com> wrote:
>>
>> The two cases you mentioned would have the same behavior at an
>> application level. However, there may be important differences in the
>> implementation of each operation. For example, an MPI_Put operation may be
>> configured to not generate a completion event, whereas an MPI_Rput would.
>> The library may be relying on the user to make a call on the request to
>> process the event and clean up resources. The implementation can take over
>> this responsibility if the user cancels the request, but why should we
>> ask implementers to take on this complexity and overhead?
>>
>> My $0.02 is that MPI_Cancel is subtle and complicated, and we should be
>> very careful about where we allow it. I don't see the benefit to the
>> programming model outweighing the complexity and overhead in the MPI
>> runtime for the case of MPI_Rput. I also don't know that we were careful
>> enough in specifying the RMA memory model that a canceled request-based RMA
>> operation will still have well-defined behavior. My understanding is that
>> MPI_Cancel is required primarily for canceling receive requests to meet
>> MPI's quiescent shutdown requirement.
>>
>>  ~Jim.
>>
>> On Thu, Aug 13, 2020 at 8:11 AM HOLMES Daniel via mpi-forum <
>> mpi-forum at lists.mpi-forum.org> wrote:
>>
>>> Hi all,
>>>
>>> To increase my own understanding of RMA, what is the difference (if any)
>>> between a request-based RMA operation where the request is freed without
>>> being completed and before the epoch is closed and a “normal” RMA operation?
>>>
>>> MPI_WIN_LOCK() ! or any other "open epoch at origin" procedure call
>>> doUserWorkBefore()
>>> MPI_RPUT(&req)
>>> MPI_REQUEST_FREE(&req)
>>> doUserWorkAfter()
>>> MPI_WIN_UNLOCK() ! or the matching "close epoch at origin" procedure call
>>>
>>> vs:
>>>
>>> MPI_WIN_LOCK() ! or any other "open epoch at origin" procedure call
>>> doUserWorkBefore()
>>> MPI_PUT()
>>> doUserWorkAfter()
>>> MPI_WIN_UNLOCK() ! or the matching "close epoch at origin" procedure call
>>>
>>> Is this a source-to-source translation that is always safe in either
>>> direction?
>>>
>>> In RMA, in contrast to the rest of MPI, there are two opportunities for
>>> MPI to “block” and do non-local work to complete an RMA operation: 1)
>>> during MPI_WAIT for the request (if any: the user may not be given a
>>> request, may choose to free the request without calling MPI_WAIT, or may
>>> only call the nonblocking MPI_TEST) and 2) during the close epoch
>>> procedure, which is always permitted to be sufficiently non-local to
>>> guarantee that the RMA operation is complete and its freeing stage has
>>> been done. It seems that a request-based RMA operation becomes identical
>>> to a “normal” RMA operation if the user calls MPI_REQUEST_FREE on the
>>> request. This is like “freeing” the request from a nonblocking
>>> point-to-point operation, but without the guarantee of a later
>>> synchronisation procedure that can actually complete the operation and do
>>> its freeing stage.
>>>
>>> In collectives, there is no “ensure all operations so far are now done”
>>> procedure call because there is no concept of epoch for collectives.
>>> In point-to-point, there is no “ensure all operations so far are now
>>> done” procedure call because there is no concept of epoch
>>> for point-to-point.
>>> In file operations, there is no “ensure all operations so far are now
>>> done” procedure call because there is no concept of epoch for file
>>> operations. (There is MPI_FILE_SYNC but it is optional so MPI cannot rely
>>> on it being called.)
>>> In these cases, the only non-local procedure that is guaranteed to
>>> happen is MPI_FINALIZE, hence all outstanding non-local work needed by the
>>> “freed” operation might be delayed until that procedure is called.
>>>
>>> The issue with copying parameters is also moot because all of them are
>>> passed by value (implicitly copied) or are data buffers covered by the
>>> “conflicting accesses” RMA rules.
>>>
>>> Thus, it seems to me that RMA is a very special case - it could
>>> support different semantics, but that does not provide a good basis for
>>> claiming that the rest of the MPI Standard can support those different
>>> semantics - unless we introduce an epoch concept into the rest of the MPI
>>> Standard. This is not unreasonable: the notifications in GASPI, for
>>> example, guarantee completion of not just the operation they are attached
>>> to but *all* operations issued in the “queue” they represent since the last
>>> notification. Their queue concept serves the purpose of an epoch. I’m sure
>>> there are other examples in other APIs. It seems to me likely that the
>>> proposal for MPI_PSYNC for partitioned communication operations is moving
>>> in the direction of an epoch, although limited to remote completion of all
>>> the partitions in a single operation, which accidentally guarantees that
>>> the operation can be freed locally using a local procedure.
>>>
>>> Cheers,
>>> Dan.
>>> Dr Daniel Holmes PhD
>>> Architect (HPC Research)
>>> d.holmes at epcc.ed.ac.uk
>>> Phone: +44 (0) 131 651 3465
>>> Mobile: +44 (0) 7940 524 088
>>> Address: Room 2.09, Bayes Centre, 47 Potterrow, Central Area, Edinburgh,
>>> EH8 9BT
>>> The University of Edinburgh is a charitable body, registered in
>>> Scotland, with registration number SC005336.
>>>
>>> On 13 Aug 2020, at 01:40, Skjellum, Anthony via mpi-forum <
>>> mpi-forum at lists.mpi-forum.org> wrote:
>>>
>>> FYI, one argument (also used to force us to add the restriction that MPI
>>> persistent collective initialization procedures be blocking)...
>>> MPI_Request_free on an NBC poses a problem for the cases where array
>>> arguments are passed (e.g., Alltoallv/w)... The application cannot know
>>> whether those vectors are still in use by MPI after the free on an active
>>> request.  We do *not* currently mandate that the MPI implementation copy
>>> such arrays, so they are effectively "held as unfreeable" by the MPI
>>> implementation till MPI_Finalize.  The user cannot deallocate them in a
>>> correct program till after MPI_Finalize.
>>>
>>> Another effect for NBC of releasing an active request, IMHO, is that you
>>> don't know when the send or receive buffers are free to be deallocated...
>>> since you don't know when the transfer is complete OR when the buffers
>>> are no longer used by MPI (till after MPI_Finalize).
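>>>
>>> For concreteness, a sketch of the pattern in question (the counts and
>>> displacements here are arbitrary placeholders; this free is erroneous
>>> under the current text, which is exactly the case being discussed):
>>>
>>> enum { N = 4 };                        /* hypothetical communicator size */
>>> int sbuf[N], rbuf[N];
>>> int scounts[N], sdispls[N], rcounts[N], rdispls[N];
>>> /* ... fill sbuf, counts, and displacements ... */
>>> MPI_Request req;
>>> MPI_Ialltoallv(sbuf, scounts, sdispls, MPI_INT,
>>>                rbuf, rcounts, rdispls, MPI_INT, MPI_COMM_WORLD, &req);
>>> MPI_Request_free(&req);
>>> /* From here on the application cannot learn when MPI has finished
>>>    reading scounts/sdispls/rcounts/rdispls (MPI is not required to copy
>>>    them), so none of these arrays, nor the buffers, can safely be reused
>>>    or deallocated before MPI_Finalize. */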
>>>
>>> Tony
>>>
>>>
>>>
>>>
>>> Anthony Skjellum, PhD
>>> Professor of Computer Science and Chair of Excellence
>>> Director, SimCenter
>>> University of Tennessee at Chattanooga (UTC)
>>> tony-skjellum at utc.edu  [or skjellum at gmail.com]
>>> cell: 205-807-4968
>>>
>>> ------------------------------
>>> *From:* mpi-forum <mpi-forum-bounces at lists.mpi-forum.org> on behalf of
>>> Jeff Hammond via mpi-forum <mpi-forum at lists.mpi-forum.org>
>>> *Sent:* Saturday, August 8, 2020 12:07 PM
>>> *To:* Main MPI Forum mailing list <mpi-forum at lists.mpi-forum.org>
>>> *Cc:* Jeff Hammond <jeff.science at gmail.com>
>>> *Subject:* Re: [Mpi-forum] MPI_Request_free restrictions
>>>
>>> We should fix the RMA chapter with an erratum. I care less about NBC but
>>> share your ignorance of why it was done that way.
>>>
>>> Sent from my iPhone
>>>
>>> On Aug 8, 2020, at 6:51 AM, Balaji, Pavan via mpi-forum <
>>> mpi-forum at lists.mpi-forum.org> wrote:
>>>
>>>  Folks,
>>>
>>> Does someone remember why we disallowed users from calling
>>> MPI_Request_free on nonblocking collective requests?  I remember the
>>> reasoning for not allowing cancel (i.e., the operation might have completed
>>> on some processes, but not all), but not for Request_free.  AFAICT,
>>> allowing the users to free the request doesn’t make any difference to the
>>> MPI library.  The MPI library would simply maintain its own refcount to the
>>> request and continue forward till the operation completes.  One of our
>>> users would like to free NBC requests so they don’t have to wait for the
>>> operation to complete in some situations.
>>>
>>> Unfortunately, when I added the Rput/Rget operations in the RMA chapter,
>>> I copy-pasted that text into RMA as well without thinking too hard about
>>> it.  My bad!  Either the RMA committee missed it too, or they thought of a
>>> reason that I can’t think of now.
>>>
>>> Can someone clarify or remind me what the reason was?
>>>
>>> Regards,
>>>
>>>   — Pavan
>>>
>>> MPI-3.1 standard, page 197, lines 26-27:
>>>
>>> “It is erroneous to call MPI_REQUEST_FREE or MPI_CANCEL for a request
>>> associated with a nonblocking collective operation.”
>>>
>>>
>>
>>
>