[Mpi-22] [Mpi-forum] MPI 2.2 proposal: resolving MPI_Request_free issues

Erez Haba erezh at [hidden]
Tue Jul 15 14:08:35 CDT 2008



See inline

-----Original Message-----
From: mpi-22-bounces_at_[hidden] [mailto:mpi-22-bounces_at_[hidden]] On Behalf Of Hubert Ritzdorf
Sent: Tuesday, July 15, 2008 3:11 AM
To: MPI 2.2
Subject: Re: [Mpi-22] [Mpi-forum] MPI 2.2 proposal: resolving MPI_Request_free issues

Jeff Squyres wrote:
> On Jul 14, 2008, at 5:50 PM, Erez Haba wrote:
>
>> Issue #1:
>> Advice to user quote:
>>
>> "Once a request is freed by a call to MPI_REQUEST_FREE, it is not
>> possible to check for the successful completion of the associated
>> communication with calls to MPI_WAIT or MPI_TEST. Also, if an error
>> occurs subsequently during the communication, an error code cannot be
>> returned to the user - such an error must be treated as fatal."
>>
>> This is the only place in the MPI standard that mandates an error to
>> be FATAL, regardless of the user's settings. This is truly
>> unrecoverable because the user cannot associate the error with the
>> failed send and cannot recover after MPI_Request_free was called.
>> This poses a problem for a fault-tolerance implementation, as it must
>> handle this failure without the ability to notify the user of the
>> specific error, for lack of context.
>
> I'm not sure I agree with this premise.  If you need this
> functionality, then you shouldn't be using MPI_REQUEST_FREE.
> Logically speaking, if you want to associate a specific error with a
> specific communication request, then you must have something to tie
> the error *to*.  In this case, it's a request -- but the application
> has explicitly stated that it no longer cares about the request.
>
> Therefore: if you care about the request, don't free it.
>

[erezh] The points I am making are:
1. This is the only location in the standard that mandates the error be FATAL.
2. Using this function creates a problem for fault-tolerance implementations.
(i.e., how would the FT code know whether to make the MPI_Isend fault tolerant? The MPI_Request_free can happen at any time in the future.)
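A minimal sketch of the sequence in question (hypothetical buffer and
argument names); at the time of the MPI_Isend, the implementation cannot
know whether the request will later be freed or waited on:

    MPI_Request req;
    MPI_Isend(buf, count, MPI_INT, dest, tag, comm, &req);
    /* ... arbitrary amount of other work ... */
    MPI_Request_free(&req);   /* may happen at any later point */
    /* if the send fails after this, there is no request left to
       report the error against, so the error must be treated as fatal */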

>> Issue #2:
>> Advice to user quote:
>>
>> "Questions arise as to how one knows when the operations have
>> completed when using [snip]
>> causes the send to fail.
>
> I don't quite understand examples 1 and 2 (how would they cause segv's
> in the TCP stack).  It is permissible to (pseudocode):
>
>   while (bytes_to_send > 0) {
>      rc = write(fd, buffer, bytes_to_send);
>      if (rc > 0) {
>         buffer += rc;
>         bytes_to_send -= rc;
>      } else {
>         ...error...
>      }
>   }
>   free(buffer);
>
> regardless of what the receiver does.  I'm not a kernel guy; does
> updating TCP sequence numbers also interact with the payload buffer?

[erezh] It will never happen with your code above, but you are not using async zero-copy.
The pattern on Windows is to use an overlapped send (write), which is still active when the function returns; that is the most efficient way to send your buffer. I know it's possible on Linux too, but I don't have the exact pattern.
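A rough sketch of that overlapped pattern (assuming h is a handle opened
with FILE_FLAG_OVERLAPPED; the variable names are illustrative):
WriteFile() returns while the kernel may still be reading from the
buffer, so freeing the buffer before completion is exactly the hazard
under discussion.

    OVERLAPPED ov = {0};
    ov.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);

    if (!WriteFile(h, buffer, len, NULL, &ov) &&
        GetLastError() != ERROR_IO_PENDING) {
        /* ...error... */
    }

    /* free(buffer) here would be unsafe: the I/O may still be in
       flight, and the kernel/NIC can still read from the buffer */

    DWORD sent;
    GetOverlappedResult(h, &ov, &sent, TRUE);   /* block until done */
    CloseHandle(ov.hEvent);
    free(buffer);                               /* safe only now */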

>
> FWIW: I can see the RMA interconnect example much easier.  You can
> imagine a scenario where a sender successfully sends and the receiver
> successfully receives, but the hardware ACK from the receiver gets
> lost.  The receiver then sends an MPI message back to the sender, but
> the sender is still in the middle of a retransmit timeout (while
> waiting for the hardware ACK that was lost).  In this case, the user
> app may free the buffer too soon, resulting in a segv (or some other
> lion, tiger, or bear) when the sending hardware tries to retransmit.
>

[erezh] Correct; this is the scenario I was describing with the RDMA write.

> Don't get me wrong; I'm not a fan of MPI_REQUEST_FREE either.  :-)
>
>> Proposed Solution
>> 3 proposals from the least restrictive to the most restrictive:
>>
>> Solution #1:
>> Remove the advice to user to reuse the buffer once a reply has
>> arrived. There is no safe way to reuse the buffer (free), overwrite
>> is somewhat safer.
>>
>> Solution #2:
>> Remove the advice to user altogether, disallow the usage pattern of
>> freeing active requests. Only inactive requests are allowed to be
>> freed. (i.e., not started).
>>
>> Solution #3:
>> Deprecate MPI_Request_free. Users can always use MPI_Wait to complete
>> the request.
>>
>> Recommendation:
>> Use solution #2, as users still need to free requests that are never
>> used; e.g., the app called MPI_Send_init but never started that
>> request, so the request still needs to be freed.
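A sketch of the one case solution #2 still permits (hypothetical names):
freeing a persistent request that was created but never started, and is
therefore inactive.

    MPI_Request req;
    MPI_Send_init(buf, count, MPI_INT, dest, tag, comm, &req);
    /* the request is inactive: MPI_Start(&req) was never called */
    MPI_Request_free(&req);   /* allowed under solution #2 */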
>
>
> I'm not an apps guy, but I thought there were real world apps out
> there that use MPI_REQUEST_FREE.  So #2 would break real apps -- but I
> have no idea how many.

[erezh] Correct; it may break existing applications, as mentioned in the "impact on user" section.

> MPI_Request_free() is used in application programs. For example, it is
> the easiest (and portable) way to send a non-blocking acknowledgement
> to a destination process.

[erezh] See Adam Moody's proposal for MPI_REQUEST_IGNORE; I think it's safer than the current pattern.
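A sketch of how the ack pattern below might look under that proposal,
assuming it lets the caller pass MPI_REQUEST_IGNORE in place of a
request handle (my reading of the proposal, not its final semantics):

    /* no request object is created, so there is nothing to free */
    MPI_Isend(buf, 0, MPI_INT, dest, TAG_ACK, comm, MPI_REQUEST_IGNORE);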

>
>      MPI_Isend (buf, 0, MPI_INT, dest, TAG_ACK, comm, lrequest)
>      MPI_Request_free (lrequest)
>
> I know applications that transfer (long-lived) buffers to
> destination processes in broker/request parts of the application
> and use MPI_Request_free().

[erezh] Long-lived buffers are okay (at least they don't free them); but how do they know when it's safe to update the send buffer?
(There is a correctness issue here.)
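For contrast, a sketch of the pattern that does give that guarantee
(hypothetical names): keep the request and complete it before touching
the buffer again.

    MPI_Request req;
    MPI_Isend(buf, count, MPI_INT, dest, tag, comm, &req);
    /* ... other work ... */
    MPI_Wait(&req, MPI_STATUS_IGNORE);   /* send is now complete */
    buf[0] = new_value;                  /* safe to update buf */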

> By the way, I think that Issue 2 is independent of MPI_Request_free().
> "the arrival of the reply informs the sender that the send has
> completed and the send buffer can be reused" is not necessarily related
> to the usage of MPI_Request_free().

[erezh] I agree; however, this pattern is suggested only in the MPI_Request_free() advice to users.
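The pattern that advice describes, roughly sketched (hypothetical
names): the sender frees the request and treats the arrival of a reply
as evidence that the send completed.

    MPI_Isend(buf, count, MPI_INT, dest, tag, comm, &req);
    MPI_Request_free(&req);
    MPI_Recv(reply, rcount, MPI_INT, dest, ack_tag, comm,
             MPI_STATUS_IGNORE);
    /* the advice claims buf may now be reused; the scenarios above
       show that this is not safe on every transport */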

>
> Perhaps an alternative solution would be:
>
> Solution #4: Deprecate MPI_REQUEST_FREE so that it can actually be
> removed someday.  State that an application that frees a request can
> never know when it is safe to free the corresponding buffer (thus
> making MPI_REQUEST_FREE so unattractive that its use tapers off, and
> also making it perfectly permissible for an MPI implementation to segv
> if a user frees a buffer associated with a REQUEST_FREE'd request --
> such as in the scenarios described above -- because that would be an
> erroneous program :-) ).
>

> MPI_Request_free() is used, provides required functionality, and should
> not be deprecated.
>
> Hubert


