[Mpi-22] [Mpi-forum] MPI 2.2 proposal: resolving MPI_Request_free issues
Hubert Ritzdorf
ritzdorf at [hidden]
Tue Jul 15 05:11:28 CDT 2008
Jeff Squyres wrote:
> On Jul 14, 2008, at 5:50 PM, Erez Haba wrote:
>
>> Issue #1:
>> Advice to user quote:
>>
>> Once a request is freed by a call to MPI_REQUEST_FREE, it is not
>> possible to check for the successful completion of the associated
>> communication with calls to MPI_WAIT or MPI_TEST. Also, if an error
>> occurs subsequently during the communication, an error code cannot be
>> returned to the user; such an error must be treated as fatal.
>>
>> This is the only place in the MPI standard that mandates that an error
>> be FATAL, regardless of the user's settings. This is truly
>> unrecoverable because the user cannot associate the error with the
>> failed send and cannot recover after MPI_Request_free was called.
>> This poses a problem for a fault-tolerance implementation, as it must
>> handle this failure without the ability to notify the user of the
>> specific error, for lack of context.
>
> I'm not sure I agree with this premise. If you need this
> functionality, then you shouldn't be using MPI_REQUEST_FREE.
> Logically speaking, if you want to associate a specific error with a
> specific communication request, then you must have something to tie
> the error *to*. In this case, it's a request -- but the application
> has explicitly stated that it no longer cares about the request.
>
> Therefore: if you care about the request, don't free it.
>
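To make the difference concrete, a minimal sketch (assuming the
communicator's error handler has been set to MPI_ERRORS_RETURN;
buf, count, dest, tag and comm are illustrative only):

MPI_Comm_set_errhandler (comm, MPI_ERRORS_RETURN);

MPI_Request req;
int err;

err = MPI_Isend (buf, count, MPI_INT, dest, tag, comm, &req);

/* Keeping the request: a failure during the transfer can be reported
 * here and tied to this particular send. */
err = MPI_Wait (&req, MPI_STATUS_IGNORE);

/* Freeing the request instead:
 *     MPI_Request_free (&req);
 * leaves nothing to return a later error code against, which is why
 * the advice to users quoted above mandates a fatal error. */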
>> Issue #2:
>> Advice to user quote:
>>
>> Questions arise as to how one knows when the operations have
>> completed when using [snip]
>> causes the send to fail.
>
> I don't quite understand examples 1 and 2 (how would they cause segv's
> in the TCP stack). It is permissible to (pseudocode):
>
> while (bytes_to_send > 0) {
>     rc = write(fd, buffer, bytes_to_send);
>     if (rc > 0) {
>         buffer += rc;
>         bytes_to_send -= rc;
>     } else {
>         ...error...
>     }
> }
> free(buffer);
>
> regardless of what the receiver does. I'm not a kernel guy; does
> updating TCP sequence numbers also interact with the payload buffer?
>
> FWIW: I can see the RMA interconnect example much more easily. You can
> imagine a scenario where a sender successfully sends and the receiver
> successfully receives, but the hardware ACK from the receiver gets
> lost. The receiver then sends an MPI message back to the sender, but
> the sender is still in the middle of a retransmit timeout (while
> waiting for the hardware ACK that was lost). In this case, the user
> app may free the buffer too soon, resulting in a segv (or some other
> lion, tiger, or bear) when the sending hardware tries to retransmit.
>
> Don't get me wrong; I'm not a fan of MPI_REQUEST_FREE either. :-)
>
>> Proposed Solution
>> 3 proposals from the least restrictive to the most restrictive:
>>
>> Solution #1:
>> Remove the advice to users to reuse the buffer once a reply has
>> arrived. There is no safe way to reuse the buffer by freeing it;
>> overwriting it is somewhat safer.
>>
>> Solution #2:
>> Remove the advice to users altogether and disallow the usage pattern
>> of freeing active requests. Only inactive (i.e., not started) requests
>> may be freed.
>>
>> Solution #3:
>> Deprecate MPI_Request_free. Users can always use MPI_Wait to complete
>> the request.
>>
>> Recommendation:
>> Use solution #2, as users still need to free requests that are not
>> used; e.g., the app called MPI_Send_init but never got to start that
>> request, hence the request still needs to be freed.
>
>
> I'm not an apps guy, but I thought there were real world apps out
> there that use MPI_REQUEST_FREE. So #2 would break real apps -- but I
> have no idea how many.
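For reference, the case the recommendation above would still allow is
roughly the following (a sketch; buf, count, dest, tag and comm are
illustrative only):

MPI_Request preq;

/* Create a persistent send request ... */
MPI_Send_init (buf, count, MPI_INT, dest, tag, comm, &preq);

/* ... but on this code path MPI_Start(&preq) is never called, so the
 * request stays inactive and freeing it is unproblematic. This is the
 * usage that solution #2 would still permit. */
MPI_Request_free (&preq);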
MPI_Request_free() is used in application programs. For example, it is
the easiest (and a portable) way to send a non-blocking acknowledgement
to a destination process:
MPI_Request lrequest;
MPI_Isend (buf, 0, MPI_INT, dest, TAG_ACK, comm, &lrequest);
MPI_Request_free (&lrequest);
I know of applications which transfer (long-living) buffers to
destination processes in the broker/request parts of the application
and use MPI_Request_free().
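Roughly, that pattern looks like the following sketch (buffer size, tag
and variable names are illustrative only; comm and dest are assumed to
be set up elsewhere):

/* The buffer lives for the whole run, so the sender never needs to
 * know when the transfer has completed. */
static int broker_table[1024];
MPI_Request req;

MPI_Isend (broker_table, 1024, MPI_INT, dest, TAG_REQUEST, comm, &req);
MPI_Request_free (&req);   /* fire and forget; the buffer is never
                              freed or reused for anything else */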
By the way, I think that Issue 2 is independent of MPI_Request_free().
"The arrival of the reply informs the sender that the send has
completed and the send buffer can be reused" is not necessarily related
to the usage of MPI_Request_free().
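The pattern that sentence describes is roughly the following (a sketch
only; sendbuf, replybuf and the tags are illustrative):

MPI_Request req;

MPI_Isend (sendbuf, count, MPI_INT, dest, tag, comm, &req);
MPI_Request_free (&req);   /* the request is gone */

/* The logic of the program guarantees that the partner replies only
 * after it has received the message above. */
MPI_Recv (replybuf, 1, MPI_INT, dest, reply_tag, comm, MPI_STATUS_IGNORE);

/* According to the quoted advice, sendbuf may now be reused. The same
 * reply-based reasoning applies whether or not the request was freed,
 * which is why issue 2 is independent of MPI_Request_free(). */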
>
> Perhaps an alternative solution would be:
>
> Solution #4: Deprecate MPI_REQUEST_FREE so that it can actually be
> removed someday. State that for requests that are freed, one can never
> know when it is safe to free the corresponding buffer (thus making
> MPI_REQUEST_FREE so unattractive that its use tapers off, and also
> making it perfectly permissible for an MPI implementation to segv if a
> user frees a buffer associated with a REQUEST_FREE'd request -- such
> as in the scenarios described above -- because that would be an
> erroneous program :-) ).
>
MPI_Request_free() is used, provides required functionality, and should
not be deprecated.
Freeing user buffers which are still in use by an interconnect is
another topic. What about the receive buffer? The receive buffer could
also be freed, or re-used in another way, directly after the receive
has finished. Wouldn't a retransmit into the receive buffer cause
corresponding problems? I think that a receive cannot return before the
transfer is finished and the user buffers can be reused.
Hubert