[Mpi-22] [Mpi-forum] MPI 2.2 proposal: resolving MPI_Request_free issues
Hubert Ritzdorf
ritzdorf at [hidden]
Tue Jul 15 05:11:28 CDT 2008
Jeff Squyres wrote:
> On Jul 14, 2008, at 5:50 PM, Erez Haba wrote:
>
>> Issue #1:
>> Advice to user quote:
>>
>> Once a request is freed by a call to MPI_REQUEST_FREE, it is not
>> possible to check for the successful completion of the associated
>> communication with calls to MPI_WAIT or MPI_TEST. Also, if an error
>> occurs subsequently during the communication, an error code cannot be
>> returned to the user; such an error must be treated as fatal.
>>
>> This is the only place in the MPI standard that mandates that an error
>> be FATAL, regardless of the user's settings. This is truly
>> unrecoverable because the user cannot associate the error with the
>> failed send and cannot recover after MPI_Request_free was called.
>> This poses a problem for a fault-tolerance implementation, as it must
>> handle this failure without the ability to notify the user of the
>> specific error, for lack of context.
>
> I'm not sure I agree with this premise. If you need this
> functionality, then you shouldn't be using MPI_REQUEST_FREE.
> Logically speaking, if you want to associate a specific error with a
> specific communication request, then you must have something to tie
> the error *to*. In this case, it's a request -- but the application
> has explicitly stated that it no longer cares about the request.
>
> Therefore: if you care about the request, don't free it.
>
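To make the difference concrete, a minimal sketch (assuming the
communicator's error handler has been set to MPI_ERRORS_RETURN;
buf, count, dest, tag and comm are illustrative only):

MPI_Comm_set_errhandler (comm, MPI_ERRORS_RETURN);

MPI_Request req;
int err;

err = MPI_Isend (buf, count, MPI_INT, dest, tag, comm, &req);

/* Keeping the request: a failure during the transfer can be reported
 * here and tied to this particular send. */
err = MPI_Wait (&req, MPI_STATUS_IGNORE);

/* Freeing the request instead:
 *     MPI_Request_free (&req);
 * leaves nothing to return a later error code against, which is why
 * the advice to users quoted above mandates a fatal error. */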
>> Issue #2:
>> Advice to user quote:
>>
>> Questions arise as to how one knows when the operations have
>> completed when using [snip]
>> causes the send to fail.
>
> I don't quite understand examples 1 and 2 (how would they cause segv's
> in the TCP stack). It is permissible to (pseudocode):
>
> while (bytes_to_send > 0) {
>     rc = write(fd, buffer, bytes_to_send);
>     if (rc > 0) {
>         buffer += rc;
>         bytes_to_send -= rc;
>     } else {
>         ...error...
>     }
> }
> free(buffer);
>
> regardless of what the receiver does. I'm not a kernel guy; does
> updating TCP sequence numbers also interact with the payload buffer?
>
> FWIW: I can see the RMA interconnect example much more easily. You can
> imagine a scenario where a sender successfully sends and the receiver
> successfully receives, but the hardware ACK from the receiver gets
> lost. The receiver then sends an MPI message back to the sender, but
> the sender is still in the middle of a retransmit timeout (while
> waiting for the hardware ACK that was lost). In this case, the user
> app may free the buffer too soon, resulting in a segv (or some other
> lion, tiger, or bear) when the sending hardware tries to retransmit.
>
> Don't get me wrong; I'm not a fan of MPI_REQUEST_FREE either. :-)
>
>> Proposed Solution
>> 3 proposals from the least restrictive to the most restrictive:
>>
>> Solution #1:
>> Remove the advice to users to reuse the buffer once a reply has
>> arrived. There is no safe way to reuse the buffer by freeing it;
>> overwriting it is somewhat safer.
>>
>> Solution #2:
>> Remove the advice to users altogether and disallow the usage pattern
>> of freeing active requests. Only inactive (i.e., not started) requests
>> may be freed.
>>
>> Solution #3:
>> Deprecate MPI_Request_free. Users can always use MPI_Wait to complete
>> the request.
>>
>> Recommendation:
>> Use solution #2, as users still need to free requests that are not
>> used; e.g., the app called MPI_Send_init but never got to start that
>> request, hence the request still needs to be freed.
>
>
> I'm not an apps guy, but I thought there were real world apps out
> there that use MPI_REQUEST_FREE. So #2 would break real apps -- but I
> have no idea how many.
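For reference, the case the recommendation above would still allow is
roughly the following (a sketch; buf, count, dest, tag and comm are
illustrative only):

MPI_Request preq;

/* Create a persistent send request ... */
MPI_Send_init (buf, count, MPI_INT, dest, tag, comm, &preq);

/* ... but on this code path MPI_Start(&preq) is never called, so the
 * request stays inactive and freeing it is unproblematic. This is the
 * usage that solution #2 would still permit. */
MPI_Request_free (&preq);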
MPI_Request_free() is used in application programs. For example, it is
the easiest (and a portable) way to send a non-blocking acknowledgement
to a destination process:
MPI_Request lrequest;
MPI_Isend (buf, 0, MPI_INT, dest, TAG_ACK, comm, &lrequest);
MPI_Request_free (&lrequest);
I know of applications which transfer (long-living) buffers to
destination processes in the broker/request parts of the application
and use MPI_Request_free().
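Roughly, that pattern looks like the following sketch (buffer size, tag
and variable names are illustrative only; comm and dest are assumed to
be set up elsewhere):

/* The buffer lives for the whole run, so the sender never needs to
 * know when the transfer has completed. */
static int broker_table[1024];
MPI_Request req;

MPI_Isend (broker_table, 1024, MPI_INT, dest, TAG_REQUEST, comm, &req);
MPI_Request_free (&req);   /* fire and forget; the buffer is never
                              freed or reused for anything else */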
By the way, I think that Issue 2 is independent of MPI_Request_free().
"The arrival of the reply informs the sender that the send has
completed and the send buffer can be reused" is not necessarily related
to the usage of MPI_Request_free().
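The pattern that sentence describes is roughly the following (a sketch
only; sendbuf, replybuf and the tags are illustrative):

MPI_Request req;

MPI_Isend (sendbuf, count, MPI_INT, dest, tag, comm, &req);
MPI_Request_free (&req);   /* the request is gone */

/* The logic of the program guarantees that the partner replies only
 * after it has received the message above. */
MPI_Recv (replybuf, 1, MPI_INT, dest, reply_tag, comm, MPI_STATUS_IGNORE);

/* According to the quoted advice, sendbuf may now be reused. The same
 * reply-based reasoning applies whether or not the request was freed,
 * which is why issue 2 is independent of MPI_Request_free(). */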
>
> Perhaps an alternative solution would be:
>
> Solution #4: Deprecate MPI_REQUEST_FREE so that it can actually be
> removed someday. State that for requests that are freed, one can never
> know when it is safe to free the corresponding buffer (thus making
> MPI_REQUEST_FREE so unattractive that its use tapers off, and also
> making it perfectly permissible for an MPI implementation to segv if a
> user frees a buffer associated with a REQUEST_FREE'd request -- such
> as in the scenarios described above -- because that would be an
> erroneous program :-) ).
>
MPI_Request_free() is used, provides required functionality, and should
not be deprecated.
Freeing user buffers which are still in use by an interconnect is
another topic. What about the receive buffer? The receive buffer could
also be freed, or re-used in another way, directly after the receive
has finished. Wouldn't a retransmit into the receive buffer cause
corresponding problems? I think that a receive cannot return before the
transfer is finished and the user buffers can be reused.
Hubert