[Mpi3-tools] Tools WG webex: tomorrow!

Tue Jul 2 16:15:21 CDT 2013

David Goodell (dgoodell) wrote on Tue, 2 Jul 2013 at 07:00:16

> On Jul 2, 2013, at 4:54 AM, George Bosilca <bosilca at icl.utk.edu> wrote:
> 
>> On Jul 1, 2013, at 20:37 , David Goodell (dgoodell)
>> <dgoodell at cisco.com> wrote:
>> 
>>> I'm fairly sure that MPICH interprets the standard to mean invoke
>>> comm_delete_attr_fn at actual object destruction time, which may be some
>>> time after MPI_COMM_FREE returns.
>>> 
>>> I believe that the "at destruction time" interpretation is the most useful
>>> one, even if it's not clearly mandated nor universally implemented.

It may be the most useful interpretation for certain use cases, but the standard does say that the MPI_*_FREE call will fail if the delete callback returns an error.

>> I have to disagree as I do believe that once there is no way to access an
>> object, and as a side-effect the attributes attached to it, there is absolutely
>> no reason to delay their release.
> 
> A tool or stacked MPI library may have some outstanding resources
> associated with a communication operation (e.g., MPI_Isend) and/or
> communicator.  The "free time" interpretation requires much more careful
> interception and emulation of MPI semantics than the "destruction time"
> interpretation does.

The destruction time error semantics become very complicated for the MPI implementation.  Despite having all references released, the callback must still be able to use the handle (thus reference count checks cannot be used).  If the callback returns an error, the object must not be destroyed, despite the corresponding FREE call having returned successfully (and potentially changed the user's handle value to MPI_*_NULL).  This causes the object to be leaked with no opportunity for the application to recover.
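
To make the failure mode concrete, here is a toy model in plain C. It is not MPI code and does not reflect how MPICH or any other implementation is actually written; every name in it (toy_comm, toy_free_at_free_time, and so on) is made up for illustration. The reference count stands for the user's handle plus any pending operations, and the "destroyed" flag stands for the object actually being deallocated.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in MPI. */
enum { TOY_SUCCESS = 0, TOY_ERR = 1 };

typedef int (*toy_delete_fn)(void *attr);

typedef struct {
    int refcount;            /* user handle + pending operations */
    bool destroyed;          /* stands in for actual deallocation */
    void *attr;              /* the single cached attribute */
    toy_delete_fn on_delete; /* delete callback, NULL once it has run */
} toy_comm;

/* Two sample callbacks: one always succeeds, one always fails. */
int toy_ok_delete(void *attr)      { (void)attr; return TOY_SUCCESS; }
int toy_failing_delete(void *attr) { (void)attr; return TOY_ERR; }

/* Run the delete callback at most once; keep it registered on failure. */
static int toy_run_delete(toy_comm *c) {
    if (c->on_delete == NULL) return TOY_SUCCESS;
    int rc = c->on_delete(c->attr);
    if (rc == TOY_SUCCESS) c->on_delete = NULL;
    return rc;
}

/* Drop one reference; destroy the object when the last one goes away. */
static int toy_release(toy_comm *c) {
    if (--c->refcount > 0) return TOY_SUCCESS;
    if (toy_run_delete(c) != TOY_SUCCESS)
        return TOY_ERR;      /* refcount is 0 but the object lives on: leaked */
    c->destroyed = true;
    return TOY_SUCCESS;
}

/* "Free time": the callback runs inside the free call, so an error can be
 * returned while the caller still holds a valid handle. */
int toy_free_at_free_time(toy_comm **handle) {
    toy_comm *c = *handle;
    if (toy_run_delete(c) != TOY_SUCCESS) return TOY_ERR;
    *handle = NULL;
    return toy_release(c);
}

/* "Destruction time": with operations pending, the free call only drops a
 * reference and nulls the handle; the callback runs later. */
int toy_free_at_destruction_time(toy_comm **handle) {
    toy_comm *c = *handle;
    if (c->refcount == 1 && toy_run_delete(c) != TOY_SUCCESS)
        return TOY_ERR;      /* nothing pending: the error is still reportable */
    *handle = NULL;
    return toy_release(c);
}

/* A pending operation finishing drops the reference it was holding. */
int toy_complete_pending_op(toy_comm *c) {
    return toy_release(c);
}
```

With one pending operation and a failing callback, toy_free_at_free_time returns the error and leaves the handle intact, so the caller can react. toy_free_at_destruction_time returns success and nulls the handle; the error only shows up inside toy_complete_pending_op, where nobody holding the handle is left to see it and the object is never destroyed.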

>> Maybe an even stronger case why the destruction of the attributes should
>> be done when MPI_*_Free is called is the behavior imposed on
>> MPI_COMM_SELF. The standard states at page 363 line 14 "When
>> MPI_FINALIZE is called, it will first execute the equivalent of an
>> MPI_COMM_FREE on MPI_COMM_SELF. This will cause the delete callback
>> function to be executed on all keys associated with MPI_COMM_SELF, in the
>> reverse order that they were set on MPI_COMM_SELF. ... The 'freeing' of
>> MPI_COMM_SELF occurs before any other parts of MPI are affected".
> 
> But MPI_Finalize is a special case of communicator destruction, since it's an
> invalid MPI program to have outstanding MPI communication operations at
> finalize time.

There's a gray area here in that MPI_Finalize is defined to behave as if the first thing it did was call MPI_Comm_free on MPI_COMM_SELF (with extra callback ordering semantics).  There is nothing to prevent a callback from checking for pending communication requests and completing them (calling MPI_Wait from the callback).  Nor is there anything to prevent the callback from returning an error, which, given the failure semantics of MPI_COMM_FREE with respect to delete callbacks returning errors, should cause MPI_FINALIZE to return an error and leave internal MPI state unchanged.

-Fab