[Mpi3-ft] Fault Tolerance Query Interface
terry.dontje at oracle.com
Fri Mar 16 05:52:41 CDT 2012
I vote for the attribute option.
On 3/14/2012 11:44 AM, Darius Buntinas wrote:
> I vote for the attribute option. I don't know what using the MPI_T interface would add, but I guess we could do both. I don't like the explicit query function.
> On Mar 14, 2012, at 9:19 AM, Josh Hursey wrote:
>> During the MPI Forum meeting it was requested that the FT WG add the
>> ability to query the implementation to determine if it supports the
>> functionality described in the new proposal. Such a query interface
>> would allow the user to determine whether they should use a code path
>> including fault recovery techniques, or use an alternative path that
>> does not include such error checking.
>> Note that both execution paths must be supported by the MPI
>> implementation, but if the implementation will never return the error
>> codes defined in that chapter (and makes 'MPI_Comm_shrink' a
>> 'MPI_Comm_dup' and 'MPI_Comm_invalidate' a noop) then the user is
>> doing extra work that is not necessary for that implementation.
>> Further, if the implementation does not support the functionality,
>> this would provide the user with an early sanity check and allow them
>> to bail out before wasting time on the machine.
>> At bottom are a few suggestions for how to provide this functionality.
>> This would be a ticket targeted at 3.1. It is not necessary
>> functionality for the current set of tickets (e.g., 323), just a user
>> convenience interface.
>> The MPI Predefined Attribute option sounds the best to me, though the
>> MPI_T interface extension is interesting as well. What do others
>> -- Josh
>> MPI Defined Attribute:
>> Section 8.1.2 of the MPI 2.2 standard defines a small set of
>> attributes that are defined for MPI_COMM_WORLD to "describe the
>> execution environment." We could add a new attribute:
>> - MPI_SUPPORT_PROC_FAILURE : Boolean variable that indicates whether
>> the implementation is able to provide support for the behavior
>> specified in Chapter 17.
>> The MPI implementation would have to define this 'key' so users can
>> portably query it. The 'value' should be set to 'true' if the
>> functionality is supported, and all other values indicate that the
>> implementation does not support the functionality.
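
A rough sketch of how a user might program against the attribute option described above. It is not taken from the proposal text. MPI_Comm_get_attr is the existing MPI-2 call for reading communicator attributes. Everything else is an assumption made for illustration: the helper names, the USE_MPI build switch, the encoding of 'true' as the integer 1, and passing the key in as a parameter (no mpi.h defines MPI_SUPPORT_PROC_FAILURE today).

```c
#include <stddef.h>

#ifdef USE_MPI
#include <mpi.h>
#endif

/* Interpret the result of an attribute lookup under the rule proposed
 * above: only a set attribute whose value is 'true' means support; an
 * unset attribute or any other value means no support.
 * Assumption: 'true' is encoded as the integer 1. */
static int ft_attr_means_supported(int flag, const int *value)
{
    if (!flag || value == NULL) {
        return 0;   /* attribute is not set on this communicator */
    }
    return *value == 1;
}

#ifdef USE_MPI
/* 'keyval' stands in for the proposed MPI_SUPPORT_PROC_FAILURE key
 * (hypothetical), so the caller has to supply it. */
static int ft_supported(MPI_Comm comm, int keyval)
{
    int *value = NULL;  /* predefined attributes come back as int* in C */
    int flag = 0;

    MPI_Comm_get_attr(comm, keyval, &value, &flag);
    return ft_attr_means_supported(flag, value);
}
#endif
```

With something like this, an application could call ft_supported(MPI_COMM_WORLD, key) once after MPI_Init and pick the fault-recovery code path or the plain one based on the result.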
>> Explicit MPI Function(s):
>> A general query interface to determine if the functionality in Chapter
>> 17 is supported.
>> We could also explore a per-communication object interface to allow
>> for future implementation flexibility (though it would be more
>> difficult for the user to program against).
>> MPI_COMM_FT_QUERY(MPI_Comm comm, bool &supported);
>> We could also have an initialization function, similar to threads:
>> MPI_INIT_FT(required, provided);
>> MPI_T interface:
>> Extend the interface, as appropriate and in coordination with the
>> tools group, to allow the user to query the implementation to
>> determine support for various error codes. This would possibly allow
>> us to extend beyond the error codes defined in Chapter 17, so that
>> users can query for 'how well supported is MPI_ERR_X' or 'what is the
>> state of the implementation after returning MPI_ERR_X'. For
>> example, returning MPI_ERR_ARG is not critical in most implementation
>> configurations (but in some it may be). So this would allow the user
>> to ask the MPI implementation if it can continue using MPI after an
>> error, or what it can do after the error is returned.
>> This thread might also be interesting to consider for this point:
>> Joshua Hursey
>> Postdoctoral Research Associate
>> Oak Ridge National Laboratory
>> mpi3-ft mailing list
>> mpi3-ft at lists.mpi-forum.org
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.dontje at oracle.com