[Mpi3-ft] Fault Tolerance Query Interface

Darius Buntinas buntinas at mcs.anl.gov
Wed Mar 14 10:44:00 CDT 2012


I vote for the attribute option.  I don't know what using the MPI_T interface would add, but I guess we could do both.  I don't like the explicit query function.

-d


On Mar 14, 2012, at 9:19 AM, Josh Hursey wrote:

> During the MPI Forum meeting it was requested that the FT WG add the
> ability to query the implementation to determine if it supports the
> functionality described in the new proposal. Such a query interface
> would allow the user to determine whether they should use a code path
> including fault recovery techniques, or use an alternative path that
> does not include such error checking.
> 
> Note that both execution paths must be supported by the MPI
> implementation, but if the implementation will never return the error
> codes defined in that chapter (and makes 'MPI_Comm_shrink' a
> 'MPI_Comm_dup' and 'MPI_Comm_invalidate' a no-op) then the user is
> doing extra work that is not necessary for that implementation.
> Further, if the implementation does not support the functionality,
> this would provide the user with an early sanity check and allow them
> to bail out before wasting time on the machine.
> 
> Below are a few suggestions for how to provide this functionality.
> This would be a ticket targeted at 3.1. It is not necessary
> functionality for the current set of tickets (e.g., 323), just a user
> convenience interface.
> 
> The MPI Predefined Attribute option sounds the best to me, though the
> MPI_T interface extension is interesting as well. What do others
> think?
> 
> -- Josh
> 
> 
> MPI Defined Attribute:
> ----------------------
> Section 8.1.2 of the MPI 2.2 standard defines a small set of
> attributes on MPI_COMM_WORLD that "describe the execution
> environment." We could add a new attribute:
>  - MPI_SUPPORT_PROC_FAILURE : Boolean variable that indicates whether
> the implementation is able to provide support for the behavior
> specified in Chapter 17.
> 
> The MPI implementation would have to define this 'key' so users can
> portably query it. The 'value' should be set to 'true' if the
> functionality is supported; any other value indicates that the
> implementation does not support the functionality.
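
A minimal sketch of how a caller might interpret the attribute option described above. It assumes the value is delivered the way existing predefined attributes are in C: MPI_Comm_get_attr sets a flag and hands back a pointer to an int. MPI_SUPPORT_PROC_FAILURE is only proposed, so the lookup itself is left out and its results are passed in as parameters. The helper name ft_supported and the choice of 1 to mean 'true' are assumptions made for illustration.

```c
#include <stddef.h>

/* Decide whether fault-tolerance support was reported.
 *
 * flag  : nonzero if the key was found on the communicator
 *         (as set by MPI_Comm_get_attr in real code)
 * value : pointer to the int attribute value, or NULL
 *
 * Assumption: 1 encodes 'true'. Per the proposal, any other value,
 * or a missing key, means the functionality is not supported. */
static int ft_supported(int flag, const int *value)
{
    if (!flag || value == NULL) {
        return 0;   /* key not set: treat as unsupported */
    }
    return *value == 1;
}
```

In a real program, flag and value would come from a call such as MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_SUPPORT_PROC_FAILURE, &value, &flag) made once after initialization, and the result would select between the fault-recovery path and the plain path.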
> 
> 
> Explicit MPI Function(s):
> -------------------------
> MPI_FT_QUERY(bool &supported);
> A general query interface to determine if the functionality in Chapter
> 17 is supported.
> 
> We could also explore a per-communication object interface to allow
> for future implementation flexibility (though it would be more
> difficult for the user to program against).
> MPI_COMM_FT_QUERY(MPI_Comm comm, bool &supported);
> MPI_WIN_FT_QUERY(...);
> MPI_FILE_FT_QUERY(...);
> 
> We could also have an initialization function, similar to threads:
> MPI_INIT_FT(required, provided);
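
A rough sketch of the required/provided negotiation that such an initialization call could follow, loosely modeled on how MPI_Init_thread reports a provided level. Everything here is hypothetical: the enum ft_level, its two values, and init_ft_stub (which takes the implementation's maximum level as an extra parameter so the sketch compiles without MPI) are illustrative names only, and the negotiation rule is a simplification.

```c
/* Hypothetical support levels; not part of any MPI standard. */
enum ft_level {
    FT_LEVEL_NONE = 0,          /* no process-failure support */
    FT_LEVEL_PROC_FAILURE = 1   /* Chapter 17 behavior available */
};

/* Stand-in for the proposed MPI_INIT_FT(required, provided).
 * impl_max models the highest level the implementation offers.
 * Simplified rule: grant the requested level if it is available,
 * otherwise report the highest level the implementation has. */
static void init_ft_stub(enum ft_level impl_max,
                         enum ft_level required,
                         enum ft_level *provided)
{
    *provided = (impl_max >= required) ? required : impl_max;
}

/* Returns 1 if the application should take the fault-recovery
 * code path, 0 if it should take the plain path. */
static int use_recovery_path(enum ft_level provided)
{
    return provided >= FT_LEVEL_PROC_FAILURE;
}
```

As with thread support, the caller would compare provided against what it asked for and either switch code paths or bail out early if the level it needs is not granted.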
> 
> 
> MPI_T interface:
> ----------------
> Extend the interface, as appropriate and in coordination with the
> tools group, to allow the user to query the implementation to
> determine support for various error codes. This would possibly allow
> us to extend beyond the error codes defined in Chapter 17.
> 
> Users could then query 'how well supported is MPI_ERR_X' or 'what
> is the state of the implementation after returning MPI_ERR_X'. For
> example, returning MPI_ERR_ARG is not critical in most implementation
> configurations (but in some it may be). So this would allow the user
> to ask the MPI implementation if it can continue using MPI after an
> error, or what it can do after the error is returned.
> 
> This thread might also be interesting to consider for this point:
>   http://lists.mpi-forum.org/mpi3-ft/2011/06/0737.php
> 
> 
> 
> --
> Joshua Hursey
> Postdoctoral Research Associate
> Oak Ridge National Laboratory
> http://users.nccs.gov/~jjhursey
> 




