[Mpi3-ft] An alternative FT proposal

George Bosilca bosilca at eecs.utk.edu
Mon Feb 6 19:31:44 CST 2012


FT working group,

As announced, we have been working on an alternative FT proposal. The leading idea of this proposal is to relieve MPI implementations of the burden of maintaining consistency, while providing the means for the user to regain control. As suggested on the mailing list a few days ago, this proposal tries to minimize the semantic changes and additional functions. We believe the proposed set of functions is minimal, yet sufficient to implement stronger consistency models, such as the current working group proposal.

The proposal is not yet complete: we are still working on intercommunicators, RMA, and file operations. However, we wanted to submit it for feedback and discussion on the call this week.

  The UTK team.

PS: The abstract and the proposal are attached below.

Abstract:
In this document we propose a flexible approach that provides fail-stop process fault tolerance by allowing the application to react to failures while maintaining a minimal execution path in failure-free executions. Our proposal focuses on returning control to the application by avoiding deadlocks due to failures within the MPI library. No implicit, asynchronous error notification is required. Instead, functions are provided to allow processes to invalidate any communication object, thus preventing any process from waiting indefinitely on calls involving the invalidated objects. We consider the proposed set of functions to constitute a minimal basis that allows libraries and applications to increase their fault tolerance capabilities by supporting additional types of failures, and to build other desired strategies and consistency models to tolerate faults.

  
-------------- next part --------------
A non-text attachment was scrubbed...
Name: utk_ft_proposal.pdf
Type: application/pdf
Size: 130059 bytes
Desc: not available
URL: <http://lists.mpi-forum.org/pipermail/mpiwg-ft/attachments/20120206/69c1f723/attachment.pdf>
-------------- next part --------------


George Bosilca
Research Assistant Professor
Innovative Computing Laboratory
Department of Electrical Engineering and Computer Science
University of Tennessee, Knoxville
http://web.eecs.utk.edu/~bosilca/


