[Mpi3-ft] Distinguishing errors from failures

Aurélien Bouteiller bouteill at icl.utk.edu
Tue Jul 16 16:41:05 CDT 2013


There is no specific cost to keeping FT enabled at all times, so this may not be such a crucial issue after all: the initial assumption does not hold.

To support this claim with solid evidence, you can point to the following paper, which measures the cost sustained by applications and by stressful micro-benchmarks.

Bland, W., Bouteiller, A., Herault, T., Hursey, J., Bosilca, G., Dongarra, J.J. "An Evaluation of User-Level Failure Mitigation Support in MPI," Computing, Springer, 2013, ISSN 0010-485X, http://dx.doi.org/10.1007/s00607-013-0331-3

The intuitive explanation for the no-impact result is that the spec does not change the behavior of any existing MPI function. If a function has local completion, it retains local completion even when it reports a failure. The spec merely adds a new class of errors to let users know that what killed them is a process failure rather than, say, a network retransmit error.

The FT additions kick in only -after- a failure has been reported. Nothing is implicit. The MPI implementation is not expected to "fix" things on its own in the background, and normal MPI functions are not overloaded with failure-related activities. All recovery actions are triggered by explicit calls from the user code, which removes any potential performance problem caused by recovery activity "spilling" into the failure-free path.

All this means that the codebase (see the prototype implementation) stays basically unchanged, with no modifications in the transport layer, no modifications in the collective framework, etc. 


As you noted, it can also be turned on/off by mpirun switches if deemed necessary. It is standard-compliant (and this is deliberate) to have all FT recovery routines map to a no-op (revoke=no-op) or to their no-FT equivalents (agree=allreduce) when no fault tolerance is needed, and even FT codes will run fine on such an MPI library (as long as there are no failures, of course). If some implementation really wants to "optimize out" anything FT related (or even load it in only if required), this would be our recommendation. But again, the performance gain is expected to be minor, if measurable at all.
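
To make the revoke=no-op and agree=allreduce mapping concrete, here is a toy model in plain C. It is only a sketch and deliberately does not use MPI: toy_comm, toy_revoke, toy_agree and TOY_SUCCESS are hypothetical names invented for illustration, the "communicator" is just an array holding one contribution per rank, and the use of bitwise AND as the reduction operator is an assumption (the mapping above only says that agree behaves like an allreduce).

```c
#include <stddef.h>

/* Hypothetical stand-in for a communicator: one contributed flag per rank. */
typedef struct {
    int *flags;   /* flags[i] is the value rank i contributes to agree */
    size_t size;  /* number of ranks in the toy communicator */
} toy_comm;

enum { TOY_SUCCESS = 0 };

/* Library without FT: no failure is ever reported, so there is nothing
 * for revoke to interrupt. It leaves the communicator untouched. */
static int toy_revoke(toy_comm *comm)
{
    (void)comm;
    return TOY_SUCCESS;
}

/* Library without FT: agree degenerates to an allreduce. Every rank ends
 * up with the same combined value (bitwise AND is assumed here). */
static int toy_agree(toy_comm *comm, int *flag)
{
    int result = ~0;  /* identity element for bitwise AND */
    for (size_t i = 0; i < comm->size; i++)
        result &= comm->flags[i];
    for (size_t i = 0; i < comm->size; i++)
        comm->flags[i] = result;  /* "all" part of allreduce */
    *flag = result;
    return TOY_SUCCESS;
}
```

An FT-aware code calling these two routines in its recovery path would still run correctly on such a library, because that path is only ever entered after a failure has been reported, which never happens here.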


On Jul 16, 2013, at 16:23, Jim Dinan <james.dinan at gmail.com> wrote:

> Hi FT WG,
> I am doing my best to socialize the FT proposal at Intel and gathered a piece of feedback to bring back to the WG.
> There was a concern that any time the user registers an error handler, fault tolerance could be "switched on" because MPI_Comm_set_errhandler() does not distinguish between error classes.  The assumption was that, when switched on, there would be space/time costs associated with fault tolerance.  How does the current proposal determine when fault tolerance should be enabled?
> One suggested mechanism was to add a function, MPI_Comm_set_faulthandler() that allows the programmer to distinguish between errors and failures.  This would allow the runtime to determine when fault tolerance was desired.  I think the way this is handled currently is to rely on the implementation switching on/off fault tolerance when the job is launched.
>  ~Jim.
> _______________________________________________
> mpi3-ft mailing list
> mpi3-ft at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft

* Dr. Aurélien Bouteiller
* Researcher at Innovative Computing Laboratory
* University of Tennessee
* 1122 Volunteer Boulevard, suite 309b
* Knoxville, TN 37996
* 865 974 9375
