[Mpi3-ft] Distinguishing errors from failures
james.dinan at gmail.com
Wed Jul 17 11:16:26 CDT 2013
Hi George and Aurélien,
Thanks for the detailed responses. I looked at the paper, and it indicates
that failure detection is needed when MPI_ANY_SOURCE is used. I assume it is
also needed for passive-target RMA, since a process can fail while holding a
lock. Won't this have an impact on performance?
On Tue, Jul 16, 2013 at 7:27 PM, George Bosilca <bosilca at icl.utk.edu> wrote:
> In addition to Aurélien's answer, there is something else that I think
> should be stressed in this context, something I feel the WG failed to make
> clear enough in its interactions with the forum.
> The current FT proposal places one single strong mandate on MPI
> implementations: a revoked communicator is improper for further
> communications. That is the full extent of what the current proposal
> imposes on MPI implementations, in terms of both capabilities and
> overheads. And still, there were complaints raised about it…
> Why is this so? Because there is no mandatory error detection or
> propagation: when the MPI library detects a failure, only local dispatch of
> this information is necessary. Of course, one would expect a high-quality
> MPI implementation to do its best to ensure a certain quality of service
> here, but that takes significantly less effort than ensuring some other
> "high quality" capabilities (namely progress and fairness). The paper
> mentioned by Aurélien demonstrates that, at least under certain assumptions,
> this can be achieved with minimal or unnoticeable overhead.
> From there, the application itself is responsible for making good use of
> the functions provided by the FT proposal to handle the failure in a way
> that is meaningful for the application (again, no FT model is imposed).
> MPI_Comm_revoke provides a communication channel through which, at the
> application's request, knowledge of a process failure is propagated to
> other MPI processes. The ACK function locally acknowledges the failure.
> The agreement reaches a consensus on which more complex FT methods can be
> built. Finally, shrink takes you back to a communicator that has all the
> properties of a sane, workable MPI communicator.
> On Jul 16, 2013, at 23:41, Aurélien Bouteiller <bouteill at icl.utk.edu> wrote:
> > Jim,
> > There is no specific cost to enabling FT at all times, so this may not
> > actually be a crucial issue: the initial assumption does not hold.
> > To support this claim with solid evidence, you can point to the following
> > paper, which investigates the cost incurred by applications and by
> > stressful micro-benchmarks:
> > Bland, W., Bouteiller, A., Herault, T., Hursey, J., Bosilca, G., and
> > Dongarra, J. J. “An Evaluation of User-Level Failure Mitigation Support in
> > MPI,” Computing, Springer, 2013, ISSN 0010-485X.
> > The intuitive explanation for the no-impact result is that the spec does
> > not change the behavior of any existing MPI function. If a function has
> > local completion, it retains local completion even when it reports a
> > failure. The spec merely adds a new class of errors that lets users know
> > that what killed them is a process failure rather than, say, a network
> > retransmit error.
> > The FT additions kick in only -after- a failure has been reported. Nothing
> > is implicit. The MPI implementation is not expected to "fix" things on its
> > own in the background, and normal MPI functions are not overloaded with
> > failure-related activities. All recovery actions are triggered by explicit
> > calls from the user code, which removes any risk of recovery activity
> > "spilling" into the failure-free path and hurting its performance.
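The "nothing is implicit" point can be illustrated with a minimal sketch. On the failure-free path, the only cost is an ordinary call plus an error-code check. Recovery code runs only when the application itself sees a process-failure error. Other errors are kept separate. `run_iterations`, `do_step`, `recover`, and the string error classes are hypothetical stand-ins, not MPI calls.

```python
def run_iterations(steps, do_step, recover):
    """Run `steps` iterations; recovery happens only on explicit demand.

    do_step(i) returns an error-class string; recover(i) is the
    application's own recovery routine (e.g. revoke, agree, shrink).
    Returns the number of recoveries performed.
    """
    recoveries = 0
    for i in range(steps):
        err = do_step(i)
        if err == "ERR_PROC_FAILED":
            # Explicit, user-triggered recovery: the library did nothing
            # on its own in the background.
            recover(i)
            recoveries += 1
        elif err != "SUCCESS":
            # An ordinary (non-failure) error is not treated as a fault.
            raise RuntimeError(f"non-failure error at step {i}: {err}")
    return recoveries


# Failure-free run: the recovery routine is never touched.
print(run_iterations(5, lambda i: "SUCCESS", lambda i: None))  # 0

# One simulated failure at step 3 triggers exactly one recovery.
log = []
print(run_iterations(5,
                     lambda i: "ERR_PROC_FAILED" if i == 3 else "SUCCESS",
                     log.append))  # 1
print(log)  # [3]
```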
> > All this means that the codebase (see the prototype implementation)
> > stays basically unchanged: no modifications to the transport layer, no
> > modifications to the collective framework, etc.
> > ~~~~~~
> > As you noted, FT can also be turned on or off with mpirun switches if
> > deemed necessary. It is standard compliant (and this is deliberate) to have
> > all FT recovery routines map to no-ops (revoke = no-op) or to non-FT
> > equivalents (agree = allreduce) when no fault tolerance is needed, and even
> > FT codes will run fine on such an MPI library (as long as there are no
> > failures, of course). If an implementation really wants to "optimize out"
> > anything FT-related (or even load it only when required), this would be our
> > recommendation. But again, the performance gain is expected to be minor, if
> > it is even measurable.
> > Aurelien
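The degenerate, no-FT mapping Aurélien describes can be sketched as follows. This assumes, as in a bitwise-AND model of agreement, that an agreement with no failures computes the same result as an allreduce with bitwise AND. `NoFTComm` and its methods are hypothetical. The `functools.reduce` over a Python list stands in for `MPI_Allreduce` with `MPI_BAND`.

```python
from functools import reduce
import operator


class NoFTComm:
    """Hypothetical no-fault-tolerance stand-in for the FT routines."""

    def __init__(self, size):
        self.size = size

    def revoke(self):
        # No-op: without failure handling there is nothing to propagate.
        return None

    def agree(self, flags):
        # Plain allreduce with bitwise AND over every rank's flag; this is
        # only a valid "agreement" as long as no process fails.
        if len(flags) != self.size:
            raise ValueError("one flag per rank is required")
        return reduce(operator.and_, flags)


comm = NoFTComm(4)
comm.revoke()                                     # does nothing
print(comm.agree([1, 1, 1, 1]))                   # 1
print(comm.agree([1, 0, 1, 1]))                   # 0
print(comm.agree([0b110, 0b011, 0b111, 0b111]))   # 2
```

An FT-aware application written against this interface runs unchanged. It simply gets no protection if a process actually dies.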
> > On Jul 16, 2013, at 16:23, Jim Dinan <james.dinan at gmail.com> wrote:
> >> Hi FT WG,
> >> I am doing my best to socialize the FT proposal at Intel and have
> >> gathered a piece of feedback to bring back to the WG.
> >> There was a concern that, any time the user registers an error handler,
> >> fault tolerance could be "switched on", because MPI_Comm_set_errhandler()
> >> does not distinguish between error classes. The assumption was that, when
> >> switched on, fault tolerance would carry space/time costs. How does the
> >> current proposal determine when fault tolerance should be enabled?
> >> One suggested mechanism was to add a function, MPI_Comm_set_faulthandler(),
> >> that allows the programmer to distinguish between errors and failures.
> >> This would allow the runtime to determine when fault tolerance is desired.
> >> I think the current proposal handles this by relying on the implementation
> >> to switch fault tolerance on or off when the job is launched.
> >> ~Jim.
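The suggested split between error handlers and fault handlers can be sketched as a small dispatch model. MPI_Comm_set_faulthandler() is only a proposal in this thread, so everything below is hypothetical: `HandlerRegistry`, `FAILURE_CLASSES`, `ft_requested`, and the string error classes, which are meant to echo the failure classes the FT proposal adds. The sketch shows where such a hook would sit and how registering it could act as the "FT wanted" signal.

```python
# Hypothetical names for the failure-related error classes.
FAILURE_CLASSES = {"ERR_PROC_FAILED", "ERR_REVOKED"}


class HandlerRegistry:
    """Hypothetical model of separate error and fault handlers.

    set_errhandler mirrors today's single handler; set_faulthandler is
    the *proposed* extra hook and does not exist in MPI.
    """

    def __init__(self):
        self.errhandler = None
        self.faulthandler = None

    def set_errhandler(self, fn):
        self.errhandler = fn

    def set_faulthandler(self, fn):
        # Registering a fault handler is the signal a runtime could use
        # to decide that fault tolerance is actually wanted.
        self.faulthandler = fn

    @property
    def ft_requested(self):
        return self.faulthandler is not None

    def raise_error(self, err_class):
        # Failures go to the fault handler when one is registered;
        # everything else (or any failure without one) goes to the
        # ordinary error handler.
        if err_class in FAILURE_CLASSES and self.faulthandler is not None:
            return self.faulthandler(err_class)
        if self.errhandler is not None:
            return self.errhandler(err_class)
        # No handler at all: loosely analogous to MPI_ERRORS_ARE_FATAL.
        raise RuntimeError(f"fatal: {err_class}")


reg = HandlerRegistry()
reg.set_errhandler(lambda e: f"error handler saw {e}")
print(reg.ft_requested)                    # False
print(reg.raise_error("ERR_PROC_FAILED"))  # error handler saw ERR_PROC_FAILED
reg.set_faulthandler(lambda e: f"fault handler saw {e}")
print(reg.ft_requested)                    # True
print(reg.raise_error("ERR_PROC_FAILED"))  # fault handler saw ERR_PROC_FAILED
print(reg.raise_error("ERR_TRUNCATE"))     # error handler saw ERR_TRUNCATE
```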
> >> _______________________________________________
> >> mpi3-ft mailing list
> >> mpi3-ft at lists.mpi-forum.org
> >> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
> > --
> > * Dr. Aurélien Bouteiller
> > * Researcher at Innovative Computing Laboratory
> > * University of Tennessee
> > * 1122 Volunteer Boulevard, suite 309b
> > * Knoxville, TN 37996
> > * 865 974 9375