[Mpi3-ft] simplified FT proposal

Josh Hursey jjhursey at open-mpi.org
Tue Jan 17 14:48:23 CST 2012


(A bit of an aside) I completely agree that creating a fault tolerant
application is a tricky endeavor even for the most heroic of developers.
Developing a 'self-stabilizing' application is difficult, and will require
extensive experimentation to derive appropriate algorithms, applications,
and libraries. Some work has happened in this space, but there is a great
need for more research. I worry about specifying something that is too
restrictive for developers (overreaching our responsibilities in a sense),
and thus stifling what researchers can experiment with. The way I have
approached the task of defining a fault tolerant MPI standard is by asking
'how should this specific interface behave when a process fails?' When
there are multiple options, I have tried to gather application
preferences and further comment. I then try to weave those specific
solutions into a pattern that can be applied throughout the standard for
consistency.
I believe that the majority of the RTS proposal is correct in this regard,
and most of the contention seems to be over specific choices when
multiple solutions are on the table (ANY_SOURCE is a great example). To
that end, we have to better articulate the decision process. There are
other features that seem to overreach, and we need to assess whether those
are necessary or supplementary.

You are correct that the information you receive from an
MPI_Comm_check/validate call is only representative of the known failures
at the time of the call. So it is quite possible that additional processes
fail just as the validate operation finishes, making the data stale.
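
To make the staleness concrete, here is a toy model in plain C. It is not
MPI code and not text from the RTS proposal: world_t, snapshot_alive,
stale_entries, and NPROCS are hypothetical names made up for this sketch,
and the "world" is just an array of flags standing in for real processes.

```c
#include <stdbool.h>
#include <string.h>

#define NPROCS 4 /* hypothetical job size, for illustration only */

/* Toy "ground truth": alive[r] is true while rank r is still running. */
typedef struct {
    bool alive[NPROCS];
} world_t;

/* Stand-in for a check/validate call: copies the liveness known at the
 * moment of the call. Nothing keeps the copy up to date afterwards. */
void snapshot_alive(const world_t *w, bool out[NPROCS])
{
    memcpy(out, w->alive, sizeof w->alive);
}

/* Counts ranks the snapshot reports as alive that have since failed. */
int stale_entries(const world_t *w, const bool snap[NPROCS])
{
    int n = 0;
    for (int r = 0; r < NPROCS; r++) {
        if (snap[r] && !w->alive[r]) {
            n++;
        }
    }
    return n;
}
```

If a rank is marked failed after snapshot_alive() returns, stale_entries()
becomes nonzero while the snapshot itself is unchanged, which is the window
being discussed here.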

The RTS proposal modified/clarified the semantics of MPI_Comm_split() so
that if the communicator is created then it contains only the alive
processes that called the operation. Of course additional processes may
have failed just after creation, but there is nothing we can do about that.
Is such an operation what you are looking for?
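
Read as a membership rule, the semantics above can be sketched as a small
plain-C model. This is only my reading of the rule as stated in this mail,
not MPI code: split_members and NPROCS are hypothetical names, the alive,
called, and color arrays are assumed inputs, and the ordering by key and
the collective nature of the real MPI_Comm_split are ignored.

```c
#include <stdbool.h>

#define NPROCS 6 /* hypothetical size of the parent communicator */

/* Toy model of the membership rule: a rank of the parent communicator is
 * placed in the new communicator for my_color only if it is alive, it
 * called the split, and it passed the same color. The old ranks of the
 * members are written to members[]; the member count is returned. */
int split_members(const bool alive[NPROCS], const bool called[NPROCS],
                  const int color[NPROCS], int my_color,
                  int members[NPROCS])
{
    int n = 0;
    for (int r = 0; r < NPROCS; r++) {
        if (alive[r] && called[r] && color[r] == my_color) {
            members[n++] = r;
        }
    }
    return n;
}
```

A rank that fails right after the call returns is still listed in
members[], which is the caveat noted above.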

-- Josh

On Sun, Jan 15, 2012 at 5:40 PM, William Gropp <wgropp at illinois.edu> wrote:

> One concern that I have with fault tolerant proposals has to do with races
> in the specification.  This is an area where users often "just want it to
> work" but getting it right is tricky.  In the example here, the
> "alive_group" is really only that at some moment shortly before
> "MPI_Comm_check" returns (and possibly not even that).  After that, it is
> really the "group_of_processes_that_was_alive_at_some_point_in_the_past".
> Since there are sometimes correlations in failures, this could happen even
> if the initial failure is rare.  An alternate form might be to have a
> routine, collective over a communicator, that returns a new communicator
> meeting some definition of "members were alive at some point during
> construction".  It wouldn't guarantee you could use it, but it would have
> cleaner semantics.
> Bill
> On Jan 13, 2012, at 3:41 PM, Sur, Sayantan wrote:
> I would like to argue for a simplified version of the proposal that covers
> a large percentage of use-cases and resists adding new “features” for the
> full-range of ABFT techniques. It is good if we have a more pragmatic view
> and not sacrifice the entire FT proposal for the 1% fringe cases. Most apps
> just want to do something like this:
>
> for(… really long time …) {
>    MPI_Comm_check(work_comm, &is_ok, &alive_group);
>    if(!is_ok) {
>        MPI_Comm_create_group(alive_group, …, &new_comm);
>        // re-balance workload and use new_comm in rest of computation
>        MPI_Comm_free(&work_comm); // get rid of old comm
>        work_comm = new_comm;
>    } else {
>        // continue computation using work_comm
>        // if some proc failed in this iteration, roll back work done in
>        // this iteration, go back to loop
>    }
> }
>
> William Gropp
> Director, Parallel Computing Institute
> Deputy Director for Research
> Institute for Advanced Computing Applications and Technologies
> Paul and Cynthia Saylor Professor of Computer Science
> University of Illinois Urbana-Champaign
> _______________________________________________
> mpi3-ft mailing list
> mpi3-ft at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft

Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory