[Mpi3-ft] Ticket 323 - status?

Ralph Castain rhc at open-mpi.org
Wed May 30 21:18:18 CDT 2012

Obviously, I can't speak for the folks who attended and voted "no", either
directly or by abstaining. However, I have talked to at least a few people,
and can offer a point or two about the concerns.

First, the last study I saw published on the subject of FT for MPI showed a
very low level of interest in FT within the MPI community. It based this on
a usage analysis that showed something over 90% of clusters being too small
to see large failure rates. On the clusters that were large enough
(primarily at the national labs, who pretty clearly voted no), over 80% of
the MPI jobs lasted less than 1 hour.

So the size of the community that potentially benefits from FT is very
small. In contrast, despite assurances it would be turned off unless
specifically requested, it was clear from the proposals that FT would
impact a significant fraction of the code, thus raising the potential for a
substantial round of debugging and instability.

For that majority who would see little-to-no benefit, this isn't an
attractive trade-off.

Second, those who possibly could benefit tend to take a more holistic view
of FT. If you step back and look at the cluster as a system, then there are
multiple ways of addressing the problems of failure during long runs. Yes,
one way is to harden MPI to such events, but that is probably the hardest.

One easier way, and the one being largely touted at the moment, is to make
checkpointing an application a relatively low-cost event so that it can be
done frequently. This is being commercialized as we speak by the
addition of SSDs to the usual parallel file system, thus making a
checkpoint run at very fast speeds. In fact, "burst" buffers are allowing
the checkpoint to dump very quickly, and then slowly drain to disk,
rendering the checkpoint operation very low cost. Given that the commercial
interests coincide with the HPC interests, this solution is likely to be
available from cluster suppliers very soon at an attractive price.

Combined with measures to make restart very fast as well, this looks like
an alternative that has no performance impact on the application at the MPI
level, doesn't potentially destabilize the software, and may meet the
majority of needs.

I'm not touting this approach over any other, mind you - just trying to
point out that the research interests of the FT/MPI group need to be
considered in a separate light from the production interests of the
community. What you may be experiencing (from my limited survey) is a
reflection of that divergence.


On Wed, May 30, 2012 at 6:55 PM, George Bosilca <bosilca at eecs.utk.edu> wrote:

> On May 31, 2012, at 08:44 , Martin Schulz wrote:
> Several people who abstained had very similar concerns, but chose the
> abstain vote since they know it meant no,
> Your interpretation is barely making a "simple majority" in the forum, as
> highlighted by parallel discussions in the other email threads. But let's
> leave this discussion in its own thread.
> But you're right, both "no" and "abstain" votes should be addressed. Bill
> made his point very clear, and to be honest he was the only one that raised
> a __valid__ point about the FT proposal. Personally, I am looking forward
> to fruitful discussions during our weekly phone-calls where the complaints
> raised during the voting will be brought forward in the way that the
> working group will have a real opportunity to address them as they deserve.
> In other words, we are all counting on you guys to enlighten us on the major
> issues with this proposal and the potential solutions you envision or
> promote.
>   george.
> On May 31, 2012, at 08:44 , Martin Schulz wrote:
> Hi George,
> One other no was Intel as far as I remember, but I don't remember the 5th.
> However, I would suggest not to focus on the no votes alone. Several people
> who abstained had very similar concerns, but chose the abstain vote since
> they know it meant no, but they agreed with the general necessity of FT for
> MPI. I remember, for example, Bill saying that for him abstain meant no,
> but that changes later on could change his mind. Based on this
> interpretation, the ticket definitely had more than 5 no votes.
> Martin
> On May 31, 2012, at 8:34 AM, Darius Buntinas wrote:
> Argonne was not convinced that we (FTWG) had the right solution, and the
> large changes in the text mentioned previously did not instill confidence.
>  So it was decided that Argonne would vote against the ticket.
> -d
> On May 30, 2012, at 6:24 PM, George Bosilca wrote:
> In total there were 5 no votes. I wonder who the other two were; they
> might be willing to enlighten us on their reasons for voting against.
> george.
> On May 31, 2012, at 05:48 , Anthony Skjellum wrote:
> Three no votes were LLNL, Argonne, and Sandia.  Since MPI is heavily
> driven by DOE, convincing these folks would be important.
> Tony Skjellum, tonyskj at yahoo.com or skjellum at gmail.com
> Cell 205-807-4968
> On May 31, 2012, at 5:10 AM, Richard Graham <richardg at mellanox.com> wrote:
> The main objection raised is that the text has still been having large
> changes, and if not for the pressure of the 3.0 deadline, this would not
> have come up for a vote.  I talked one-on-one with many that either voted
> against or abstained, and this was the major (not only) point raised.
> Rich
> -----Original Message-----
> From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:
> mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Aurélien Bouteiller
> Sent: Wednesday, May 30, 2012 10:05 PM
> To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
> Subject: Re: [Mpi3-ft] Ticket 323 - status?
> It seems we had very little, if any, technical opposition on the content
> of the proposal itself, but mostly comments on the process. I think we need
> to understand better what the objections are. Do we have a list of who voted
> for and against and their rationale?
> Aurelien
> On May 30, 2012, at 08:52, Josh Hursey wrote:
> That is unfortunate. A close vote (7 yes to 9 no/abstain). :/
> Thanks,
> Josh
> On Wed, May 30, 2012 at 8:38 AM, Thomas Herault
> <herault.thomas at gmail.com> wrote:
> On May 30, 2012, at 01:44, George Bosilca wrote:
> The ticket has been voted down. Come back in 6 months, maybe 3.1. The
> votes were 7 yes, 4 abstains and 5 no.
> Thomas
> On May 30, 2012, at 07:02, Josh Hursey wrote:
> How did the vote go for the fault tolerance ticket 323?
> -- Josh
> --
> Joshua Hursey
> Postdoctoral Research Associate
> Oak Ridge National Laboratory
> http://users.nccs.gov/~jjhursey
> _______________________________________________
> mpi3-ft mailing list
> mpi3-ft at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
> --
> * Dr. Aurélien Bouteiller
> * Researcher at Innovative Computing Laboratory
> * University of Tennessee
> * 1122 Volunteer Boulevard, suite 350
> * Knoxville, TN 37996
> * 865 974 9375
> ________________________________________________________________________
> Martin Schulz, schulzm at llnl.gov, http://people.llnl.gov/schulzm
> CASC @ Lawrence Livermore National Laboratory, Livermore, USA