[Mpi3-ft] ULFM Slides for Madrid

Sur, Sayantan sayantan.sur at intel.com
Fri Aug 16 16:05:24 CDT 2013

Ah, gotcha.


From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Wesley Bland
Sent: Friday, August 16, 2013 1:55 PM
To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
Subject: Re: [Mpi3-ft] ULFM Slides for Madrid

I think my slide was unclear. The case I meant was if a process failed before the Allreduce. In that case, the Allreduce would always fail.. If the failure occurs during the algorithm, as you pointed out, it wouldn't necessarily fail everywhere.


On Friday, August 16, 2013 at 3:51 PM, Sur, Sayantan wrote:

Hi Wesley,

Thanks for sending the slides around. Does the assertion on Slide 6 and example on Slide 12 that “Allreduce would always fail” (in the case of failure of one of the participants) hold true?

For example, an MPI implementation might have a terrible implementation of allreduce, where participating ranks send their buffer to a root, which does the reduction. The root then sends the results back to the participants one after the other. One of these p2p sends then fails. In this case, isn’t it possible that one rank gets MPI_ERR_PROC_FAILED, whereas the others get MPI_SUCCESS?



From: mpi3-ft-bounces at lists.mpi-forum.org<mailto:mpi3-ft-bounces at lists.mpi-forum.org> [mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Wesley Bland
Sent: Friday, August 16, 2013 10:17 AM
To: MPI3-FT Working Group
Subject: [Mpi3-ft] ULFM Slides for Madrid

I've put together a first draft of some slides that give an overview of ULFM for the forum meeting in Madrid for Rich to present. I think I captured most of the discussion we had on the last call relating to rationale, but if I missed something, feel free to add that to this deck or send me edits.

I think the plan of action, as I understand it from Rich and Geoffroy, is to iterate on these slides until the next call on Tuesday and then we'll go over them as a group to make sure we're all on the same page. Rich, will you be able to attend the call this week (Tuesday, 3:00 PM EST)? If not, we can adjust it this week to make sure you can be there.

Just to be clear, the goal of this presentation is to provide an overview of ULFM for the European crown that can't usually attend the forum meetings. This will probably be a review for many of the people who attend regularly, but there is some new rationale that we haven't included in the past when we've been putting these presentations together. I'd imagine that this meeting will have some confusion from the attendees where they might remember parts of the previous proposals and mix them, but if we can tell them to do a memory wipe ahead of time, that would help.

Let me know what I've missed.


mpi3-ft mailing list
mpi3-ft at lists.mpi-forum.org<mailto:mpi3-ft at lists.mpi-forum.org>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mpi-forum.org/pipermail/mpiwg-ft/attachments/20130816/46b29527/attachment-0001.html>

More information about the mpiwg-ft mailing list