[Mpi3-ft] Communicator Virtualization as a step forward

Greg Bronevetsky bronevetsky1 at llnl.gov
Fri Feb 13 11:43:13 CST 2009


You make a good point but I don't really know what to do about it. 
Some systems are simply too small for meaningful fault tolerance 
support to make any sense. If the probability of failure is too low 
or the cost of restarting the entire computation is insignificant, it 
would be foolish to give up performance by making your MPI 
implementation properly fault tolerant. However, if you're writing an 
implementation for large-scale systems, the wisest thing would be to 
implement reasonable fault detection and recovery and allow the user 
to turn it off via an implementation-specific command (we cannot 
define support levels that will be meaningful across systems and MPI 
implementations). I think that large-scale MPIs will implement this 
API because users will ask for it (we're talking to users who are 
interested in it) but there is no way to require a certain level of 
support when defining such levels will involve low-level concepts 
that don't exist in the MPI spec.

Greg Bronevetsky
Post-Doctoral Researcher
1028 Building 451
Lawrence Livermore National Lab
(925) 424-5756
bronevetsky1 at llnl.gov

At 09:29 AM 2/13/2009, Supalov, Alexander wrote:
>Thanks. We're talking about different points in the program life cycle.
>
>You're talking about an application that is already prepared to take 
>advantage of FT support.
>
>I'm talking about the incentive to make a non-FT prepared 
>application FT-ready.
>
>My expectation is that unless there's a guarantee that the work 
>necessary to make a non-FT prepared application FT-ready will be 
>supported by enough MPIs providing a reasonable level of FT support, 
>people are unlikely to make that investment.
>
>And if this is true, then this will kill FT in MPI-3, just as the
>prolonged lack of uniform spawning support in practice basically
>killed that MPI-2 feature, or at least made it practically
>irrelevant in the application sense.
>
>-----Original Message-----
>From: mpi3-ft-bounces at lists.mpi-forum.org 
>[mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Greg Bronevetsky
>Sent: Friday, February 13, 2009 6:22 PM
>To: MPI 3.0 Fault Tolerance and Dynamic Process Control working 
>Group; MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
>Subject: Re: [Mpi3-ft] Communicator Virtualization as a step forward
>
>I'm not sure I understand your point. If an application is written to
>take advantage of the FT API, it will not need to be modified if it
>is moved from one MPI implementation to another. The only difference
>is that different subsets of physical errors will become
>recoverable/non-recoverable. The subsets in question will depend on
>the actual physical components in the system as well as the
>performance/reliability tradeoffs made by the MPI implementors to
>make the most out of the given system. A high-quality implementation
>will provide some knobs that will allow system administrators to
>tailor this tradeoff to their particular system. The FT API makes it
>possible for implementors to provide support for handling failures
>that is in any way richer than just aborting the
>application. We cannot place constraints on how well they take
>advantage of this new dimension in user support.
>
>I guess the closest analogy to this is that MPI doesn't specify the
>latency and bandwidth of the network but does specify the semantics
>of MPI_Send and MPI_Recv. MPI implementations and system designers
>are free to satisfy these semantics in a way that optimizes
>performance, CPU overhead, scalability and system cost.
>
>At 09:12 AM 2/13/2009, Supalov, Alexander wrote:
> >Hi,
> >
> >Thanks. It appears to me that this goes a step farther than the
> >spawning/attachment. That one was supposed to work everywhere, but
> >very few implementations were available until very recently, and
> >hence very few applications use this feature.
> >
> >Now, with FT optional in the sense that you described, who's going
> >to develop an application that may be able to take advantage of a
> >fairly unspecified capability to detect an unknown number of faults? I
> >can't imagine why this would happen.
> >
> >Also, imagine one got used to a certain set of provided FT features.
> >Going to another MPI implementation, they may see a different set of
> >features, up to and including none at all. Why would they
> >experiment? This locks users into one MPI implementation. No good.
> >
> >So, I'd say that unless we can guarantee some basic functionality
> >when FT is switched on one way or another, we're not going to
> >create much excitement in the user community. And life's too short
> >to work on something that nobody will be excited about.
> >
> >It may be better to propose a way to control FT, as a subset or
> >whatnot, and then guarantee that if you turn this capability on, you
> >get certain minimum benefits in exchange for accepting a certain
> >performance tradeoff.
> >
> >That may be out of the scope of this particular WG, but this is
> >something that this WG should very seriously consider in order for
> >the outcome to fly. Or so I think.
> >
> >Best regards.
> >
> >Alexander
> >
> >-----Original Message-----
> >From: mpi3-ft-bounces at lists.mpi-forum.org
> >[mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Greg Bronevetsky
> >Sent: Friday, February 13, 2009 5:51 PM
> >To: MPI 3.0 Fault Tolerance and Dynamic Process Control working
> >Group; MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
> >Subject: Re: [Mpi3-ft] Communicator Virtualization as a step forward
> >
> >
> > >FT must be optional to be accepted. One way is to put it into a
> > >subset, say, thru the MPI_INIT_ASSERTED proposal. Another is to add
> > >yet another MPI_INIT call that will contain a flag for FT configuration
> > >(like MPI_FT_SYNCHRONOUS, MPI_FT_ASYNCHRONOUS, etc.). This was
> > >mentioned in relation to the issue notification in the earlier fault
> > >handling = error handling proposal of mine, see
> > >https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/Fault%20Handling.
> >
> >I agree that FT must be optional but I don't think that we need to
> >add anything to the proposal to make this happen. The proposal
> >provides an API that allows the MPI implementation to tell the
> >application about detected but recoverable failures and help it
> >perform recovery. It does not say anything about which failures must
> >be recoverable for MPI. Reliable MPI implementations will be able to
> >do much more than unreliable MPI implementations. Users who need
> >reliability will choose the former while others will choose the
> >latter. The same will apply for things like network degradation.
> >Since the spec will never talk about what types of physical events
> >must be reportable by MPI, individual implementations will be able to
> >trade off efficiency against the usefulness of system monitoring, and
> >all such choices will be compliant with the spec.
> >
> >_______________________________________________
> >mpi3-ft mailing list
> >mpi3-ft at lists.mpi-forum.org
> >http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
> >---------------------------------------------------------------------
> >Intel GmbH
> >Dornacher Strasse 1
> >85622 Feldkirchen/Muenchen Germany
> >Sitz der Gesellschaft: Feldkirchen bei Muenchen
> >Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
> >Registergericht: Muenchen HRB 47456 Ust.-IdNr.
> >VAT Registration No.: DE129385895
> >Citibank Frankfurt (BLZ 502 109 00) 600119052
> >
> >This e-mail and any attachments may contain confidential material for
> >the sole use of the intended recipient(s). Any review or distribution
> >by others is strictly prohibited. If you are not the intended
> >recipient, please contact the sender and delete all copies.



