[Mpi3-ft] Communicator Virtualization as a step forward
Supalov, Alexander
alexander.supalov at intel.com
Fri Feb 13 11:55:41 CST 2009
Thanks. I guess there are several ways to deal with this situation:
- Hope that market forces will make all MPIs provide a reasonable level of FT support that will stabilize over time.
- Make certain promises in the standard and let people claim FT compliance to this level and thus provide a certain level of support.
Introducing FT support is akin to introducing thread support. In that case the Forum was able to determine several reasonable levels that found acceptance with the users, and now we see that mixed mode programs are starting to appear in substantial numbers.
I would argue that the FT support in MPI-3 should attempt to do something comparable. Providing a variable and unpredictable level of FT support, which is how the initial description came across to me, may not be good enough for people to take the plunge.
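To make the analogy with thread support concrete, here is a minimal sketch of what negotiated FT levels might look like. It is modelled loosely on the way MPI_Init_thread takes a required level and reports the provided one. The FT_LEVEL_* names and both functions are hypothetical and appear in no proposal or MPI standard, and the "lower of the two" rule is a simplification of what a real negotiation would need.

```c
#include <assert.h>

/* Hypothetical FT support levels, ordered the way the MPI thread levels
 * are ordered (MPI_THREAD_SINGLE < ... < MPI_THREAD_MULTIPLE).
 * Illustrative names only; none of these exist in the MPI standard. */
typedef enum {
    FT_LEVEL_NONE    = 0,  /* any failure aborts the whole job */
    FT_LEVEL_DETECT  = 1,  /* failures are reported to the application */
    FT_LEVEL_RECOVER = 2   /* failures are reported and can be repaired */
} ft_level;

/* Hypothetical negotiation: the application states the level it wants,
 * the implementation states the highest level it supports, and the
 * result is the lower of the two. */
ft_level ft_negotiate(ft_level required, ft_level supported)
{
    assert(required >= FT_LEVEL_NONE && required <= FT_LEVEL_RECOVER);
    assert(supported >= FT_LEVEL_NONE && supported <= FT_LEVEL_RECOVER);
    return required < supported ? required : supported;
}

/* Returns 1 if the provided level satisfies the minimum level the
 * application was written against, 0 otherwise. */
int ft_meets_minimum(ft_level provided, ft_level minimum)
{
    return provided >= minimum;
}
```

With something of this shape, an application written against FT_LEVEL_DETECT could check the negotiated level at startup and know what it can rely on, whichever MPI it runs on.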
In some sense, the discussion on this topic mirrors the discussion on checkpoint/restart. There I heard arguments that, since we cannot define what this may possibly mean down in the MPI, we cannot simply make do with MPI_Prepare_for_checkpoint and MPI_Recover_after_restart calls that would be basically implementation specific (in both the MPI and the checkpointing system sense).
Here we say that we cannot define anything tangible to introduce FT support levels, but we are still going ahead with introducing FT into MPI-3, at some unfathomable level, in full hope that life will fix things up somehow.
Do you notice some kind of discord here? I seem to.
-----Original Message-----
From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Greg Bronevetsky
Sent: Friday, February 13, 2009 6:43 PM
To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group; MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
Subject: Re: [Mpi3-ft] Communicator Virtualization as a step forward
You make a good point but I don't really know what to do about it.
Some systems are simply too small for meaningful fault tolerance
support to make any sense. If the probability of failure is too low
or the cost of restarting the entire computation is insignificant, it
would be foolish to give up performance by making your MPI
implementation properly fault tolerant. However, if you're writing an
implementation for large-scale systems, the wisest thing would be to
implement reasonable fault detection and recovery and allow the user
to turn it off via an implementation-specific command (we cannot
define support levels that will be meaningful across systems and MPI
implementations). I think that large-scale MPIs will implement this
API because users will ask for it (we're talking to users who are
interested in it) but there is no way to require a certain level of
support when defining such levels will involve low-level concepts
that don't exist in the MPI spec.
Greg Bronevetsky
Post-Doctoral Researcher
1028 Building 451
Lawrence Livermore National Lab
(925) 424-5756
bronevetsky1 at llnl.gov
At 09:29 AM 2/13/2009, Supalov, Alexander wrote:
>Thanks. We're talking about different times in the program life cycle.
>
>You're talking about an application that is already prepared to take
>advantage of FT support.
>
>I'm talking about the incentive to make a non-FT prepared
>application FT-ready.
>
>My expectation is that unless there's a guarantee that the work
>necessary to make a non-FT prepared application FT-ready will be
>supported by enough MPIs providing a reasonable level of FT support,
>people are unlikely to make that investment.
>
>And if this is true, then this will kill the FT in MPI-3 just as
>prolonged practical lack of uniform spawning support basically
>killed that MPI-2 feature, or at least made it practically
>irrelevant in the application sense.
>
>-----Original Message-----
>From: mpi3-ft-bounces at lists.mpi-forum.org
>[mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Greg Bronevetsky
>Sent: Friday, February 13, 2009 6:22 PM
>To: MPI 3.0 Fault Tolerance and Dynamic Process Control working
>Group; MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
>Subject: Re: [Mpi3-ft] Communicator Virtualization as a step forward
>
>I'm not sure I understand your point. If an application is written to
>take advantage of the FT API, it will not need to be modified if it
>is moved from one MPI implementation to another. The only difference
>is that different subsets of physical errors will become
>recoverable/non-recoverable. The subsets in question will depend on
>the actual physical components in the system as well as the
>performance/reliability tradeoffs made by the MPI implementors to
>make the most out of the given system. A high-quality implementation
>will provide some knobs that will allow system administrators to
>tailor this tradeoff to their particular system. The FT API makes it
>possible for implementors to provide support for handling failures
>in a way that is richer than just aborting the application. We
>cannot place constraints on how well they take
>advantage of this new dimension in user support.
>
>I guess the closest analogy to this is that MPI doesn't specify the
>latency and bandwidth of the network but does specify the semantics
>of MPI_Send and MPI_Recv. MPI implementations and system designers
>are free to satisfy these semantics in a way that optimizes
>performance, CPU overhead, scalability and system cost.
>
>Greg Bronevetsky
>
>At 09:12 AM 2/13/2009, Supalov, Alexander wrote:
> >Hi,
> >
> >Thanks. It appears to me that this goes a step farther than the
> >spawning/attachment. That one was supposed to work everywhere, but
> >very few implementations were available until very recently, and
> >hence very few applications use this feature.
> >
> >Now, with FT optional in the sense that you described, who's going
> >to develop an application that may be able to take advantage of a
> >fairly unspecified capability to detect an unknown number of faults? I
> >can't imagine why this would happen.
> >
> >Also, imagine one got used to a certain set of provided FT features.
> >Going to another MPI implementation they may see another set of
> >features, up to and including none at all. Why would they
> >experiment? This locks users into one MPI implementation. No good.
> >
> >So, I'd say that unless we can guarantee some basic functionality
> >when FT is switched on one way or another, we're not going to
> >create much excitement in the user community. And life's too short
> >to work on something that nobody will be excited about.
> >
> >It may be better to propose a way to control FT as a subset or what
> >not, and then guarantee that if you turn this capability on, you
> >get, as a minimum, certain benefits by agreeing to a certain tradeoff in
> >performance.
> >
> >That may be out of the scope of this particular WG, but this is
> >something that this WG should very seriously consider in order for
> >the outcome to fly. Or so I think.
> >
> >Best regards.
> >
> >Alexander
> >
> >-----Original Message-----
> >From: mpi3-ft-bounces at lists.mpi-forum.org
> >[mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Greg Bronevetsky
> >Sent: Friday, February 13, 2009 5:51 PM
> >To: MPI 3.0 Fault Tolerance and Dynamic Process Control working
> >Group; MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
> >Subject: Re: [Mpi3-ft] Communicator Virtualization as a step forward
> >
> >
> > >FT must be optional to be accepted. One way is to put it into a
> > >subset, say, thru the MPI_INIT_ASSERTED proposal. Another is to add
> > >yet another MPI_INIT call that will contain a flag for FT configuration
> > >(like MPI_FT_SYNCHRONOUS, MPI_FT_ASYNCHRONOUS, etc.). This was
> > >mentioned in relation to the issue notification in the earlier fault
> > >handling=error handling proposal of mine, see
> > >https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/Fault%20Handling.
> >
> >I agree that FT must be optional but I don't think that we need to
> >add anything to the proposal to make this happen. The proposal
> >provides an API that allows the MPI implementation to tell the
> >application about detected but recoverable failures and help it
> >perform recovery. It does not say anything about which failures must
> >be recoverable for MPI. Reliable MPI implementations will be able to
> >do much more than unreliable MPI implementations. Users who need
> >reliability will choose the former while others will choose the
> >latter. The same will apply for things like network degradation.
> >Since the spec will never talk about what types of physical events
> >must be reportable by MPI, individual implementations will be able to
> >trade off efficiency against the usefulness of system monitoring and
> >all such choices will be compliant with the spec.
> >
> >Greg Bronevetsky
> >
> >_______________________________________________
> >mpi3-ft mailing list
> >mpi3-ft at lists.mpi-forum.org
> >http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
> >---------------------------------------------------------------------
> >Intel GmbH
> >Dornacher Strasse 1
> >85622 Feldkirchen/Muenchen Germany
> >Sitz der Gesellschaft: Feldkirchen bei Muenchen
> >Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
> >Registergericht: Muenchen HRB 47456 Ust.-IdNr.
> >VAT Registration No.: DE129385895
> >Citibank Frankfurt (BLZ 502 109 00) 600119052
> >
> >This e-mail and any attachments may contain confidential material for
> >the sole use of the intended recipient(s). Any review or distribution
> >by others is strictly prohibited. If you are not the intended
> >recipient, please contact the sender and delete all copies.