[Mpi3-ft] MPI Fault Tolerance scenarios

Supalov, Alexander alexander.supalov at intel.com
Wed Mar 4 11:07:43 CST 2009

Thanks. I think we're looking for a solution that works better for both implementors and users. If that means keeping things simple, so be it. If the low-level interface can prove its worth in advanced scenarios, so much the better.

From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Greg Bronevetsky
Sent: Tuesday, March 03, 2009 7:52 PM
To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
Subject: Re: [Mpi3-ft] MPI Fault Tolerance scenarios

Right. The low-level interface may be optional.

That would be an interesting choice for the API: provide several levels of support and, for each level, an optional lower-level API that can control recovery more finely than the default error handlers allow. I think we'll need an explicit communicator rejoin API even when using built-in error handlers, but at least we won't force users to check error codes manually. Having this double stack leaves me worried that the low-level API will simply get dropped because the rest of the forum will not see the need for something that complicated. However, I think we'll have a few very persuasive arguments, such as the fact that checking for failures immediately after every collective will incur a huge performance hit, whereas giving control to the user will make it efficient.
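
To make the two user-facing styles concrete, here is a rough sketch in plain Python. It is a mock, not MPI code: MockComm, allreduce, rejoin, the error codes, and both driver functions are hypothetical names invented for illustration, and nothing here reflects an actual or proposed MPI interface. It only contrasts user code that checks an error code after every collective with user code that registers a handler once and lets the handler perform the explicit rejoin. It does not model the performance cost discussed above.

```python
# Hypothetical mock, not MPI: every name and error code below is an assumption.
SUCCESS, ERR_PROC_FAILED = 0, 1


class MockComm:
    """Stand-in for a communicator whose n-th collective call 'fails'."""

    def __init__(self, fail_on_call=None, handler=None):
        self.calls = 0
        self.fail_on_call = fail_on_call  # 1-based index of the failing call
        self.handler = handler            # optional error-handler callback
        self.rejoined = 0                 # how many times rejoin() was called

    def allreduce(self, value):
        """Pretend collective: returns (error_code, result)."""
        self.calls += 1
        if self.calls == self.fail_on_call:
            if self.handler is not None:
                # Handler style: the library invokes the callback itself,
                # so the caller sees a successful return.
                self.handler(self, ERR_PROC_FAILED)
                return SUCCESS, value
            # Manual style: the failure is reported through the return code.
            return ERR_PROC_FAILED, None
        return SUCCESS, value

    def rejoin(self):
        """Stand-in for an explicit communicator rejoin step."""
        self.rejoined += 1


def run_with_manual_checks(comm, steps):
    """User code inspects the error code after every collective."""
    checks = 0
    for i in range(steps):
        err, _ = comm.allreduce(i)
        checks += 1
        if err != SUCCESS:
            comm.rejoin()
    return checks


def run_with_handler(comm, steps):
    """User code registers a handler once and performs no per-call checks."""
    for i in range(steps):
        comm.allreduce(i)
    return 0  # number of explicit checks written by the user
```

In this mock, both styles end up calling rejoin() once for a single failure. The difference is only in where the check lives: in every iteration of the user's loop, or in one registered callback.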

Greg Bronevetsky
Post-Doctoral Researcher
1028 Building 451
Lawrence Livermore National Lab
(925) 424-5756
bronevetsky1 at llnl.gov