[Mpi3-ft] system-level C/R requirements

Supalov, Alexander alexander.supalov at intel.com
Fri Oct 24 16:14:47 CDT 2008

PS. Regarding MPI state freeze: if the CR system involved can
consistently capture and restore global network and wire state in
addition to the memory state, the proposed calls can be nearly no-ops.

-----Original Message-----
From: Supalov, Alexander 
Sent: Friday, October 24, 2008 11:04 PM
To: 'MPI 3.0 Fault Tolerance and Dynamic Process Control working Group'
Subject: RE: [Mpi3-ft] system-level C/R requirements

Thanks. I can't speak for the whole Forum, but my impression is that if
the choice is between solving the problem of MPI and CR on one
hand, and not solving it on the other hand, a reasonable proposal will
go a long way toward convincing the majority, or at least moving the
discussion to a still better proposal.

As for the number of calls, this is a question of ROI. We're going to add
200 or so fancy calls by the latest guess, while here we have just 2
that offer basic functionality of undeniable value. This should be

Finally, I don't know of a more implementation-specific call than MPI_Init.
The proposed calls live close by.

-----Original Message-----
From: mpi3-ft-bounces at lists.mpi-forum.org
[mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Greg
Sent: Friday, October 24, 2008 10:51 PM
To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
Subject: Re: [Mpi3-ft] system-level C/R requirements

>Probably we should not try to define this, and encourage
>checkpoint/restart people to work with particular MPI implementations to
>make sure things work? After all, it's not about an abstract MPI and
>abstract CR system - there will probably always be actual pairs (or
>other relationships) of them that can work together.
>Compare this to the situation with threads. MPI acknowledges their
>existence and provides a couple of calls to request a particular level
>of support, that's all. This was good enough for starters, and may change
>in the future. The relation between MPI and CR may go this way, too:
>acknowledging first, integrating next.

If all that you want to add to the spec is a couple of calls to say 
that MPI state should be frozen in some sense, that's fine with me 
since we're defining what these calls do. However, I'm not sure that 
the wider forum will accept them because there is a general dislike 
for adding more calls and the primary value of these calls is to 
motivate people to provide implementation-specific definitions for 
them, which I'm guessing is lower than the standard bar for acceptance.

Greg Bronevetsky
Post-Doctoral Researcher
1028 Building 451
Lawrence Livermore National Lab
(925) 424-5756
bronevetsky1 at llnl.gov 


