[Mpi3-ft] system-level C/R requirements
alexander.supalov at intel.com
Fri Oct 24 15:33:37 CDT 2008
PS. Better still, add a communicator argument to these calls and be
happy: both extremes, as well as anything in between, will be covered.
How much will be supported is again up to the implementation.
From: Supalov, Alexander
Sent: Friday, October 24, 2008 10:32 PM
To: 'MPI 3.0 Fault Tolerance and Dynamic Process Control working Group'
Subject: RE: [Mpi3-ft] system-level C/R requirements
I'm afraid we're overcomplicating things a little here. What we need are
basically two collective calls: one to prepare for a checkpoint, and one
to resume after it.
The former is (almost) like MPI_Finalize, the latter is (almost) like
MPI_Init. What they mean is up to the implementation, with one
condition: it must be possible to do an actual checkpoint/restart in
between them.
I cannot rule out that the exact meaning of these calls and of the
notice will be influenced by implementation details such as memory
registration, the checkpoint/restart system used, the network involved,
etc.
These collective calls may be complemented by individual, non-collective
calls if needed. Those would be suitable for individual
checkpoint/restart, and the user would have to ensure that nothing goes
wrong, such as a message trying to reach a process whose memory is
currently being dumped.
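To make the proposal concrete, a minimal sketch of what such a pair might look like, with the communicator argument suggested in the PS above. The names MPIX_Checkpoint_ready and MPIX_Checkpoint_restart are placeholders invented here for illustration, not proposed standard names, and the comments reflect one plausible implementation behavior, not a requirement:

    /* Hypothetical API sketch -- names and semantics are illustrative only. */
    int MPIX_Checkpoint_ready(MPI_Comm comm);   /* collective over comm; quiesce,
                                                   (almost) like MPI_Finalize  */
    int MPIX_Checkpoint_restart(MPI_Comm comm); /* collective over comm; rebuild,
                                                   (almost) like MPI_Init      */

    /* Intended usage: */
    MPIX_Checkpoint_ready(MPI_COMM_WORLD);   /* e.g. drain traffic, deregister
                                                pinned memory, tear down
                                                connections                    */
    /* ... external system-level checkpointer dumps the process images ... */
    MPIX_Checkpoint_restart(MPI_COMM_WORLD); /* e.g. re-establish connections,
                                                re-register memory             */

With a communicator argument, passing MPI_COMM_WORLD gives the fully collective case, while MPI_COMM_SELF would correspond to the individual, non-collective variant; anything in between is again up to the implementation.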
From: mpi3-ft-bounces at lists.mpi-forum.org
[mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Greg
Sent: Friday, October 24, 2008 10:21 PM
To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group;
MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
Subject: Re: [Mpi3-ft] system-level C/R requirements
>Agreed -- specifying an explicit list of platforms or OSes or even
>resource specifics is not the way to go in a standard.
>My suggestion would be to explore if we can define abstract,
>higher-level resources to define a "state", and specify high-level
>actions. For instance, pinning/unpinning memory is very specific to
>RDMA, but maybe a "disconnect virtual connection" operation may
>abstract it. But this puts us into the realm of virtualizing MPI
>internal components/concepts ...
>Maybe there is a more elegant way ...
The thing that worries me is that an MPI implementation may have a
fair amount of state sitting on the network card. This state is
unreachable by a user- or kernel-level checkpointer but may be
reachable by a VMM-level checkpointer. How do we differentiate the
level at which we're working? System-level checkpointers working at
different levels need MPI state to be flushed to different levels of
abstraction, and it seems that we'll need to be very low-level in
order to define what it means to operate at a given level of
abstraction.
1028 Building 451
Lawrence Livermore National Lab
bronevetsky1 at llnl.gov