[Mpi3-ft] Communicator Virtualization as a step forward

Supalov, Alexander alexander.supalov at intel.com
Wed Feb 18 06:24:22 CST 2009


Thanks. How will you let MPI know that a checkpoint is coming, so that it has a fair chance to prepare for it and then recover afterwards? In some sense this is akin to an MPI_Finalize/MPI_Init pair issued midway through the job, hence the analogy.
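
To make the analogy concrete, here is a minimal sketch in plain C of the kind of prepare/recover pair I have in mind. It is only an illustration under stated assumptions: the comm_lib structure and the lib_* functions are hypothetical names invented for this example, they are not part of MPI or of any existing checkpointer, and the "draining" is a stand-in for whatever a real library would have to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a communication library's state. None of these
 * names exist in MPI; they only illustrate the prepare/recover idea. */
typedef enum { LIB_ACTIVE, LIB_QUIESCED } lib_state;

typedef struct {
    lib_state state;
    int pending_messages; /* messages still in flight */
    bool network_open;    /* whether network resources are held */
} comm_lib;

/* Bring the library up, as an MPI_Init-like call would. */
void lib_init(comm_lib *lib)
{
    assert(lib != NULL);
    lib->state = LIB_ACTIVE;
    lib->pending_messages = 0;
    lib->network_open = true;
}

/* The "MPI_Finalize-like" half: drain traffic and release resources so
 * that the process image is safe to checkpoint. Returns 0 on success,
 * -1 if the library is not in a state that can be quiesced. */
int lib_prepare_for_checkpoint(comm_lib *lib)
{
    assert(lib != NULL);
    if (lib->state != LIB_ACTIVE)
        return -1;
    lib->pending_messages = 0; /* stand-in for draining in-flight messages */
    lib->network_open = false; /* stand-in for closing connections */
    lib->state = LIB_QUIESCED;
    return 0;
}

/* The "MPI_Init-like" half: reacquire resources after the checkpoint
 * (or after a restart). Returns 0 on success, -1 if not quiesced. */
int lib_recover_after_checkpoint(comm_lib *lib)
{
    assert(lib != NULL);
    if (lib->state != LIB_QUIESCED)
        return -1;
    lib->network_open = true;
    lib->state = LIB_ACTIVE;
    return 0;
}
```

The application (or the checkpointer on its behalf) would call lib_prepare_for_checkpoint() before the checkpoint is taken and lib_recover_after_checkpoint() once it completes. The open question above is precisely who issues these calls and when.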

-----Original Message-----
From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Greg Bronevetsky
Sent: Tuesday, February 17, 2009 6:21 PM
To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
Subject: Re: [Mpi3-ft] Communicator Virtualization as a step forward


>Consider the interaction of MPI_Init/MPI_Finalize with the
>underlying OS and job managers. It is about as undefined as the
>interaction of the mentioned calls with the checkpointer.
>Nevertheless, many usable implementations cope with it quite nicely.
>By the way, this is how checkpoint/restart could work as well: the
>MPI is presented to the checkpointer at hand ready to be
>checkpointed/restarted at a well-defined point of the MPI program.
>After all, this is how it is done now: a configuration flag tells
>the MPI what checkpointer to expect. This could be refined, turned
>into dynamic recognition of the active checkpointer, etc. This is
>all trivial.

Because this is a separate argument, I decided to answer it
separately. I fully support this approach to providing checkpointing
to MPI applications and I don't see any extensions to the MPI
specification needed to support it. The only extensions that we've
developed so far have poor semantics and thus boil down to having a
single name for a bunch of semantically different operations that MPI
may perform. At this point we might as well not standardize this name
and let applications use multiple names for what is really a
different operation on each combination of checkpointer and system.

Greg Bronevetsky
Post-Doctoral Researcher
1028 Building 451
Lawrence Livermore National Lab
(925) 424-5756
bronevetsky1 at llnl.gov
http://greg.bronevetsky.com
