[Mpi3-ft] Communicator Virtualization as a step forward
Greg Bronevetsky
bronevetsky1 at llnl.gov
Tue Feb 17 11:21:06 CST 2009
>Consider interaction of the MPI_Init/MPI_Finalize with the
>underlying OS and job managers. It's about as undefined as that of
>the mentioned calls with the checkpointer. Nevertheless, many usable
>implementations can cope with that quite nicely. By the way, this is
>how checkpoint/restart could work as well, providing the
>checkpointer at hand with an MPI ready to be checkpointed/restarted
>at a well-defined point of the MPI program. After all, this is how
>it is done now: a configuration flag tells the MPI what checkpointer
>to expect. This could be refined, turned into dynamic recognition of
>the active checkpointer, etc. This is all trivial.
Because this is a separate argument, I decided to answer it
separately. I fully support this approach to providing checkpointing
to MPI applications, and I don't see that it needs any extensions to
the MPI specification. The only extensions that we've developed so far
have poor semantics: they boil down to having a single name for a
bunch of semantically different operations that MPI may perform. At
this point we might as well leave this name unstandardized and let
applications use separate names for what is really a different
operation on each combination of checkpointer and system.
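
To make the point concrete, here is a minimal sketch in C of how an
implementation might pick a checkpointer-specific routine from a
configuration setting, with no new MPI call involved. Every name in it
(ckpt_backend, select_backend, the "system" and "user" labels) is
hypothetical and made up for illustration; none of it comes from MPI
or from any real checkpointer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: each checkpointer gets its own preparation routine,
 * because the work differs per checkpointer and per system. */
typedef int (*ckpt_prepare_fn)(void);

struct ckpt_backend {
    const char *name;          /* label used in the configuration */
    ckpt_prepare_fn prepare;   /* what "get ready" means for this backend */
};

/* Placeholder routines; the return values only tell them apart. */
static int prepare_none(void)   { return 0; }  /* nothing to do */
static int prepare_system(void) { return 1; }  /* e.g. quiesce the network */
static int prepare_user(void)   { return 2; }  /* e.g. flush library state */

static const struct ckpt_backend backends[] = {
    { "none",   prepare_none   },
    { "system", prepare_system },
    { "user",   prepare_user   },
};

/* Look up the backend named by the configuration.
 * A NULL name means "no checkpointer configured" and selects "none";
 * an unrecognized name returns NULL so the caller can report it. */
const struct ckpt_backend *select_backend(const char *name)
{
    size_t i;

    if (name == NULL)
        return &backends[0];
    for (i = 0; i < sizeof backends / sizeof backends[0]; i++) {
        if (strcmp(backends[i].name, name) == 0)
            return &backends[i];
    }
    return NULL;
}
```

The name passed to select_backend could come from a configure-time
flag or from whatever the implementation detects at startup. Either
way the application sees nothing new, and each backend stays a
separate operation rather than one name covering all of them.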
Greg Bronevetsky
Post-Doctoral Researcher
1028 Building 451
Lawrence Livermore National Lab
(925) 424-5756
bronevetsky1 at llnl.gov
http://greg.bronevetsky.com