[Mpi3-ft] Communicator Virtualization as a step forward

Tue Feb 17 11:21:06 CST 2009

>Consider interaction of the MPI_Init/MPI_Finalize with the 
>underlying OS and job managers. It's about as undefined as that of 
>the mentioned calls with the checkpointer. Nevertheless, many usable 
>implementations can cope with that quite nicely. By the way, this is 
>how checkpoint/restart could work as well: the MPI would be made 
>ready for the checkpointer at hand to checkpoint or restart it at a 
>well-defined point of the MPI program. After all, this is how it is 
>done now: a configuration flag tells the MPI which checkpointer to 
>expect. This could be refined, turned into dynamic recognition of 
>the active checkpointer, etc. This is all trivial.

Because this is a separate argument, I decided to answer it 
separately. I fully support this approach to providing checkpointing 
to MPI applications, and I don't see that it needs any extensions to 
the MPI specification. The only extensions that we've developed so 
far have poorly defined semantics, so they boil down to a single name 
for a bunch of semantically different operations that MPI may 
perform. At that point we might as well not standardize the name and 
instead let applications use separate names for what is really a 
different operation on each combination of checkpointer and system.
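
To illustrate the "single name, semantically different operations" 
concern, here is a minimal sketch. Every name in it is hypothetical: 
it does not use MPI or any real checkpointer, and the two backends 
are assumed purely for illustration. The same call does unrelated 
work depending on which checkpointer is configured.

```python
# Hypothetical sketch: none of these names come from MPI or a real checkpointer.

def quiesce_network(state):
    """Assumed system-level backend: drain in-flight messages so the
    whole process image can be saved."""
    state["in_flight"] = 0
    return "network quiesced"


def serialize_handles(state):
    """Assumed application-level backend: record the open handles so
    they can be rebuilt after a restart."""
    state["saved_handles"] = list(state["handles"])
    return "handles serialized"


# One shared name, two unrelated behaviours, selected by configuration.
PREPARE_BACKENDS = {
    "system_level": quiesce_network,
    "app_level": serialize_handles,
}


def prepare_for_checkpoint(checkpointer, state):
    """Dispatch the common name to whatever the configured checkpointer needs."""
    return PREPARE_BACKENDS[checkpointer](state)
```

A caller of prepare_for_checkpoint cannot tell from the name alone 
which of these effects it will get, which is why a standardized name 
by itself adds little.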

Greg Bronevetsky
Post-Doctoral Researcher
1028 Building 451
Lawrence Livermore National Lab
(925) 424-5756
bronevetsky1 at llnl.gov
http://greg.bronevetsky.com