[Mpi3-ft] Communicator Virtualization as a step forward

Greg Bronevetsky bronevetsky1 at llnl.gov
Wed Feb 18 10:29:04 CST 2009


>Thanks. How will you let the MPI know the checkpoint is coming, to 
>give it a fair chance to prepare for this and then recover after the 
>checkpoint? This is akin to the MPI_Finalize/MPI_Init in some sense, 
>midway through the job, hence the analogy.

Just use the checkpointer-specific call. The call is going to have 
checkpointer-specific semantics, so why not give it a 
checkpointer-specific name? I understand that there is some value in 
letting applications use the same name across all checkpointers, 
but the bar for adding something to the standard should be higher 
than that. Also, right now the whole approach inherently supports only 
one checkpointing protocol: synch-and-stop. If we can work out a more 
generic API that supports other protocols, I think it may have 
enough value to be included in the spec. As it stands, it still hasn't 
cleared that bar.

Greg Bronevetsky
Post-Doctoral Researcher
1028 Building 451
Lawrence Livermore National Lab
(925) 424-5756
bronevetsky1 at llnl.gov
http://greg.bronevetsky.com 
