[Mpi3-ft] Summary of today's meeting

Mike Heffner mike.heffner at librato.com
Thu Oct 23 15:31:19 CDT 2008


Greg Bronevetsky wrote:
> 
>>> What kind of quiescence are you thinking of? It seems to me that 
>>> applications will simply need to ensure that either no messages are 
>>> in-flight at the time of the checkpoint or that all such messages 
>>> have been logged appropriately by the application.
>>
>> In an application-directed C/R system that would be the ideal MPI 
>> quiescence.
> 
> But that is something that the application is responsible for. Was there 
> anything that MPI would be responsible for?

Even in the application-directed case, most MPI stacks would still need 
an API call telling them to "park" their state so that it can be 
restarted correctly. This might mean recording cached memory 
registrations (not the actual memory regions, just the handles 
associated with them), the open communicators, and the communication 
channels open among processes. Some MPI stacks may find it more 
efficient to collate this state information only at checkpoint time 
than to maintain it throughout job execution.
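
To make the idea concrete, here is a rough sketch in C of what such a 
"park" call could look like. Everything in it is hypothetical: the 
names ft_park, stack_state and parked_state, the fixed-size arrays, and 
the integer IDs are assumptions for illustration only, not part of MPI 
or of any existing MPI stack. It shows the collate-at-checkpoint-time 
variant, where the snapshot is built only when the call is made.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ITEMS 16

/* Hypothetical live state held by an MPI stack (not a real MPI type). */
typedef struct {
    unsigned long reg_handles[MAX_ITEMS]; /* registration handles only, not the memory */
    size_t n_regs;
    int comm_ids[MAX_ITEMS];              /* open communicators */
    size_t n_comms;
    int peer_ranks[MAX_ITEMS];            /* peers with an open channel */
    size_t n_peers;
} stack_state;

/* Hypothetical snapshot recorded at checkpoint time. */
typedef struct {
    stack_state saved;
    int valid;                            /* 1 once a snapshot has been taken */
} parked_state;

/* Hypothetical "park" call: collates the handles, communicators and
 * channels into a snapshot that a restart could rebuild from.
 * Returns 0 on success, -1 on bad arguments. */
int ft_park(const stack_state *live, parked_state *out)
{
    if (live == NULL || out == NULL)
        return -1;
    assert(live->n_regs <= MAX_ITEMS);
    assert(live->n_comms <= MAX_ITEMS);
    assert(live->n_peers <= MAX_ITEMS);
    memcpy(&out->saved, live, sizeof *live);
    out->valid = 1;
    return 0;
}
```

An application-directed checkpoint would first make sure no messages 
are in flight, then call ft_park() and write the snapshot out along 
with its own data. A real stack would also need the inverse call on 
restart to re-register memory and reopen the channels.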

I would agree, though, that message quiescence is more powerful for 
the asynchronous and/or transparent checkpointing cases.


Mike

-- 

   Mike Heffner <mike.heffner at evergrid.com>
   Librato, Inc.
   Blacksburg, VA USA

   Voice: (540) 443-3500 #603


