[Mpi3-ft] Summary of today's meeting
Mike Heffner
mike.heffner at librato.com
Thu Oct 23 15:31:19 CDT 2008
Greg Bronevetsky wrote:
>
>>> What kind of quiescence are you thinking of? It seems to me that
>>> applications will simply need to ensure that either no messages are
>>> in-flight at the time of the checkpoint or that all such messages
>>> have been logged appropriately by the application.
>>
>> In an application-directed C/R system that would be the ideal MPI
>> quiescence.
>
> But that is something that the application is responsible for. Was there
> anything that MPI would be responsible for?
Even in the application-directed case, most MPI stacks would still need
an API call telling them to "park" their state so that it can be
correctly restarted. This might mean recording cached memory
registrations (not the actual memory regions, just the handles
associated with them), the open communicators, and the communication
channels open among processes. Some MPI stacks may find it more
efficient to collate this state information only at checkpoint time
rather than maintain it throughout job execution.
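To make the "park" idea concrete, here is a rough sketch in C of what
collating that state at checkpoint time might look like. Everything in
it is hypothetical: struct live_state, struct parked_state, hyp_park()
and MAX_ITEMS are names made up for illustration and are not part of
MPI or of any particular MPI stack. It assumes the stack can enumerate
its registration handles, communicator ids and connected peers on
demand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ITEMS 16 /* hypothetical fixed capacity, for brevity */

/* Hypothetical view of an MPI stack's live state (not a real MPI type). */
struct live_state {
    uint64_t reg_handles[MAX_ITEMS]; /* cached memory-registration handles */
    size_t   n_regs;
    int      comm_ids[MAX_ITEMS];    /* open communicators */
    size_t   n_comms;
    int      peer_ranks[MAX_ITEMS];  /* peers with an open channel */
    size_t   n_peers;
};

/* Hypothetical snapshot written only when a checkpoint is requested. */
struct parked_state {
    uint64_t reg_handles[MAX_ITEMS]; /* handles only, never the memory itself */
    size_t   n_regs;
    int      comm_ids[MAX_ITEMS];
    size_t   n_comms;
    int      peer_ranks[MAX_ITEMS];
    size_t   n_peers;
};

/*
 * Hypothetical "park" call: collate the restart-relevant state in one
 * pass at checkpoint time instead of tracking it during the whole run.
 * Returns 0 on success, -1 on bad arguments or inconsistent counts.
 */
int hyp_park(const struct live_state *live, struct parked_state *out)
{
    if (live == NULL || out == NULL)
        return -1;
    if (live->n_regs > MAX_ITEMS || live->n_comms > MAX_ITEMS ||
        live->n_peers > MAX_ITEMS)
        return -1;

    memset(out, 0, sizeof *out);

    /* Record registration handles, not the registered regions. */
    memcpy(out->reg_handles, live->reg_handles,
           live->n_regs * sizeof live->reg_handles[0]);
    out->n_regs = live->n_regs;

    /* Record which communicators are open. */
    memcpy(out->comm_ids, live->comm_ids,
           live->n_comms * sizeof live->comm_ids[0]);
    out->n_comms = live->n_comms;

    /* Record which peers currently have a channel open. */
    memcpy(out->peer_ranks, live->peer_ranks,
           live->n_peers * sizeof live->peer_ranks[0]);
    out->n_peers = live->n_peers;

    /* Postcondition: the snapshot mirrors the live counts. */
    assert(out->n_regs == live->n_regs && out->n_comms == live->n_comms);
    return 0;
}
```

An application-directed checkpoint would presumably call something like
hyp_park() once it had drained or logged its own in-flight messages,
then save the resulting snapshot alongside its own data. A real stack
would also need a matching restart call that re-registers memory and
re-opens channels from the snapshot, which this sketch leaves out.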
I would agree, though, that message quiescence is more powerful for
the asynchronous and/or transparent checkpointing cases.
Mike
--
Mike Heffner <mike.heffner at evergrid.com>
Librato, Inc.
Blacksburg, VA USA
Voice: (540) 443-3500 #603