[Mpi3-ft] system-level C/R requirements
herault.thomas at gmail.com
Mon Oct 27 13:50:51 CDT 2008
On 27 Oct 2008, at 17:16, Joseph Ruscio wrote:
> On Oct 27, 2008, at 7:18 AM, Thomas Herault wrote:
>> So, my question is: what are the benefits of solution b) as
>> compared to solution a)?
> If you go with solution b), the only thing the MPI implementation
> needs to worry about is PREPAREing and RESTOREing the network state.
> All of the checkpoint specific issues and complexity such as queuing
> system integration, checkpoint file storage options like redundant
> local storage, etc.
Except that, as pointed out before, what PREPARE and RESTORE must do
depends heavily on both the system and the checkpoint mechanism.
> In case a), MPI implementations need to concern themselves with
> these issues AND support the non-standard checkpoint invocation
> APIs for every different desired system checkpointer.
Agreed. The point of case a) is that we modify the MPI library knowing
exactly what the checkpoint mechanism saves. Since we handle the
network-specific parts ourselves, we can find workarounds for things
like pinned pages without too much impact on the network state (such
as having to close the network interface completely and re-open it in
RESTORE).
> So that's a question of whether the responsibility for system-level
> checkpointer integration lies with the MPI implementor or the
> individual checkpoint implementors.
> Going with b) gives MPI an opportunity to minimally specify an
> integration mechanism with system-level checkpointers. If the MPI
> implementor does not wish to worry about these classes of CP/R
> systems, they just don't implement the minimal set of calls. If they
> do want the support, they implement the calls.
I agree that the point of case b) would be to relieve the MPI library
of the data movement/saving/restoring part. However, one could argue
that MPI is also well placed to do this part (using MPI-IO, for
example). I am not
convinced that this will give an opportunity to *minimally* specify
the integration mechanism. This would be minimal in the number of
calls to implement, but the cost of these calls would be nothing but
> A single implementation of PREPARE and RESTORE would support most
> system-level checkpointers. For example our checkpointer that sits
> completely in user-land and BLCR that sits completely in the OS
> would have the same set of requirements. VMM level checkpointers
> have been suggested as being different. Wouldn't they either have
> the same requirements, or be completely transparent to the MPI stack
> i.e. snapshotting the OS, communication device, etc and allowing the
> protocol to sort out lost messages?
Moreover, for an integration mechanism to support most system-level
checkpointers, you seem to imply that PREPARE/RESTORE would be
generic. Then we need to define more precisely what they would do.
Alexander's current proposal is to have a vague definition (e.g.,
PREPARE puts MPI in a checkpointable state, RESTORE restores
communications, and MPI is authorized to do things that would not be
checkpointable), and to let implementations find a way to implement
this specification. We would end up with
implementation/checkpointing-system pairs, as Alexander said. So I
don't really see the benefit compared to case a) (where we would also
have implementation/checkpointing-system pairs).
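
To make the "vague definition" concrete, here is a minimal sketch in C
of what such a PREPARE/RESTORE pair could look like. Everything in it
is an assumption for illustration: the names ft_prepare, ft_restore
and mock_net are hypothetical and belong to no MPI implementation or
checkpointer, and the network is mocked by a plain struct. It only
shows the contract discussed above: PREPARE leaves nothing in flight
and nothing pinned (here by taking the worst-case route of closing the
interface), and RESTORE re-opens communication.

```c
/* Hypothetical sketch: none of these names come from MPI or from any
 * real checkpointer. The network is mocked by a plain struct. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock of the network state an MPI library would own. */
typedef struct {
    bool   link_open;     /* is the network interface open? */
    size_t in_flight;     /* messages sent but not yet delivered */
    size_t pinned_pages;  /* pages registered with the NIC */
} mock_net;

/* PREPARE: bring the library to a state a checkpointer can save.
 * Returns 0 on success, -1 if the interface is already closed. */
int ft_prepare(mock_net *net)
{
    if (!net->link_open)
        return -1;
    net->in_flight = 0;      /* stands in for draining the channels */
    net->pinned_pages = 0;   /* stands in for unregistering memory */
    net->link_open = false;  /* worst case: close the interface */
    return 0;
}

/* RESTORE: re-establish communication after a checkpoint or restart.
 * Returns 0 on success, -1 if PREPARE was not called first. */
int ft_restore(mock_net *net)
{
    if (net->link_open)
        return -1;
    net->link_open = true;   /* stands in for re-opening the interface */
    return 0;
}
```

A real implementation under case a) could keep the interface open and
work around pinned pages instead, which is exactly where the
implementation/checkpointing-system pairing comes from.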