[Mpi3-ft] system-level C/R requirements
Mike Heffner
mike.heffner at librato.com
Fri Oct 24 19:48:14 CDT 2008
Supalov, Alexander wrote:
> Thanks. I think the word "how" below is decisive.
>
> The definitions of MPI_Init and MPI_Finalize do not say "how" processes
> are created, and still, they work. Likewise, as soon as we can define
> the expected outcome of the proposed calls, we can offload the "how" to
> the system - in this case, the CR system.
>
> Now we come to the expected outcome. Imagine we guarantee that there's
> no MPI communication between the PREPARE and RESTORE calls, and no
> messages stuck on the wire or in the buffers. What can be stored in the
> system memory covered by CR will be stored there. The rest will be
> restored by the RESTORE call once it gets control over this memory image
> back. This may include reinitialization of the networking hardware,
> reestablishment of connections, reopening of the files, etc.
>
> What other guarantees do CR people want?
>
If the stack supported these calls asynchronously during MPI
communication -- either from a signal handler or from a second thread --
then I think that definition would go a fair way towards what would be
required.
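
To make the asynchronous case concrete, here is a rough C sketch of the
signal-handler variant. None of these names come from MPI or from any CR
system: ft_prepare, ft_restore, ft_poll_checkpoint, ft_demo and
struct ft_stack are hypothetical stand-ins, and the "stack" is just a few
counters. The sketch also takes the conservative route: the handler only
records the request, and the PREPARE/RESTORE pair runs at the next point
where the application polls. That is weaker than a stack that accepts the
calls directly from the handler or from a second thread.

```c
#define _POSIX_C_SOURCE 200809L  /* for SIGUSR1 under strict ISO C modes */
#include <assert.h>
#include <signal.h>

/* Hypothetical stand-in for library state that lives outside the memory
 * image a CR system saves (network connections, in-flight traffic). */
struct ft_stack {
    int in_flight;  /* messages on the wire or in buffers */
    int connected;  /* 1 while network connections are up */
    int quiesced;   /* 1 between PREPARE and RESTORE */
};

/* Set from the signal handler, read at the next poll. */
static volatile sig_atomic_t checkpoint_requested = 0;

/* Handler only records the request; assigning to a volatile sig_atomic_t
 * is one of the few things that is safe to do here. */
void on_checkpoint_signal(int signo)
{
    (void)signo;
    checkpoint_requested = 1;
}

/* Hypothetical PREPARE: after this returns there is no pending
 * communication and nothing is held that the CR system cannot save. */
void ft_prepare(struct ft_stack *s)
{
    assert(!s->quiesced);
    s->in_flight = 0;   /* stand-in for draining pending messages */
    s->connected = 0;   /* stand-in for tearing down connections */
    s->quiesced = 1;
}

/* Hypothetical RESTORE: rebuilds what was not part of the memory image. */
void ft_restore(struct ft_stack *s)
{
    assert(s->quiesced);
    s->connected = 1;   /* stand-in for reconnecting, reopening files */
    s->quiesced = 0;
}

/* Runs one PREPARE/RESTORE cycle if a request is pending.
 * Returns 1 if a cycle ran, 0 otherwise. */
int ft_poll_checkpoint(struct ft_stack *s)
{
    if (!checkpoint_requested)
        return 0;
    checkpoint_requested = 0;
    ft_prepare(s);
    /* The CR system would save the image here, and possibly restart
     * from it later, before control reaches ft_restore(). */
    ft_restore(s);
    return 1;
}

/* Small driver: request a checkpoint via SIGUSR1, then poll once.
 * Returns 1 if the stack ended up drained and reconnected. */
int ft_demo(void)
{
    struct ft_stack s = { 3, 1, 0 };
    if (signal(SIGUSR1, on_checkpoint_signal) == SIG_ERR)
        return 0;
    raise(SIGUSR1);  /* handler has run by the time raise() returns */
    if (!ft_poll_checkpoint(&s))
        return 0;
    return s.in_flight == 0 && s.connected == 1 && s.quiesced == 0;
}
```

Calling ft_demo() from a main() exercises the whole path. In a real stack
the draining and reconnecting steps are where all the difficulty lies; the
sketch only shows where the two calls would sit relative to the
asynchronous request.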
Mike
--
Mike Heffner <mike.heffner at evergrid.com>
Librato, Inc.
Blacksburg, VA USA
Voice: (540) 443-3500 #603