[Mpi3-ft] system-level C/R requirements

Mike Heffner mike.heffner at librato.com
Fri Oct 24 19:48:14 CDT 2008


Supalov, Alexander wrote:
> Thanks. I think the word "how" below is decisive.
> 
> The definitions of MPI_Init and MPI_Finalize do not say "how" processes
> are created, and still, they work. Likewise, as soon as we can define
> the expected outcome of the proposed calls, we can offload the "how" to
> the system - in this case, the CR system.
> 
> Now we come to the expected outcome. Imagine we guarantee that there's
> no MPI communication between the PREPARE and RESTORE calls, and no
> messages stuck in the wire or in the buffers. What can be stored in the
> system memory covered by CR will be stored there. The rest will be
> restored by the RESTORE call once it gets control over this memory image
> back. This may include reinitialization of the networking hardware,
> reestablishment of connections, reopening of the files, etc.
> 
> What other guarantees do CR people want?
> 
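
A minimal sketch of the PREPARE/RESTORE contract described above, written in Python for brevity. ToyTransport, send(), progress(), recv(), prepare() and restore() are hypothetical names. They are not part of MPI or of any CR library. The "wire" deque stands in for network state that a checkpoint cannot capture, and the mailbox stands in for process memory that it can. PREPARE blocks new traffic and drains in-flight messages into memory. RESTORE rebuilds what lives outside the memory image (modelled here as bumping a connection epoch) and lets traffic resume. The sketch assumes that the CR system itself restores the memory image before RESTORE runs.

```python
import threading
from collections import deque


class ToyTransport:
    """Hypothetical stand-in for an MPI library's transport layer (not a real API)."""

    def __init__(self):
        self._cv = threading.Condition()
        self._wire = deque()     # network state: NOT captured by the checkpoint
        self._mailbox = deque()  # process memory: captured by the checkpoint
        self._frozen = False     # True between prepare() and restore()
        self.epoch = 0           # bumped whenever connections are re-established

    def send(self, msg):
        with self._cv:
            while self._frozen:  # no communication between PREPARE and RESTORE
                self._cv.wait()
            self._wire.append(msg)

    def progress(self):
        """Deliver one message from the wire into memory, like a progress engine."""
        with self._cv:
            if self._wire and not self._frozen:
                self._mailbox.append(self._wire.popleft())

    def recv(self):
        """Non-blocking receive from the in-memory mailbox; None if it is empty."""
        with self._cv:
            return self._mailbox.popleft() if self._mailbox else None

    def prepare(self):
        """PREPARE: stop new traffic and leave nothing on the wire."""
        with self._cv:
            self._frozen = True
            while self._wire:    # drain in-flight messages into process memory
                self._mailbox.append(self._wire.popleft())
            return list(self._mailbox)  # the message state a CR system would save

    def restore(self):
        """RESTORE: rebuild state outside the memory image, then resume traffic."""
        with self._cv:
            self.epoch += 1        # e.g. reinitialize the NIC, reopen connections
            self._frozen = False
            self._cv.notify_all()  # wake any sender blocked during the checkpoint


t = ToyTransport()
t.send("a")
t.send("b")
t.progress()         # "a" reaches memory, "b" is still on the wire
saved = t.prepare()  # nothing left on the wire: saved == ["a", "b"]
t.restore()          # epoch becomes 1; sends are allowed again
t.send("c")
```

Draining into memory, rather than waiting for every receiver to post a receive, keeps PREPARE from depending on application progress. The trade-off is that the checkpoint image then carries those buffered messages.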

If the MPI stack supported these calls asynchronously during MPI
communication (either from a signal handler or from a second thread),
then I think that definition would go a fair way towards what would be
required.
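
The following is a rough sketch of the second-thread case. It assumes that a separate thread (here the main thread plays the CR thread) calls a hypothetical prepare() while an application thread is inside send(). ThreadSafeTransport and demo() are illustrative names only. PREPARE lets sends that are already inside the library finish, then blocks new ones until RESTORE. The signal-handler variant is not shown. POSIX permits only async-signal-safe functions in a handler, so a real library would more likely have the handler set a flag or wake a helper thread that does this work.

```python
import threading
import time


class ThreadSafeTransport:
    """Hypothetical transport whose prepare()/restore() may run on another thread."""

    def __init__(self):
        self._cv = threading.Condition()
        self._frozen = False  # True between prepare() and restore()
        self._in_send = 0     # sends currently inside the "library"
        self.sent = []        # completed sends; stands in for the wire

    def send(self, msg):
        with self._cv:
            while self._frozen:        # a checkpoint is in progress: wait it out
                self._cv.wait()
            self._in_send += 1
        try:
            time.sleep(0.001)          # pretend to push bytes onto the network
            with self._cv:
                self.sent.append(msg)
        finally:
            with self._cv:
                self._in_send -= 1
                self._cv.notify_all()  # prepare() may be waiting for this

    def prepare(self):
        """PREPARE from any thread: block new sends, let in-progress ones finish."""
        with self._cv:
            self._frozen = True
            while self._in_send:
                self._cv.wait()
            return len(self.sent)      # quiescent: no send is mid-flight

    def restore(self):
        """RESTORE: let blocked senders continue."""
        with self._cv:
            self._frozen = False
            self._cv.notify_all()


def demo(n_messages=50):
    t = ThreadSafeTransport()

    def application():
        for i in range(n_messages):
            t.send(i)

    app = threading.Thread(target=application)
    app.start()
    time.sleep(0.01)             # let the application get mid-stream
    at_checkpoint = t.prepare()  # main thread plays the CR thread
    time.sleep(0.01)             # the application is now blocked in send()
    while_frozen = len(t.sent)
    t.restore()
    app.join()
    return at_checkpoint, while_frozen, len(t.sent)
```

In demo(), the count taken at PREPARE matches the count after a pause while frozen, which shows that the application thread made no progress between PREPARE and RESTORE. After RESTORE, it completes all of its sends.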


Mike

-- 

   Mike Heffner <mike.heffner at evergrid.com>
   Librato, Inc.
   Blacksburg, VA USA

   Voice: (540) 443-3500 #603


