[Mpi3-ft] system-level C/R requirements

Supalov, Alexander alexander.supalov at intel.com
Fri Oct 24 17:09:03 CDT 2008


Thanks. I think the word "how" below is decisive.

The definitions of MPI_Init and MPI_Finalize do not say "how" processes
are created, and still, they work. Likewise, as soon as we can define
the expected outcome of the proposed calls, we can offload the "how" to
the system - in this case, the CR system.

Now we come to the expected outcome. Imagine we guarantee that there's
no MPI communication between the PREPARE and RESTORE calls, and no
messages stuck on the wire or in the buffers. What can be stored in the
system memory covered by CR will be stored there. The rest will be
restored by the RESTORE call once it gets control over this memory image
back. This may include reinitialization of the networking hardware,
reestablishment of connections, reopening of the files, etc.
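
A minimal sketch of that contract, written as a toy Python model and
not as real MPI. Everything here is hypothetical: the ToyEndpoint
class, its method names, and the checkpoint/restart helpers are
illustrative stand-ins for the proposed PREPARE and RESTORE calls and
for the CR system. It assumes that "state the CR system can save" is
plain memory and that connection state is not.

```python
import copy


class ToyEndpoint:
    """Hypothetical stand-in for one MPI process; not a real MPI binding."""

    def __init__(self, rank):
        self.rank = rank
        self.inbox = []        # delivered messages: plain memory, can be saved
        self.in_flight = []    # messages "on the wire": must be empty at checkpoint
        self.connection = object()  # models network state that CR cannot save
        self.quiesced = False

    def send_to_self(self, payload):
        if self.quiesced:
            raise RuntimeError("no communication between PREPARE and RESTORE")
        self.in_flight.append(payload)

    def prepare_for_checkpoint(self):
        # Drain the wire so nothing is in transit, then drop unsavable state.
        self.inbox.extend(self.in_flight)
        self.in_flight.clear()
        self.connection = None
        self.quiesced = True

    def restore_after_checkpoint(self):
        # Rebuild what the memory image could not carry (connections, files).
        self.connection = object()
        self.quiesced = False


def checkpoint(endpoint):
    """Stand-in for the CR system: saves the memory image only."""
    if not endpoint.quiesced or endpoint.in_flight:
        raise RuntimeError("checkpoint taken outside the PREPARE/RESTORE window")
    return copy.deepcopy(endpoint.__dict__)


def restart(image):
    """Stand-in for restart: reload the image, then let RESTORE fix the rest."""
    endpoint = ToyEndpoint(image["rank"])
    endpoint.__dict__.update(copy.deepcopy(image))
    endpoint.restore_after_checkpoint()
    return endpoint
```

In this model the application calls prepare_for_checkpoint(), hands
control to checkpoint(), and later gets a working endpoint back from
restart(). How the image is actually taken is left entirely to the
CR system, which is the point of the analogy with MPI_Init.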

What other guarantees do CR people want?

-----Original Message-----
From: mpi3-ft-bounces at lists.mpi-forum.org
[mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Greg
Bronevetsky
Sent: Friday, October 24, 2008 11:38 PM
To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
Subject: Re: [Mpi3-ft] system-level C/R requirements

The imprecision comes from the MPI library's interactions with the
C/R tool. It seems to me that each tool/MPI library combo will have
to define its own semantics for the two calls. The only standard
here would be that these functions would in fact need to be called.
But at this point it's not really an API but a mild constraint on how
the real API would be used. As such, it doesn't carry much
information about how checkpointing is to be done. What the
system-level C/R people need is a set of guarantees about where MPI
will put its state between the two calls. I don't see a way to give
them such guarantees in a standardized way. This is where we get
stuck: we either provide a couple of calls that have little
informational content but rather serve as placeholders for a real API,
or go fully detailed on a platform-by-platform basis. Neither approach
is likely to pass the wider forum, which is why I don't know how
to satisfy this need within the MPI 3.0 effort.

It looks to me like we're standardizing something that is already
non-standard.

Greg Bronevetsky
Post-Doctoral Researcher
1028 Building 451
Lawrence Livermore National Lab
(925) 424-5756
bronevetsky1 at llnl.gov

At 02:27 PM 10/24/2008, Supalov, Alexander wrote:
>Thanks. Can (or should) one define semantics better than those of
>MPI_INIT and MPI_FINALIZE? An MPI job starts after MPI_Init. The job
>ends after MPI_Finalize. What happens before and after is almost undefined.
>This is about all the standard specifies, and it's rather clear why: it
>cannot prescribe the way in which processes are started, because it's
>very system specific. CR is possibly even more system specific.
>
>Let's get back to the proposal:
>
>MPI_PREPARE_FOR_CHECKPOINT(MPI_COMM)    ~ MPI_FINALIZE
>MPI_RESTORE_AFTER_CHECKPOINT(MPI_COMM)  ~ MPI_INIT
>
>Use MPI_COMM_WORLD for global CR. Use MPI_COMM_SELF for local CR.
>
>Call the first function immediately before the checkpoint, do the
>checkpoint the way you like, and call the second immediately after to
>re-enter the MPI session where you left it.
>
>What else can be added to make this more clear and more precise than
>MPI_INIT and MPI_FINALIZE definitions?
>
>-----Original Message-----
>From: mpi3-ft-bounces at lists.mpi-forum.org
>[mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Greg
>Bronevetsky
>Sent: Friday, October 24, 2008 11:14 PM
>To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
>Subject: Re: [Mpi3-ft] system-level C/R requirements
>
>I think that the problem for the forum will be the unclear semantics
>of the new calls. MPI_Init is not a good example because it has clear
>semantics for all users of MPI but not for system-level services. The
>difference with the quiescence calls is that we're trying to provide a
>way to bypass regular MPI semantics and plug into the middle of
>MPI without precisely defining how the bypass works. Precise
>semantics didn't matter for MPI_Init exactly because there has never
>been a way to look into the MPI implementation until now. The
>solution to this is to provide very loose semantics to the new calls,
>but this just means that there will actually be no standard way to
>use the new calls, which is why I'm afraid the forum will not like it.
>
>I can think of only two things that we can compare these calls to.
>The first is the proposed performance hint API. However, this API is
>just about hints and may not be a good enough analogy for the rest of
>the forum. The other analogy is the performance profiling APIs that
>some MPI implementations support. These APIs allow tools to determine
>some statistics about internal MPI state. If that is the analogy that
>is drawn, then it is bad for this proposal because I don't think that
>the performance profiling API ever got much support because of the
>issues that we're discussing here.
>
>Greg Bronevetsky
>Post-Doctoral Researcher
>1028 Building 451
>Lawrence Livermore National Lab
>(925) 424-5756
>bronevetsky1 at llnl.gov
>
>At 02:03 PM 10/24/2008, Supalov, Alexander wrote:
> >Thanks. I can't speak for the whole Forum, but my impression is that
> >if the choice is between solving the problem of MPI and CR on one
> >hand, and not solving it on the other hand, a reasonable proposal
> >will go a long way toward convincing the majority, or at least moving
> >the discussion to a still better proposal.
> >
> >As for the number of calls, this is a question of ROI. We're going to
> >add 200 or so fancy calls by the latest guess, while here we have just
> >2 that offer basic functionality of undeniable value. This should be
> >acceptable.
> >
> >Finally, I don't know a more implementation-specific call than
> >MPI_Init. The proposed calls live close by.
>
>
>
>_______________________________________________
>mpi3-ft mailing list
>mpi3-ft at lists.mpi-forum.org
>http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
>---------------------------------------------------------------------
>Intel GmbH
>Dornacher Strasse 1
>85622 Feldkirchen/Muenchen Germany
>Sitz der Gesellschaft: Feldkirchen bei Muenchen
>Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
>Registergericht: Muenchen HRB 47456 Ust.-IdNr.
>VAT Registration No.: DE129385895
>Citibank Frankfurt (BLZ 502 109 00) 600119052
>
>This e-mail and any attachments may contain confidential material for
>the sole use of the intended recipient(s). Any review or distribution
>by others is strictly prohibited. If you are not the intended
>recipient, please contact the sender and delete all copies.





