[Mpi3-ft] Communicator Virtualization as a step forward

Fri Feb 13 12:03:17 CST 2009

At 09:55 AM 2/13/2009, Supalov, Alexander wrote:
>Thanks. I guess there are several ways to deal with this situation:
>
>- Hope that market forces will make all MPIs provide a reasonable 
>level of FT support that will stabilize over time.
>- Make certain promises in the standard and let people claim FT 
>compliance to this level and thus provide a certain level of support.
>
>Introduction of FT support is akin to the thread support 
>introduction. In that case the Forum was able to determine several 
>reasonable levels that found acceptance with the users, and now we 
>see that mixed mode programs are starting to appear in substantial numbers.
>
>I would argue that the FT support in MPI-3 should attempt to do 
>something comparable. Providing a variable and unpredictable level 
>of FT support, which is how the initial description came across to 
>me, may not be good enough for people to take the plunge.

It's easier to do with threads because everybody knows what threads 
mean. With low-level failures there is no standard taxonomy that we 
can refer to when making the specification. One way to approach the 
problem is to lead by example. There is work at ORNL to implement the 
current FT proposal and it will probably become the reference 
implementation. In the future we'll be able to point to this 
implementation as well as FT-MPI to say what a reasonable level of 
support looks like.

>In some sense, the discussion on this topic mirrors the discussion 
>on the checkpoint/restart. There I heard arguments that we cannot 
>define what this may possibly mean down in the MPI, and hence we 
>cannot simply make do with the MPI_Prepare_for_checkpoint & 
>MPI_Recover_after_restart calls, which would be basically 
>implementation specific (in MPI and checkpointing system sense).
>
>Here we say that we cannot define anything tangible to introduce the 
>FT support levels, but still we are going ahead with introducing FT 
>into the MPI-3, at some unfathomable level, in full hope that life 
>will fix things up somehow.
>
>Do you notice some kind of discord here? I seem to.

The comparison is apt but incomplete. The problem with checkpoint 
support is that the calls MPI_Prepare_for_checkpoint & 
MPI_Recover_after_restart will have little semantic meaning. On the 
other hand, the fault notification API will have clear meaning but 
does not define which low-level failures fall into the 
recoverable bucket and which into the non-recoverable bucket. Not 
defining the buckets is much closer to not defining the communication 
latency than to not defining what it means to make MPI "checkpoint-ready".

Greg Bronevetsky
Post-Doctoral Researcher
1028 Building 451
Lawrence Livermore National Lab
(925) 424-5756
bronevetsky1 at llnl.gov