[Mpi3-ft] FT Levels of Support

Wed Oct 29 17:45:03 CDT 2008

>Aren't the second (MPI_ERRORS_FAIL_ATOMIC) and third (MPI_ERRORS_INTERACTIVE)
>options pretty much the same? The extra APIs that you can call in the third
>case need to be there in all cases, otherwise a code that potentially wants
>to use them if the MPI implements the third option would not link. These
>APIs would return "not implemented" in the first two options, i.e., they
>return an error, which is the same behavior required by option 2.
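
To make the linking argument quoted above concrete, here is a minimal 
sketch in C of a query call that exists at every support level but only 
answers at the interactive one. Every name in it (ft_level, 
ft_rank_state, FT_LEVEL_BASIC, FT_ERR_NOT_IMPLEMENTED) is hypothetical 
and is not taken from the MPI standard or from any proposal text; 
FT_LEVEL_BASIC merely stands in for the first option.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical support levels; FT_LEVEL_BASIC stands in for the first
 * option, whose name does not appear in this thread. */
typedef enum {
    FT_LEVEL_BASIC,
    FT_LEVEL_FAIL_ATOMIC,   /* models MPI_ERRORS_FAIL_ATOMIC */
    FT_LEVEL_INTERACTIVE    /* models MPI_ERRORS_INTERACTIVE */
} ft_level;

/* Hypothetical return codes. */
enum { FT_SUCCESS = 0, FT_ERR_NOT_IMPLEMENTED = 1 };

/* Hypothetical query: has `rank` failed? The symbol exists at every level,
 * so an application that calls it always links. Only the interactive level
 * answers; the other levels return an error and leave *is_failed untouched. */
int ft_rank_state(ft_level level, const int *failed_flags, int nprocs,
                  int rank, int *is_failed)
{
    assert(failed_flags != NULL && is_failed != NULL);
    assert(rank >= 0 && rank < nprocs);

    if (level != FT_LEVEL_INTERACTIVE)
        return FT_ERR_NOT_IMPLEMENTED;

    *is_failed = failed_flags[rank] != 0;
    return FT_SUCCESS;
}
```

A caller would check the return code before trusting *is_failed, so the 
same source works whichever level the implementation provides.
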
The difference between the two is that under MPI_ERRORS_FAIL_ATOMIC MPI 
is not required to figure out exactly what went wrong; it only needs to 
ensure that the result of each individual call is sane. In particular, 
on the conference call today we were mulling over whether to require 
MPI implementations to give all application processes a consistent view 
of what went wrong. For example, if two processes can't talk to each 
other but are otherwise functional, MPI can't tell both of them that 
the other is dead unless it proceeds to kill one of them. If we require 
this, MPI may have to spend considerable resources tracking system 
state, whereas that overhead and complexity can be avoided if MPI knows 
that the user can never ask what is really happening.
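
The two-process example can be modeled with a small toy simulation in 
C. This is only a sketch under stated assumptions: the system_state 
type, the three-process size, and the policy of killing the 
higher-ranked process are all hypothetical choices made for 
illustration, not anything an MPI implementation is known to do.

```c
#include <assert.h>
#include <stdbool.h>

#define NPROCS 3  /* hypothetical job size for the toy model */

/* Ground truth that no single process can observe directly. */
typedef struct {
    bool alive[NPROCS];
    bool link_up[NPROCS][NPROCS];  /* link_up[i][j]: i can reach j */
} system_state;

/* Build a state in which every process is alive and every link works. */
system_state make_healthy(void)
{
    system_state s;
    for (int i = 0; i < NPROCS; i++) {
        s.alive[i] = true;
        for (int j = 0; j < NPROCS; j++)
            s.link_up[i][j] = true;
    }
    return s;
}

/* Local view: `self` cannot tell a dead peer from a broken link. */
bool suspects_dead(const system_state *s, int self, int peer)
{
    assert(self >= 0 && self < NPROCS && peer >= 0 && peer < NPROCS);
    return !s->alive[peer] || !s->link_up[self][peer];
}

/* Views are consistent when no live process suspects a peer that is
 * in fact still alive. */
bool views_consistent(const system_state *s)
{
    for (int i = 0; i < NPROCS; i++) {
        if (!s->alive[i])
            continue;
        for (int j = 0; j < NPROCS; j++)
            if (j != i && suspects_dead(s, i, j) && s->alive[j])
                return false;
    }
    return true;
}

/* Make the reports true by killing the higher-ranked process of every
 * live pair with a broken link (an arbitrary, assumed policy). */
void enforce_consistency(system_state *s)
{
    for (int i = 0; i < NPROCS; i++)
        for (int j = i + 1; j < NPROCS; j++)
            if (s->alive[i] && s->alive[j] &&
                (!s->link_up[i][j] || !s->link_up[j][i]))
                s->alive[j] = false;
}
```

Breaking the link between ranks 0 and 1 makes each suspect the other 
while both are alive, so views_consistent() fails until 
enforce_consistency() kills one of them. The bookkeeping this needs 
(global link and liveness state) is the overhead discussed above.
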

>In addition, I don't think it is a good idea to put things like the
>piggybacking in such an optional set of functionality. Piggybacking
>is a fundamental set of calls that will be used beyond just providing
>FT. Hence, it should also be implemented even if the MPI does not
>provide advanced FT mechanisms.

Good point. This API should be treated as a special case. It is related 
to fault tolerance but is really a more generic API.

Greg Bronevetsky
Post-Doctoral Researcher
1028 Building 451
Lawrence Livermore National Lab
(925) 424-5756
bronevetsky1 at llnl.gov