[Mpi3-ft] The state of MPI is undefined

Howard Pritchard howardp at cray.com
Mon Jun 13 16:01:44 CDT 2011

Hi Josh,

> How about:
> ------------------
> MPI does not provide the user with transparent process recovery upon
> process failure.
> Once a process fails, MPI does not guarantee that the job can continue
> or, if the job can continue, that the process can be recovered.
> If a process failure occurs, and the MPI implementation is able to
> correctly continue operating after process failure then the MPI
> implementation will return an appropriate error class (e.g.,
> MPI_ERR_RANK_FAIL_STOP) and provide the additional semantics defined
> in Chapter 17.
> The MPI implementation documentation will provide information on the
> possible effect of each supported class of errors.
> Advice to users:
> It is possible that the state of MPI becomes corrupted in an
> undetectable manner. This undetectable error might force the MPI
> implementation to return an error code (or even success)
> unintentionally. For this pathological case there exist no mechanisms
> to determine the correct running state. A high-quality implementation
> will strive to return the correct return code from all MPI operations.
> ------------------

I vote for this wording.


Howard Pritchard
Software Engineering
Cray, Inc.
