[Mpi3-ft] Eventually Perfect Failure Detector

Josh Hursey jjhursey at open-mpi.org
Thu Sep 1 10:34:22 CDT 2011

So we have been working under a modified definition of a 'Perfect
Failure Detector' for a while. Recently we have been encouraged to
revisit and clarify this assumption and definition (1) to make it
clearer without the academic jargon (2) to be more precise as far as
the detector we actually semantically require.

It should be noted that a truly Perfect Detector can only be
implemented in a synchronous system, which makes it an unsafe
assumption for the MPI standard. In the RTS proposal we weakened the
definition slightly to handle transient failures (an therefor a nod to
partially synchronous systems), but we still talk about it as a
Perfect Detector which has been getting us in trouble.

After a few conversations with folks and some additional (re-)reading
I started to reformulate the text in the RTS proposal. At bottom I
included the new text that I am suggesting to replace the 'Perfect
Detector' language in the RTS proposal.

At its core is the movement to what is called an 'Eventually Perfect'
failure detector with a terminate-upon-mistake constraint for
mistakenly identified failed processes (previously we called them
transitent failures). So to the local process the failure detector
seems perfect, but in the system as a whole it is only eventually so.

The front part of the text below is meant to be the clearer language,
and I pushed the more precise stuff into the rationale.

Take a read through the text below and let me know what you think. I
suspect it will require some iteration to get just right, but
hopefully it starts us out on better footing going forward.


17.2 MPI Terms and Conventions

Fail-Stop process failure is one in which the process stops executing,
and its internal state is lost [5].

17.3 Detection of Process Failure

MPI will provide the ability to detect process failures. MPI will
guarantee that eventually all alive processes will be able to know
about the failure. The state management and query operations defined
in this chapter allow the application to query for the failed set of
processes in a communication group. Additional semantics regarding
communication involving failed processes are defined later in this

It is possible that MPI mistakenly identifies a process as failed when
it is not failed. In this situation the MPI library will exclude the
mistakenly identified failed process from the MPI universe, and
eventually all alive processes will see this process as failed.

MPI provides an eventually perfect failure detector for fail-stop
process failures [1]. An eventually perfect failure detector is both
strongly complete and eventually strongly accurate.

Strong completeness is defined as: "Eventually every process that
crashes is permanently suspected by every correct process" [1]. In
essence this means that eventually every failed process will be able
to be known to all alive processes. Without strong completeness some
alive process that depends on a failed process is not guaranteed to
ever receive an error, which can lead to a deadlock.

Eventual strong accuracy is defined as: "There is a time after which
correct processes are not suspected by any correct process" [1].
Depending on the system architecture, it may be impossible to
correctly determine if a process is failed or slow [3]. Eventual
strong accuracy allows for unreliable failure detectors that may
mistakenly suspect a process as failed when it is not failed [1].

If a process failure was reported to the application and the process
is later found to be alive then MPI will exclude the process from the
MPI universe. Resolving the mistake by excluding the process from the
MPI universe is similar to the technique used by the group membership
protocol in [4]. This additional constraint allows for consistent
reporting of error states to the local process. Without this
constraint the application would not be able to trust the MPI
implementation when it reports process failure errors. Once an alive
process receives notification of a failed peer process, then it may
continue under the assumption that the process is failed.

End of rationale.

Advice to users.
The strong completeness condition of the failure detector allows the
MPI implementation some flexibility in managing the performance costs
involved with process failure detection and notification. As such, it
is possible that some alive processes in the MPI universe know of
process failures that other alive processes do not for a period of
time. Additionally, if a process was mistakenly reported as failed it
is possible that for some period of time a subset of processes
interact with the process normally, while others see it as failed.
Eventually all processes in the MPI universe will be able to be aware
of the process failure.

End of advice to users.

Advice to implementors.
An MPI implementation may choose to provide a stronger failure
detector (i.e., perfect failure detector), but is not required to do
so. This may be possible for MPI implementations targeted at
synchronous systems [2].

End of advice to implementors.

[1] Chandra, T. and Toueg, S. Unreliable Failure Detectors for
Reliable Distributed Systems. Journal of the ACM (1996).
[2] Dwork, C., Lynch, N., and Stockmeyer, L. Consensus in the Presence
of Partial Synchrony. Journal of the ACM (1988).
[3] Fischer, M., Lynch, N., and Paterson, M. Impossibility of
distributed consensus with one faulty process. Journal of the ACM
[4] Ricciardi, A. and Birman, K. Using process groups to implement
failure detection in asynchronous environments. In Proceedings of the
Eleventh ACM Symposium on Principles of Distributed Computing (1991).
[5] Schlichting, R.D. and Schneider, F.B. Fail-stop processors: An
approach to designing fault-tolerant computing systems. ACM
Transactions on Computer Systems (1983).


Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory

More information about the mpiwg-ft mailing list