[Mpi3-ft] Eventually Perfect Failure Detector

Darius Buntinas buntinas at mcs.anl.gov
Fri Sep 2 14:04:22 CDT 2011

I think this sounds very good.  Some comments:

Fail-Stop:  Shouldn't it say that the process doesn't recover, or stops executing permanently? 

Mistakenly failed processes:  I think we should say that we allow the implementation to kill the process.
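
To make the intended exclusion behavior concrete, here is a toy sketch (plain Python, not MPI code; all class and method names are made up for illustration): the local failure view never retracts a report, and a "failed" process that later shows signs of life is excluded from the universe rather than un-failed.

```python
# Toy sketch of the terminate-upon-mistake rule (NOT MPI code; all
# names here are illustrative). The local view of failures is
# monotonic: once a rank has been reported failed to the application,
# later evidence that it is alive is resolved by excluding that rank,
# never by retracting the failure report.

class LocalFailureView:
    def __init__(self):
        self.reported_failed = set()   # ranks reported failed locally
        self.excluded = set()          # mistakenly-failed ranks we evicted

    def report_failure(self, rank):
        self.reported_failed.add(rank)

    def on_message_from(self, rank):
        """Called when a message arrives from `rank`."""
        if rank in self.reported_failed:
            # A "failed" rank is actually alive: a detector mistake.
            # Keep the local view consistent by excluding the rank
            # (the implementation may kill it) instead of un-failing it.
            self.excluded.add(rank)
            return "drop-and-exclude"
        return "deliver"

view = LocalFailureView()
view.report_failure(3)
assert view.on_message_from(3) == "drop-and-exclude"
assert view.on_message_from(5) == "deliver"
assert 3 in view.reported_failed   # the report is never retracted
```

The point of the monotonic view is exactly the "consistent reporting" property in the rationale: the application can act on a failure report without fear of it being reversed later.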

Paragraph starting with "Strong completeness":  Technically it's more likely that the process will _hang_ (in a blocking communication operation) than result in a "deadlock".  Maybe: "...which may result in a process stuck in, e.g., a blocking communication operation."

Advice to users:  Maybe change the sentence starting with "As such..." to "As such, it is possible that for a period of time, some alive processes in the MPI universe know of process failures that other alive processes do not."
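
For readers less familiar with the "eventually perfect" terminology in the text below, here is a toy, self-contained sketch (plain Python, not MPI; all peer names and tick values are illustrative) of the classic heartbeat-with-adaptive-timeout construction: a slow but alive peer may be suspected briefly, but once its timeout adapts it is never suspected again (eventual strong accuracy), while a crashed peer stops sending heartbeats and is eventually suspected forever (strong completeness).

```python
# Toy simulation of an eventually perfect (<>P) failure detector.
# NOT MPI code; all names and numbers are illustrative. One detector
# watches two peers: "slow" is alive but its first heartbeat arrives
# late, "dead" is fail-stop and crashes at t = 10.

def simulate(steps=60, adapt=5):
    peers = ("slow", "dead")
    timeout = dict.fromkeys(peers, 1)      # per-peer adaptive timeout
    last_hb = dict.fromkeys(peers, 0)      # tick of last heartbeat received
    suspected = dict.fromkeys(peers, False)
    log = []
    for t in range(1, steps + 1):
        # Heartbeat delivery for this tick.
        arrivals = []
        if t > 5:                  # "slow"'s heartbeats only show up late
            arrivals.append("slow")
        if t <= 10:                # "dead" stops sending when it crashes
            arrivals.append("dead")
        for p in arrivals:
            last_hb[p] = t
            if suspected[p]:
                # We suspected an alive peer: a mistake. Trust it again
                # and grow its timeout so the mistake is not repeated --
                # this is why strong accuracy holds only *eventually*.
                suspected[p] = False
                timeout[p] += adapt
                log.append((t, p, "trust"))
        # Timeout check: suspect any peer silent for too long.
        for p in peers:
            if not suspected[p] and t - last_hb[p] > timeout[p]:
                suspected[p] = True
                log.append((t, p, "suspect"))
    return suspected, timeout, log

if __name__ == "__main__":
    suspected, timeout, log = simulate()
    print(log)        # "slow" is briefly suspected, then trusted forever;
    print(suspected)  # "dead" ends up permanently suspected
```

Note how the mistake on "slow" is visible only locally and only for a bounded window; that window is what the terminate-upon-mistake constraint below papers over for mistakes that were already reported to the application.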


On Sep 1, 2011, at 10:34 AM, Josh Hursey wrote:

> So we have been working under a modified definition of a 'Perfect
> Failure Detector' for a while. Recently we have been encouraged to
> revisit and clarify this assumption and definition: (1) to make it
> clearer without the academic jargon, and (2) to be more precise about
> the detector we actually semantically require.
>
> It should be noted that a truly Perfect Detector can only be
> implemented in a synchronous system, which makes it an unsafe
> assumption for the MPI standard. In the RTS proposal we weakened the
> definition slightly to handle transient failures (and therefore a nod
> to partially synchronous systems), but we still talk about it as a
> Perfect Detector, which has been getting us into trouble.
>
> After a few conversations with folks and some additional (re-)reading,
> I started to reformulate the text in the RTS proposal. At bottom I
> have included the new text that I am suggesting to replace the 'Perfect
> Detector' language in the RTS proposal.
>
> At its core is the movement to what is called an 'Eventually Perfect'
> failure detector with a terminate-upon-mistake constraint for
> mistakenly identified failed processes (previously we called them
> transient failures). So to the local process the failure detector
> appears perfect, but in the system as a whole it is only eventually so.
> The front part of the text below is meant to be the clearer language,
> and I pushed the more precise material into the rationale.
>
> Take a read through the text below and let me know what you think. I
> suspect it will require some iteration to get just right, but
> hopefully it starts us out on better footing going forward.
>
> Thanks,
> Josh
> ---------------------------------------------
> 17.2 MPI Terms and Conventions
> Fail-Stop process failure is one in which the process stops executing,
> and its internal state is lost [5].
> 17.3 Detection of Process Failure
> MPI will provide the ability to detect process failures. MPI will
> guarantee that eventually all alive processes will be able to learn
> of the failure. The state management and query operations defined
> in this chapter allow the application to query for the failed set of
> processes in a communication group. Additional semantics regarding
> communication involving failed processes are defined later in this
> chapter.
> It is possible that MPI mistakenly identifies a process as failed when
> it is not failed. In this situation the MPI library will exclude the
> mistakenly identified failed process from the MPI universe, and
> eventually all alive processes will see this process as failed.
> ------------------
> Rationale.
> MPI provides an eventually perfect failure detector for fail-stop
> process failures [1]. An eventually perfect failure detector is both
> strongly complete and eventually strongly accurate.
> Strong completeness is defined as: "Eventually every process that
> crashes is permanently suspected by every correct process" [1]. In
> essence this means that every failed process will eventually become
> known to all alive processes. Without strong completeness some
> alive process that depends on a failed process is not guaranteed to
> ever receive an error, which can lead to a deadlock.
> Eventual strong accuracy is defined as: "There is a time after which
> correct processes are not suspected by any correct process" [1].
> Depending on the system architecture, it may be impossible to
> correctly determine if a process is failed or slow [3]. Eventual
> strong accuracy allows for unreliable failure detectors that may
> mistakenly suspect a process as failed when it is not failed [1].
> If a process failure has been reported to the application and the
> process is later found to be alive, then MPI will exclude it from the
> MPI universe. Resolving the mistake by excluding the process from the
> MPI universe is similar to the technique used by the group membership
> protocol in [4]. This additional constraint allows for consistent
> reporting of error states to the local process. Without this
> constraint the application would not be able to trust the MPI
> implementation when it reports process failure errors. Once an alive
> process receives notification of a failed peer process, then it may
> continue under the assumption that the process is failed.
> End of rationale.
> ------------------
> ------------------
> Advice to users.
> The strong completeness condition of the failure detector allows the
> MPI implementation some flexibility in managing the performance costs
> involved with process failure detection and notification. As such, it
> is possible that some alive processes in the MPI universe know of
> process failures that other alive processes do not for a period of
> time. Additionally, if a process was mistakenly reported as failed it
> is possible that for some period of time a subset of processes
> interact with the process normally, while others see it as failed.
> Eventually all processes in the MPI universe will be able to learn
> of the process failure.
> End of advice to users.
> ------------------
> ------------------
> Advice to implementors.
> An MPI implementation may choose to provide a stronger failure
> detector (i.e., perfect failure detector), but is not required to do
> so. This may be possible for MPI implementations targeted at
> synchronous systems [2].
> End of advice to implementors.
> ------------------
> Citations:
> ----------
> [1] Chandra, T. and Toueg, S. Unreliable Failure Detectors for
> Reliable Distributed Systems. Journal of the ACM (1996).
> [2] Dwork, C., Lynch, N., and Stockmeyer, L. Consensus in the Presence
> of Partial Synchrony. Journal of the ACM (1988).
> [3] Fischer, M., Lynch, N., and Paterson, M. Impossibility of
> Distributed Consensus with One Faulty Process. Journal of the ACM
> (1985).
> [4] Ricciardi, A. and Birman, K. Using process groups to implement
> failure detection in asynchronous environments. In Proceedings of the
> Eleventh ACM Symposium on Principles of Distributed Computing (1991).
> [5] Schlichting, R.D. and Schneider, F.B. Fail-stop processors: An
> approach to designing fault-tolerant computing systems. ACM
> Transactions on Computer Systems (1983).
> ---------------------------------------------
> -- 
> Joshua Hursey
> Postdoctoral Research Associate
> Oak Ridge National Laboratory
> http://users.nccs.gov/~jjhursey
> _______________________________________________
> mpi3-ft mailing list
> mpi3-ft at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
