[Mpi3-ft] Eventually Perfect Failure Detector

Josh Hursey jjhursey at open-mpi.org
Fri Sep 2 15:34:35 CDT 2011


I added the current text to the wiki:
 https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/ft/rts_failure_detector

More below...

On Fri, Sep 2, 2011 at 3:04 PM, Darius Buntinas <buntinas at mcs.anl.gov> wrote:
> I think this sounds very good.  Some comments:
>
> Fail-Stop:  Shouldn't it say that the process doesn't recover, or stops executing permanently?

I think adding the work 'permanently' helps drive this home.

>
> Mistakenly failed processes:  I think we should say that we allow the implementation to kill the process.

I added the sentence:
 The MPI implementation is allowed to terminate the process that was
mistakenly identified as failed.

>
> Paragraph starting with "Strong completeness":  Technically it's more likely that the process will _hang_ (in a blocking communication operation) than result in a "deadlock".  Maybe: "...which may result in a process stuck in, e.g., a blocking communication operation."

How about:
 Without strong completeness some alive process that depends on a
failed process is not guaranteed that communication with the failed
process will ever complete with an error (e.g., may wait indefinitely
in a blocking receive operation).

>
> Advice to users:  Maybe change the sentence starting with "As such..." to "As such, it is possible that for a period of time, some alive processes in the MPI universe know of process failures that other alive processes do not."

Changed.

Thanks,
Josh

>
> -d
>
> On Sep 1, 2011, at 10:34 AM, Josh Hursey wrote:
>
>> So we have been working under a modified definition of a 'Perfect
>> Failure Detector' for a while. Recently we have been encouraged to
>> revisit and clarify this assumption and definition (1) to make it
>> clearer without the academic jargon (2) to be more precise as far as
>> the detector we actually semantically require.
>>
>> It should be noted that a truly Perfect Detector can only be
>> implemented in a synchronous system, which makes it an unsafe
>> assumption for the MPI standard. In the RTS proposal we weakened the
>> definition slightly to handle transient failures (an therefor a nod to
>> partially synchronous systems), but we still talk about it as a
>> Perfect Detector which has been getting us in trouble.
>>
>> After a few conversations with folks and some additional (re-)reading
>> I started to reformulate the text in the RTS proposal. At bottom I
>> included the new text that I am suggesting to replace the 'Perfect
>> Detector' language in the RTS proposal.
>>
>> At its core is the movement to what is called an 'Eventually Perfect'
>> failure detector with a terminate-upon-mistake constraint for
>> mistakenly identified failed processes (previously we called them
>> transitent failures). So to the local process the failure detector
>> seems perfect, but in the system as a whole it is only eventually so.
>>
>> The front part of the text below is meant to be the clearer language,
>> and I pushed the more precise stuff into the rationale.
>>
>> Take a read through the text below and let me know what you think. I
>> suspect it will require some iteration to get just right, but
>> hopefully it starts us out on better footing going forward.
>>
>> Thanks,
>> Josh
>>
>>
>> ---------------------------------------------
>> 17.2 MPI Terms and Conventions
>>
>> Fail-Stop process failure is one in which the process stops executing,
>> and its internal state is lost [5].
>>
>> 17.3 Detection of Process Failure
>>
>> MPI will provide the ability to detect process failures. MPI will
>> guarantee that eventually all alive processes will be able to know
>> about the failure. The state management and query operations defined
>> in this chapter allow the application to query for the failed set of
>> processes in a communication group. Additional semantics regarding
>> communication involving failed processes are defined later in this
>> chapter.
>>
>> It is possible that MPI mistakenly identifies a process as failed when
>> it is not failed. In this situation the MPI library will exclude the
>> mistakenly identified failed process from the MPI universe, and
>> eventually all alive processes will see this process as failed.
>>
>> ------------------
>> Rationale.
>> MPI provides an eventually perfect failure detector for fail-stop
>> process failures [1]. An eventually perfect failure detector is both
>> strongly complete and eventually strongly accurate.
>>
>> Strong completeness is defined as: "Eventually every process that
>> crashes is permanently suspected by every correct process" [1]. In
>> essence this means that eventually every failed process will be able
>> to be known to all alive processes. Without strong completeness some
>> alive process that depends on a failed process is not guaranteed to
>> ever receive an error, which can lead to a deadlock.
>>
>> Eventual strong accuracy is defined as: "There is a time after which
>> correct processes are not suspected by any correct process" [1].
>> Depending on the system architecture, it may be impossible to
>> correctly determine if a process is failed or slow [3]. Eventual
>> strong accuracy allows for unreliable failure detectors that may
>> mistakenly suspect a process as failed when it is not failed [1].
>>
>> If a process failure was reported to the application and the process
>> is later found to be alive then MPI will exclude the process from the
>> MPI universe. Resolving the mistake by excluding the process from the
>> MPI universe is similar to the technique used by the group membership
>> protocol in [4]. This additional constraint allows for consistent
>> reporting of error states to the local process. Without this
>> constraint the application would not be able to trust the MPI
>> implementation when it reports process failure errors. Once an alive
>> process receives notification of a failed peer process, then it may
>> continue under the assumption that the process is failed.
>>
>> End of rationale.
>> ------------------
>>
>> ------------------
>> Advice to users.
>> The strong completeness condition of the failure detector allows the
>> MPI implementation some flexibility in managing the performance costs
>> involved with process failure detection and notification. As such, it
>> is possible that some alive processes in the MPI universe know of
>> process failures that other alive processes do not for a period of
>> time. Additionally, if a process was mistakenly reported as failed it
>> is possible that for some period of time a subset of processes
>> interact with the process normally, while others see it as failed.
>> Eventually all processes in the MPI universe will be able to be aware
>> of the process failure.
>>
>> End of advice to users.
>> ------------------
>>
>> ------------------
>> Advice to implementors.
>> An MPI implementation may choose to provide a stronger failure
>> detector (i.e., perfect failure detector), but is not required to do
>> so. This may be possible for MPI implementations targeted at
>> synchronous systems [2].
>>
>> End of advice to implementors.
>> ------------------
>>
>> Citations:
>> ----------
>> [1] Chandra, T. and Toueg, S. Unreliable Failure Detectors for
>> Reliable Distributed Systems. Journal of the ACM (1996).
>> [2] Dwork, C., Lynch, N., and Stockmeyer, L. Consensus in the Presence
>> of Partial Synchrony. Journal of the ACM (1988).
>> [3] Fischer, M., Lynch, N., and Paterson, M. Impossibility of
>> distributed consensus with one faulty process. Journal of the ACM
>> (1985).
>> [4] Ricciardi, A. and Birman, K. Using process groups to implement
>> failure detection in asynchronous environments. In Proceedings of the
>> Eleventh ACM Symposium on Principles of Distributed Computing (1991).
>> [5] Schlichting, R.D. and Schneider, F.B. Fail-stop processors: An
>> approach to designing fault-tolerant computing systems. ACM
>> Transactions on Computer Systems (1983).
>>
>> ---------------------------------------------
>>
>> --
>> Joshua Hursey
>> Postdoctoral Research Associate
>> Oak Ridge National Laboratory
>> http://users.nccs.gov/~jjhursey
>> _______________________________________________
>> mpi3-ft mailing list
>> mpi3-ft at lists.mpi-forum.org
>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
>
>
> _______________________________________________
> mpi3-ft mailing list
> mpi3-ft at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
>
>



-- 
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey




More information about the mpiwg-ft mailing list