[Mpi3-ft] Eventually Perfect Failure Detector

Darius Buntinas buntinas at mcs.anl.gov
Wed Oct 12 19:19:29 CDT 2011


Hi Adam,

I believe we have the part about killing a transient process, but I'm not sure whether we have the part about external interaction with the process.  We should make sure that's in there.

-d


On Oct 12, 2011, at 7:12 PM, Adam T. Moody wrote:

> Hi Josh,
> At one time, I think we had text also suggesting to implementors to try to kill a transient process that is marked as failed.  And I think we had a note to warn users that even though a process is marked as failed w/ respect to MPI, it may not actually be dead, and applications should expect that the process may be interacting with the job through external mechanisms, e.g., a common file system or sockets.  Are these items still listed somewhere?
> -Adam
> 
> Aurélien Bouteiller wrote:
> 
>> Josh, 
>> I think this is a much better position to consider eventually perfect. I'm happy to see that these discussions we had have been fruitful. 
>> Aurelien 
>> On 1 Sept. 2011, at 11:34, Josh Hursey wrote:
>> 
>> 
>>> So we have been working under a modified definition of a 'Perfect
>>> Failure Detector' for a while. Recently we have been encouraged to
>>> revisit and clarify this assumption and definition: (1) to make it
>>> clearer without the academic jargon, and (2) to be more precise about
>>> the detector semantics we actually require.
>>> 
>>> It should be noted that a truly Perfect Detector can only be
>>> implemented in a synchronous system, which makes it an unsafe
>>> assumption for the MPI standard. In the RTS proposal we weakened the
>>> definition slightly to handle transient failures (and therefore a nod to
>>> partially synchronous systems), but we still talk about it as a
>>> Perfect Detector, which has been getting us in trouble.
>>> 
>>> After a few conversations with folks and some additional (re-)reading
>>> I started to reformulate the text in the RTS proposal. At the bottom I
>>> have included the new text that I am suggesting to replace the 'Perfect
>>> Detector' language in the RTS proposal.
>>> 
>>> At its core is the movement to what is called an 'Eventually Perfect'
>>> failure detector with a terminate-upon-mistake constraint for
>>> mistakenly identified failed processes (previously we called them
>>> transient failures). So to the local process the failure detector
>>> seems perfect, but in the system as a whole it is only eventually so.
>>> 
>>> The front part of the text below is meant to be the clearer language,
>>> and I pushed the more precise stuff into the rationale.
>>> 
>>> Take a read through the text below and let me know what you think. I
>>> suspect it will require some iteration to get just right, but
>>> hopefully it starts us out on better footing going forward.
>>> 
>>> Thanks,
>>> Josh
>>> 
>>> 
>>> ---------------------------------------------
>>> 17.2 MPI Terms and Conventions
>>> 
>>> A fail-stop process failure is one in which the process stops executing
>>> and its internal state is lost [5].
>>> 
>>> 17.3 Detection of Process Failure
>>> 
>>> MPI will provide the ability to detect process failures. MPI will
>>> guarantee that eventually all alive processes will be able to know
>>> about the failure. The state management and query operations defined
>>> in this chapter allow the application to query for the failed set of
>>> processes in a communication group. Additional semantics regarding
>>> communication involving failed processes are defined later in this
>>> chapter.
>>> 
>>> It is possible that MPI mistakenly identifies a process as failed when
>>> it is not failed. In this situation the MPI library will exclude the
>>> mistakenly identified failed process from the MPI universe, and
>>> eventually all alive processes will see this process as failed.
>>> 
>>> ------------------
>>> Rationale.
>>> MPI provides an eventually perfect failure detector for fail-stop
>>> process failures [1]. An eventually perfect failure detector is both
>>> strongly complete and eventually strongly accurate.
>>> 
>>> Strong completeness is defined as: "Eventually every process that
>>> crashes is permanently suspected by every correct process" [1]. In
>>> essence this means that eventually every failed process will be able
>>> to be known to all alive processes. Without strong completeness some
>>> alive process that depends on a failed process is not guaranteed to
>>> ever receive an error, which can lead to a deadlock.
>>> 
>>> Eventual strong accuracy is defined as: "There is a time after which
>>> correct processes are not suspected by any correct process" [1].
>>> Depending on the system architecture, it may be impossible to
>>> correctly determine if a process is failed or slow [3]. Eventual
>>> strong accuracy allows for unreliable failure detectors that may
>>> mistakenly suspect a process as failed when it is not failed [1].
>>> 
>>> If a process failure was reported to the application and the process
>>> is later found to be alive then MPI will exclude the process from the
>>> MPI universe. Resolving the mistake by excluding the process from the
>>> MPI universe is similar to the technique used by the group membership
>>> protocol in [4]. This additional constraint allows for consistent
>>> reporting of error states to the local process. Without this
>>> constraint the application would not be able to trust the MPI
>>> implementation when it reports process failure errors. Once an alive
>>> process receives notification of a failed peer process, then it may
>>> continue under the assumption that the process is failed.
>>> 
>>> End of rationale.
>>> ------------------
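
The exclude-on-mistake rule in the rationale above can be illustrated with a small local simulation. This is only a sketch under stated assumptions, not the proposal's mechanism: the `Detector` class, the heartbeat/timeout scheme, and the peer names are all hypothetical, and the draft text does not say how an MPI implementation detects failures. The point shown is that once a failure has been reported, the report is never retracted, and a late sign of life from that peer leads to exclusion rather than reinstatement.

```python
from dataclasses import dataclass, field


@dataclass
class Detector:
    """Local failure view of one process (hypothetical illustration)."""
    timeout: float = 1.0                            # silence tolerated before suspecting
    last_seen: dict = field(default_factory=dict)   # peer -> time of last heartbeat
    failed: set = field(default_factory=set)        # peers reported as failed (permanent)

    def heartbeat(self, peer, now):
        """Record a heartbeat; return False if the peer must be excluded."""
        if peer in self.failed:
            # Mistaken suspicion: the peer is alive, but its failure was
            # already reported, so the report stands and the peer is excluded.
            return False
        self.last_seen[peer] = now
        return True

    def check(self, now):
        """Mark every peer silent for longer than the timeout as failed."""
        for peer, seen in self.last_seen.items():
            if peer not in self.failed and now - seen > self.timeout:
                self.failed.add(peer)
        return set(self.failed)


d = Detector(timeout=1.0)
d.heartbeat("p1", now=0.0)
d.heartbeat("p2", now=0.0)
d.heartbeat("p1", now=0.9)
first = d.check(now=1.5)               # p2 has been silent for 1.5 > 1.0
accepted = d.heartbeat("p2", now=1.6)  # p2 was only slow, but stays failed
second = d.check(now=1.7)
print(first, accepted, second)         # {'p2'} False {'p2'}
```

Because the local `failed` set only ever grows, the application can keep acting on a reported failure, which is the consistency property the rationale asks for. A real detector would also need completeness across all processes, which this single-process sketch does not attempt.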
>>> 
>>> ------------------
>>> Advice to users.
>>> The strong completeness condition of the failure detector allows the
>>> MPI implementation some flexibility in managing the performance costs
>>> involved with process failure detection and notification. As such, it
>>> is possible that some alive processes in the MPI universe know of
>>> process failures that other alive processes do not for a period of
>>> time. Additionally, if a process was mistakenly reported as failed it
>>> is possible that for some period of time a subset of processes
>>> interact with the process normally, while others see it as failed.
>>> Eventually all processes in the MPI universe will be able to be aware
>>> of the process failure.
>>> 
>>> End of advice to users.
>>> ------------------
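
The temporary disagreement described in the advice to users can be pictured as per-process failed sets that converge over time. The sketch below is an assumption made for illustration only: the `exchange` helper and the pairwise merging are hypothetical, and the draft text does not specify how failure knowledge propagates between processes.

```python
def exchange(views, a, b):
    """Ranks a and b share what they know; both end up with the union."""
    merged = views[a] | views[b]
    views[a] = set(merged)
    views[b] = set(merged)


# Only rank 0 has noticed so far that rank 3 failed.
views = {0: {3}, 1: set(), 2: set()}
before = {rank: set(failed) for rank, failed in views.items()}

exchange(views, 0, 1)   # rank 1 learns of the failure
middle = {rank: set(failed) for rank, failed in views.items()}
exchange(views, 1, 2)   # rank 2 learns of the failure

print(before)   # {0: {3}, 1: set(), 2: set()}
print(middle)   # {0: {3}, 1: {3}, 2: set()}
print(views)    # {0: {3}, 1: {3}, 2: {3}}
```

Between the two exchanges, rank 2 would still treat rank 3 as alive while ranks 0 and 1 see it as failed, which is the window an application has to tolerate.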
>>> 
>>> ------------------
>>> Advice to implementors.
>>> An MPI implementation may choose to provide a stronger failure
>>> detector (i.e., perfect failure detector), but is not required to do
>>> so. This may be possible for MPI implementations targeted at
>>> synchronous systems [2].
>>> 
>>> End of advice to implementors.
>>> ------------------
>>> 
>>> Citations:
>>> ----------
>>> [1] Chandra, T. and Toueg, S. Unreliable Failure Detectors for
>>> Reliable Distributed Systems. Journal of the ACM (1996).
>>> [2] Dwork, C., Lynch, N., and Stockmeyer, L. Consensus in the Presence
>>> of Partial Synchrony. Journal of the ACM (1988).
>>> [3] Fischer, M., Lynch, N., and Paterson, M. Impossibility of
>>> distributed consensus with one faulty process. Journal of the ACM
>>> (1985).
>>> [4] Ricciardi, A. and Birman, K. Using process groups to implement
>>> failure detection in asynchronous environments. In Proceedings of the
>>> Eleventh ACM Symposium on Principles of Distributed Computing (1991).
>>> [5] Schlichting, R.D. and Schneider, F.B. Fail-stop processors: An
>>> approach to designing fault-tolerant computing systems. ACM
>>> Transactions on Computer Systems (1983).
>>> 
>>> ---------------------------------------------
>>> 
>>> -- 
>>> Joshua Hursey
>>> Postdoctoral Research Associate
>>> Oak Ridge National Laboratory
>>> http://users.nccs.gov/~jjhursey
>>> _______________________________________________
>>> mpi3-ft mailing list
>>> mpi3-ft at lists.mpi-forum.org
>>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
>>>   