[Mpi3-ft] Eventually Perfect Failure Detector

Adam T. Moody moody20 at llnl.gov
Wed Oct 12 19:12:52 CDT 2011


Hi Josh,
At one time, I think we had text suggesting that implementors try to 
kill a transient process that is marked as failed.  And I think we 
had a note warning users that even though a process is marked as 
failed with respect to MPI, it may not actually be dead, and 
applications should expect that the process may still interact with 
the job through external mechanisms, e.g., a common file system or 
sockets.  Are these items still listed somewhere?
-Adam

Aurélien Bouteiller wrote:

>Josh, 
>
>I think this is a much better position to consider eventually perfect. I'm happy to see that these discussions we had have been fruitful. 
>
>Aurelien 
>
>Le 1 sept. 2011 à 11:34, Josh Hursey a écrit :
>
>  
>
>>So we have been working under a modified definition of a 'Perfect
>>Failure Detector' for a while. Recently we have been encouraged to
>>revisit and clarify this assumption and definition (1) to make it
>>clearer without the academic jargon (2) to be more precise as far as
>>the detector we actually semantically require.
>>
>>It should be noted that a truly Perfect Detector can only be
>>implemented in a synchronous system, which makes it an unsafe
>>assumption for the MPI standard. In the RTS proposal we weakened the
>>definition slightly to handle transient failures (and therefore a nod to
>>partially synchronous systems), but we still talk about it as a
>>Perfect Detector which has been getting us in trouble.
>>
>>After a few conversations with folks and some additional (re-)reading
>>I started to reformulate the text in the RTS proposal. At bottom I
>>included the new text that I am suggesting to replace the 'Perfect
>>Detector' language in the RTS proposal.
>>
>>At its core is the movement to what is called an 'Eventually Perfect'
>>failure detector with a terminate-upon-mistake constraint for
>>mistakenly identified failed processes (previously we called them
>>transient failures). So to the local process the failure detector
>>seems perfect, but in the system as a whole it is only eventually so.
>>
>>The front part of the text below is meant to be the clearer language,
>>and I pushed the more precise stuff into the rationale.
>>
>>Take a read through the text below and let me know what you think. I
>>suspect it will require some iteration to get just right, but
>>hopefully it starts us out on better footing going forward.
>>
>>Thanks,
>>Josh
>>
>>
>>---------------------------------------------
>>17.2 MPI Terms and Conventions
>>
>>Fail-Stop process failure is one in which the process stops executing,
>>and its internal state is lost [5].
>>
>>17.3 Detection of Process Failure
>>
>>MPI will provide the ability to detect process failures. MPI will
>>guarantee that eventually all alive processes will be able to learn
>>of the failure. The state management and query operations defined
>>in this chapter allow the application to query for the failed set of
>>processes in a communication group. Additional semantics regarding
>>communication involving failed processes are defined later in this
>>chapter.
>>
>>It is possible that MPI mistakenly identifies a process as failed when
>>it is not failed. In this situation the MPI library will exclude the
>>mistakenly identified failed process from the MPI universe, and
>>eventually all alive processes will see this process as failed.
>>
>>------------------
>>Rationale.
>>MPI provides an eventually perfect failure detector for fail-stop
>>process failures [1]. An eventually perfect failure detector is both
>>strongly complete and eventually strongly accurate.
>>
>>Strong completeness is defined as: "Eventually every process that
>>crashes is permanently suspected by every correct process" [1]. In
>>essence this means that the failure of every failed process will
>>eventually become known to all alive processes. Without strong
>>completeness, some alive process that depends on a failed process is
>>not guaranteed to ever receive an error, which can lead to deadlock.
>>
>>Eventual strong accuracy is defined as: "There is a time after which
>>correct processes are not suspected by any correct process" [1].
>>Depending on the system architecture, it may be impossible to
>>correctly determine if a process is failed or slow [3]. Eventual
>>strong accuracy allows for unreliable failure detectors that may
>>mistakenly suspect a process as failed when it is not failed [1].
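The eventual-accuracy property can be illustrated with a small simulation (this is illustrative Python, not MPI code; the names and the timeout-growing policy are assumptions in the spirit of Chandra and Toueg's timeout-based detectors): each time a monitor wrongly suspects a slow-but-correct peer, it increases its timeout, so after the largest message delay is learned the peer is never suspected again.

```python
def simulate_detector(delays, timeout=1, backoff=1):
    """Simulate one monitor watching one correct-but-slow peer.

    delays: observed gaps between successive heartbeats.
    On each false suspicion the timeout grows, so suspicions stop
    once the timeout exceeds the largest observed delay
    (eventual strong accuracy).
    Returns (rounds in which the peer was suspected, final timeout).
    """
    suspicions = []
    for round_no, gap in enumerate(delays):
        if gap > timeout:            # heartbeat arrived late: suspect the peer
            suspicions.append(round_no)
            timeout = gap + backoff  # learn: wait longer next time
    return suspicions, timeout

# Message delays fluctuate early on, then stay below the learned timeout.
delays = [1, 3, 2, 5, 4, 4, 3, 2]
suspected, final_timeout = simulate_detector(delays)
print(suspected)        # → [1, 3]: mistakes happen only in early rounds
print(final_timeout)    # → 6: large enough that no further mistakes occur
```

Note that the mistakes in the early rounds are exactly what "eventually" permits: the detector behaves unreliably for a finite prefix, then is accurate forever after.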
>>
>>If a process failure is reported to the application and the process
>>is later found to be alive, then MPI will exclude the process from the
>>MPI universe. Resolving the mistake by excluding the process from the
>>MPI universe is similar to the technique used by the group membership
>>protocol in [4]. This additional constraint allows for consistent
>>reporting of error states to the local process. Without this
>>constraint the application would not be able to trust the MPI
>>implementation when it reports process failure errors. Once an alive
>>process receives notification of a failed peer process, then it may
>>continue under the assumption that the process is failed.
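The terminate-upon-mistake constraint can be sketched as a toy membership model (illustrative Python, not a proposed MPI interface; all names here are hypothetical): once any peer is told that a process failed, that process is excluded from the universe permanently, so local failure reports never have to be retracted.

```python
class Universe:
    """Toy model of the terminate-upon-mistake rule: once a process is
    reported failed, it is excluded everywhere, even if the suspicion
    was a mistake.  (Names are illustrative, not MPI API.)"""

    def __init__(self, ranks):
        self.alive = set(ranks)
        self.failed = set()

    def report_failure(self, rank):
        # A detector (possibly mistakenly) reported `rank` as failed.
        if rank in self.alive:
            self.alive.discard(rank)
            self.failed.add(rank)    # eventually all peers see this set

    def query_failed(self):
        # What the chapter's query operations would return.
        return sorted(self.failed)

u = Universe(range(4))
u.report_failure(2)        # mistake: rank 2 was only slow
print(u.query_failed())    # → [2]; rank 2 stays excluded regardless
```

The design point is that exclusion is monotonic: the failed set only grows, which is what lets a local process "continue under the assumption that the process is failed" without ever seeing a report reversed.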
>>
>>End of rationale.
>>------------------
>>
>>------------------
>>Advice to users.
>>The strong completeness condition of the failure detector allows the
>>MPI implementation some flexibility in managing the performance costs
>>involved with process failure detection and notification. As such, it
>>is possible that some alive processes in the MPI universe know of
>>process failures that other alive processes do not for a period of
>>time. Additionally, if a process was mistakenly reported as failed,
>>it is possible that for some period of time a subset of processes
>>interact with the process normally, while others see it as failed.
>>Eventually, all processes in the MPI universe will be able to learn
>>of the process failure.
>>
>>End of advice to users.
>>------------------
>>
>>------------------
>>Advice to implementors.
>>An MPI implementation may choose to provide a stronger failure
>>detector (i.e., perfect failure detector), but is not required to do
>>so. This may be possible for MPI implementations targeted at
>>synchronous systems [2].
>>
>>End of advice to implementors.
>>------------------
>>
>>Citations:
>>----------
>>[1] Chandra, T. and Toueg, S. Unreliable Failure Detectors for
>>Reliable Distributed Systems. Journal of the ACM (1996).
>>[2] Dwork, C., Lynch, N., and Stockmeyer, L. Consensus in the Presence
>>of Partial Synchrony. Journal of the ACM (1988).
>>[3] Fischer, M., Lynch, N., and Paterson, M. Impossibility of
>>distributed consensus with one faulty process. Journal of the ACM
>>(1985).
>>[4] Ricciardi, A. and Birman, K. Using process groups to implement
>>failure detection in asynchronous environments. In Proceedings of the
>>Tenth Annual ACM Symposium on Principles of Distributed Computing
>>(1991).
>>[5] Schlichting, R.D. and Schneider, F.B. Fail-stop processors: An
>>approach to designing fault-tolerant computing systems. ACM
>>Transactions on Computer Systems (1983).
>>
>>---------------------------------------------
>>
>>-- 
>>Joshua Hursey
>>Postdoctoral Research Associate
>>Oak Ridge National Laboratory
>>http://users.nccs.gov/~jjhursey
>>_______________________________________________
>>mpi3-ft mailing list
>>mpi3-ft at lists.mpi-forum.org
>>http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
>>    
>>
>
>
