[Mpi3-ft] Eventually Perfect Failure Detector

Josh Hursey jjhursey at open-mpi.org
Thu Oct 13 09:21:55 CDT 2011


We still have the part about excluding/terminating the mistakenly
identified process. We say that the process is excluded from MPI, and
that the MPI implementation should try to terminate it (we cannot
require termination since some environments cannot support it).

In the advice to users we have a line stating:
-----------
Additionally, if a process was mistakenly reported as failed, it is
possible that for some period of time a subset of processes interact
with the process normally, while others see it as failed. Eventually,
all processes in the MPI universe will become aware of the process
failure.
------------

Do you think we should extend this a bit to include a warning about
the process participating via a communication mechanism outside of
MPI? Can you suggest some wording?

Thanks,
Josh

On Wed, Oct 12, 2011 at 8:19 PM, Darius Buntinas <buntinas at mcs.anl.gov> wrote:
> Hi Adam,
>
> I believe we have the part about killing a transient process, but I'm not sure whether we have the part about external interaction with the process.  We should make sure that's in there.
>
> -d
>
>
> On Oct 12, 2011, at 7:12 PM, Adam T. Moody wrote:
>
>> Hi Josh,
>> At one time, I think we also had text suggesting that implementors try to kill a transient process that is marked as failed.  And I think we had a note to warn users that even though a process is marked as failed with respect to MPI, it may not actually be dead, and applications should expect that the process may still interact with the job through external mechanisms, e.g., a common file system or sockets.  Are these items still listed somewhere?
>> -Adam
>>
>> Aurélien Bouteiller wrote:
>>
>>> Josh,
>>> I think this move to an eventually perfect detector is a much better position. I'm happy to see that the discussions we had have been fruitful.
>>> Aurelien
>>> On Sept. 1, 2011, at 11:34, Josh Hursey wrote:
>>>
>>>
>>>> So we have been working under a modified definition of a 'Perfect
>>>> Failure Detector' for a while. Recently we have been encouraged to
>>>> revisit and clarify this assumption and definition (1) to make it
>>>> clearer without the academic jargon, and (2) to be more precise about
>>>> the detector semantics we actually require.
>>>>
>>>> It should be noted that a truly Perfect Detector can only be
>>>> implemented in a synchronous system, which makes it an unsafe
>>>> assumption for the MPI standard. In the RTS proposal we weakened the
>>>> definition slightly to handle transient failures (and therefore a nod
>>>> to partially synchronous systems), but we still talk about it as a
>>>> Perfect Detector, which has been getting us into trouble.
>>>>
>>>> After a few conversations with folks and some additional (re-)reading,
>>>> I started to reformulate the text in the RTS proposal. At the bottom I
>>>> have included the new text that I am suggesting as a replacement for
>>>> the 'Perfect Detector' language in the RTS proposal.
>>>>
>>>> At its core is a move to what is called an 'Eventually Perfect'
>>>> failure detector with a terminate-upon-mistake constraint for
>>>> mistakenly identified failed processes (previously we called them
>>>> transient failures). So to the local process the failure detector
>>>> seems perfect, but in the system as a whole it is only eventually so.
>>>>
>>>> The front part of the text below is meant to be the clearer language,
>>>> and I pushed the more precise stuff into the rationale.
>>>>
>>>> Take a read through the text below and let me know what you think. I
>>>> suspect it will require some iteration to get just right, but
>>>> hopefully it starts us out on better footing going forward.
>>>>
>>>> Thanks,
>>>> Josh
>>>>
>>>>
>>>> ---------------------------------------------
>>>> 17.2 MPI Terms and Conventions
>>>>
>>>> A Fail-Stop process failure is one in which the process stops
>>>> executing, and its internal state is lost [5].
>>>>
>>>> 17.3 Detection of Process Failure
>>>>
>>>> MPI will provide the ability to detect process failures. MPI will
>>>> guarantee that all alive processes will eventually be able to learn
>>>> of the failure. The state management and query operations defined in
>>>> this chapter allow the application to query for the set of failed
>>>> processes in a communication group. Additional semantics regarding
>>>> communication involving failed processes are defined later in this
>>>> chapter.
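>>>>
>>>> As an illustration only (this is not proposed standard text), here is
>>>> a minimal sketch of how an application might use such a query after a
>>>> communication error. MPIX_Comm_group_failed is a placeholder name for
>>>> whatever failed-set query operation the chapter ends up defining; the
>>>> other calls are ordinary MPI.
>>>>
>>>>   #include <mpi.h>
>>>>   #include <stdio.h>
>>>>
>>>>   /* Hypothetical query interface, declared here only so the sketch
>>>>    * is self-contained; not a proposed function signature.          */
>>>>   extern int MPIX_Comm_group_failed(MPI_Comm comm, MPI_Group *failed);
>>>>
>>>>   void report_failed_peers(MPI_Comm comm)
>>>>   {
>>>>       MPI_Group failed;
>>>>       int nfailed;
>>>>
>>>>       MPIX_Comm_group_failed(comm, &failed);  /* hypothetical query */
>>>>       MPI_Group_size(failed, &nfailed);       /* standard MPI call  */
>>>>       printf("locally known failed processes: %d\n", nfailed);
>>>>       MPI_Group_free(&failed);
>>>>   }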
>>>>
>>>> It is possible that MPI mistakenly identifies a process as failed
>>>> when it is not. In this situation, the MPI library will exclude the
>>>> mistakenly identified process from the MPI universe, and eventually
>>>> all alive processes will see this process as failed.
>>>>
>>>> ------------------
>>>> Rationale.
>>>> MPI provides an eventually perfect failure detector for fail-stop
>>>> process failures [1]. An eventually perfect failure detector is both
>>>> strongly complete and eventually strongly accurate.
>>>>
>>>> Strong completeness is defined as: "Eventually every process that
>>>> crashes is permanently suspected by every correct process" [1]. In
>>>> essence, this means that every failed process will eventually become
>>>> known to all alive processes. Without strong completeness, an alive
>>>> process that depends on a failed process is not guaranteed ever to
>>>> receive an error, which can lead to deadlock.
>>>>
>>>> Eventual strong accuracy is defined as: "There is a time after which
>>>> correct processes are not suspected by any correct process" [1].
>>>> Depending on the system architecture, it may be impossible to
>>>> correctly determine if a process is failed or slow [3]. Eventual
>>>> strong accuracy allows for unreliable failure detectors that may
>>>> mistakenly suspect a process as failed when it is not failed [1].
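>>>>
>>>> For intuition only, a minimal sketch of the classic eventually
>>>> perfect detector built from heartbeats and adaptive timeouts; the
>>>> per-peer state and function names are illustrative and not tied to
>>>> any particular MPI implementation.
>>>>
>>>>   #include <stdbool.h>
>>>>
>>>>   /* Suspect a peer after 'timeout' seconds without a heartbeat; on
>>>>    * a mistake, raise the timeout so a merely slow peer eventually
>>>>    * stops being suspected (eventual strong accuracy).             */
>>>>   struct peer_state {
>>>>       double last_heartbeat;   /* time the last heartbeat arrived */
>>>>       double timeout;          /* current suspicion threshold     */
>>>>       bool   suspected;
>>>>   };
>>>>
>>>>   void on_timer(struct peer_state *p, double now)
>>>>   {
>>>>       if (!p->suspected && now - p->last_heartbeat > p->timeout)
>>>>           p->suspected = true;              /* strong completeness */
>>>>   }
>>>>
>>>>   void on_heartbeat(struct peer_state *p, double now)
>>>>   {
>>>>       if (p->suspected) {                   /* we were mistaken     */
>>>>           p->suspected = false;
>>>>           p->timeout *= 2.0;                /* suspect less eagerly */
>>>>       }
>>>>       p->last_heartbeat = now;
>>>>   }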
>>>>
>>>> If a process failure was reported to the application and the process
>>>> is later found to be alive, then MPI will exclude the process from
>>>> the MPI universe. Resolving the mistake by excluding the process from
>>>> the MPI universe is similar to the technique used by the group
>>>> membership protocol in [4]. This additional constraint allows for
>>>> consistent reporting of error states to the local process. Without
>>>> this constraint, the application would not be able to trust the MPI
>>>> implementation when it reports process failure errors. Once an alive
>>>> process receives notification of a failed peer process, it may
>>>> continue under the assumption that the peer has failed.
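>>>>
>>>> Continuing the detector sketch above, the terminate-upon-mistake
>>>> constraint changes only the mistaken-suspicion branch: once the
>>>> failure has been reported to the application, the peer stays failed
>>>> and is excluded rather than rehabilitated. The exclusion call below
>>>> is a placeholder, not a proposed interface.
>>>>
>>>>   /* Hypothetical runtime hook that excludes the peer from the MPI
>>>>    * universe and, where the environment allows, terminates it.    */
>>>>   extern void runtime_exclude_and_terminate_peer(struct peer_state *p);
>>>>
>>>>   void on_heartbeat_with_exclusion(struct peer_state *p, double now,
>>>>                                    bool reported_to_app)
>>>>   {
>>>>       if (p->suspected && reported_to_app) {
>>>>           /* The application already saw this peer as failed; keep
>>>>            * the local view consistent instead of un-suspecting.   */
>>>>           runtime_exclude_and_terminate_peer(p);
>>>>           return;
>>>>       }
>>>>       on_heartbeat(p, now);   /* otherwise correct the mistake */
>>>>   }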
>>>>
>>>> End of rationale.
>>>> ------------------
>>>>
>>>> ------------------
>>>> Advice to users.
>>>> The strong completeness condition of the failure detector allows the
>>>> MPI implementation some flexibility in managing the performance costs
>>>> involved with process failure detection and notification. As such, it
>>>> is possible that, for a period of time, some alive processes in the
>>>> MPI universe know of process failures that other alive processes do
>>>> not. Additionally, if a process was mistakenly reported as failed, it
>>>> is possible that for some period of time a subset of processes
>>>> interact with the process normally, while others see it as failed.
>>>> Eventually, all processes in the MPI universe will become aware of
>>>> the process failure.
>>>>
>>>> End of advice to users.
>>>> ------------------
>>>>
>>>> ------------------
>>>> Advice to implementors.
>>>> An MPI implementation may choose to provide a stronger failure
>>>> detector (i.e., perfect failure detector), but is not required to do
>>>> so. This may be possible for MPI implementations targeted at
>>>> synchronous systems [2].
>>>>
>>>> End of advice to implementors.
>>>> ------------------
>>>>
>>>> Citations:
>>>> ----------
>>>> [1] Chandra, T. and Toueg, S. Unreliable Failure Detectors for
>>>> Reliable Distributed Systems. Journal of the ACM (1996).
>>>> [2] Dwork, C., Lynch, N., and Stockmeyer, L. Consensus in the Presence
>>>> of Partial Synchrony. Journal of the ACM (1988).
>>>> [3] Fischer, M., Lynch, N., and Paterson, M. Impossibility of
>>>> distributed consensus with one faulty process. Journal of the ACM
>>>> (1985).
>>>> [4] Ricciardi, A. and Birman, K. Using process groups to implement
>>>> failure detection in asynchronous environments. In Proceedings of the
>>>> Tenth Annual ACM Symposium on Principles of Distributed Computing (1991).
>>>> [5] Schlichting, R.D. and Schneider, F.B. Fail-stop processors: An
>>>> approach to designing fault-tolerant computing systems. ACM
>>>> Transactions on Computer Systems (1983).
>>>>
>>>> ---------------------------------------------
>>>>
>>>> --
>>>> Joshua Hursey
>>>> Postdoctoral Research Associate
>>>> Oak Ridge National Laboratory
>>>> http://users.nccs.gov/~jjhursey
>>>
>>>
>>
>
>
>
>



-- 
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey



