[Mpi3-ft] Asynchronous error handling
Greg Bronevetsky
bronevetsky1 at llnl.gov
Mon May 26 10:51:26 CDT 2008
> On the telecon today we agreed to have our next telecon on 6/6 focus on how
>we may handle asynchronous error notification within MPI. The working
>assumption is that we will still have return error codes, but also make use
>of asynchronous notification. We need to
> - Clearly define the boundary between these two different error
>notification mechanisms, i.e., when we use one and when the other
> - Define the precise mechanism for asynchronous error notification
>This e-mail is intended to jump start discussion in preparation for the next
>telecon.
I'll throw something out here. I wasn't around for the initial
discussions, so some of this may fly in the face of something that
people have already decided is obviously wrong. Either way, it's a
start. You may commence with the tomato throwing.
The idea for this proposal is a publish-subscribe model where the
spec defines the default publish-subscribe relations but allows MPI
implementations to define new events and defaults, and allows
applications to cancel or add event subscriptions. I like this model
mostly because it is the one being used by the CIFTS project, which I
suspect will have an important role to play in MPI application fault
tolerance. Since we won't be able to list all the possible errors
that may occur, we'll need to define the possible error types and
describe the error notification properties of these broad
types, rather than individual events. Implementations may then put
each real error into any type that is deemed appropriate.
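
To make the subscription model concrete, here is a minimal sketch in
Python. It is not an MPI API: the names ErrorType, EventBroker,
subscribe, unsubscribe and publish are all hypothetical, and the two
error types are placeholders, since the proposal does not enumerate
the types. It only shows default subscriptions per error type, with
the application cancelling one and adding another.

```python
from collections import defaultdict
from enum import Enum, auto


class ErrorType(Enum):
    # Hypothetical broad error types; the proposal does not list them.
    PROCESS_FAILURE = auto()
    STATE_CORRUPTION = auto()


class EventBroker:
    """Tracks which ranks are subscribed to which error types."""

    def __init__(self, defaults):
        # defaults: mapping ErrorType -> ranks subscribed by default
        self._subs = defaultdict(set)
        for etype, ranks in defaults.items():
            self._subs[etype].update(ranks)
        self.delivered = []  # (rank, error type, detail) in delivery order

    def subscribe(self, rank, etype):
        self._subs[etype].add(rank)

    def unsubscribe(self, rank, etype):
        self._subs[etype].discard(rank)

    def publish(self, etype, detail):
        # The implementation files each concrete error under a broad type
        # and delivers it to every rank currently subscribed to that type.
        for rank in sorted(self._subs[etype]):
            self.delivered.append((rank, etype, detail))


broker = EventBroker({ErrorType.PROCESS_FAILURE: {0, 1}})
broker.unsubscribe(1, ErrorType.PROCESS_FAILURE)  # cancel a default
broker.subscribe(2, ErrorType.PROCESS_FAILURE)    # add a new subscription
broker.publish(ErrorType.PROCESS_FAILURE, "rank 3 died")
```

After the publish call, ranks 0 and 2 have the event and rank 1 does
not, because it cancelled its default subscription.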
Every error will have a defined detection set, which is the set of
processes that by default subscribe to being notified of this event.
For example, if a given process fails, any process that tries to
receive a message from this process is definitely within its
detection set. However, if the failed process is a receiver in a
broadcast, we may or may not choose to include the other broadcast
receivers in the detection set (probably not). Each process is
subscribed to all error events that happen in the process, as long as
the errors don't cause the process itself to fail.
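
One possible way to compute the default detection set for a process
failure is sketched below in Python. The function name and the
(receiver, source) representation of posted receives are assumptions
made for illustration. Following the "probably not" above, it leaves
out fellow broadcast receivers and includes only ranks that have
posted a receive naming the failed process.

```python
def detection_set(failed_rank, pending_receives):
    """Ranks that have posted a receive naming failed_rank as the source.

    pending_receives: iterable of (receiver_rank, source_rank) pairs.
    The failed rank itself is excluded, since it cannot be notified.
    """
    return {
        receiver
        for receiver, source in pending_receives
        if source == failed_rank and receiver != failed_rank
    }


# Ranks 0 and 2 are waiting on rank 3; rank 1 is waiting on rank 2.
pending = [(0, 3), (1, 2), (2, 3), (3, 0)]
affected = detection_set(3, pending)
print(sorted(affected))
```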
For each failure event type we will define the latest point in time
when each process within the event's detection set will be notified.
For example, if process p fails, all other processes must be notified
no later than their next receive call that names p as its source (i.e.
receives with MPI_ANY_SOURCE don't qualify). For errors that cause
process state to be corrupted, we may want to inform other processes
no later than the first point in time when their state becomes
dependent on the corruption. The MPI implementation may deliver the
event at this latest point using the synchronous error API or at any
earlier point in time using the asynchronous API.
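
The "latest point" rule for process failure can be illustrated with a
small simulation in Python. Everything here is hypothetical (the Rank
class, async_notify, the ProcessFailed exception, and ANY_SOURCE as a
stand-in for MPI_ANY_SOURCE). It assumes the runtime already knows
which peers have failed, and shows that a wildcard receive does not
force notification while a receive naming the failed source does.

```python
ANY_SOURCE = -1  # stand-in for MPI_ANY_SOURCE in this sketch


class ProcessFailed(Exception):
    """Synchronous error report: the named source has failed."""


class Rank:
    def __init__(self):
        self.failed_peers = set()  # failures the runtime knows about
        self.notified = set()      # failures already reported to this rank

    def async_notify(self, peer):
        # Optional early delivery through the asynchronous path.
        self.notified.add(peer)

    def recv(self, source):
        # Deadline: a receive that names a failed source must report it.
        if source != ANY_SOURCE and source in self.failed_peers:
            self.notified.add(source)
            raise ProcessFailed(source)
        return "message"


r = Rank()
r.failed_peers.add(3)

r.recv(ANY_SOURCE)                 # wildcard receive: no forced report
notified_after_wildcard = 3 in r.notified

try:
    r.recv(3)                      # names the failed rank: must report
    reported = False
except ProcessFailed:
    reported = True
```

A runtime that detects the failure sooner could call async_notify(3)
at any earlier point; the receive from rank 3 is only the deadline.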
The synchronous API will be a direct extension of the current error
reporting API. The asynchronous API will take the form of an events
queue that may be explicitly polled by the application to see if
there are any pending events. Applications will also be able to
register a callback function that will automatically be called by MPI
whenever a new event arrives. Furthermore, processes may subscribe to
events emanating from other processes as they see fit. For example,
the application may designate one or more processes as error monitors
and these processes would register themselves to listen to all other
processes and take appropriate corrective measures if something goes wrong.
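
A rough sketch of the asynchronous side, again in Python and again
with made-up names (EventQueue, register_callback, deliver, poll):
events either wait in a queue until the application polls, or are
handed to a registered callback as soon as they arrive. The sketch
assumes that a registered callback replaces queuing; the proposal
does not say whether both should happen.

```python
from collections import deque


class EventQueue:
    def __init__(self):
        self._pending = deque()
        self._callback = None

    def register_callback(self, fn):
        self._callback = fn

    def deliver(self, event):
        # Called by the (simulated) MPI runtime when an event arrives.
        if self._callback is not None:
            self._callback(event)
        else:
            self._pending.append(event)

    def poll(self):
        # Non-blocking: return the oldest pending event, or None.
        return self._pending.popleft() if self._pending else None


# A worker that polls explicitly.
worker = EventQueue()
worker.deliver(("PROCESS_FAILURE", 3))
first = worker.poll()
second = worker.poll()

# A monitor process that reacts through a callback instead.
seen = []
monitor = EventQueue()
monitor.register_callback(seen.append)
monitor.deliver(("PROCESS_FAILURE", 3))
monitor.deliver(("STATE_CORRUPTION", 5))
```

The monitor here stands for one of the designated error-monitor
processes; in a real design it would first subscribe to events from
all other processes.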
Greg Bronevetsky
Post-Doctoral Researcher
1028 Building 451
Lawrence Livermore National Lab
(925) 424-5756
bronevetsky1 at llnl.gov