[Mpi3-ft] Asynchronous error handling

Mon May 26 10:51:26 CDT 2008

>  On the telecon today we agreed to have our next telecon on 6/6 focus on how
>we may handle asynchronous error notification within MPI.  The working
>assumption is that we will still have return error codes, but also make use
>of asynchronous notification.  We need to
>    - Clearly define the boundary between these two different error
>notification mechanisms, i.e., when we use one and when the other
>    - Define the precise mechanism for asynchronous error notification
>This e-mail is intended to jump start discussion in preparation for the next
>telecon.

I'll throw something out here. I wasn't around for the initial 
discussions, so some of this may fly in the face of something that 
people have already decided is obviously wrong. Either way, it's a 
start. You may commence with the tomato throwing.

The idea for this proposal is a publish-subscribe model where the 
spec defines the default publish-subscribe relations but allows MPI 
implementations to define new events and their default subscriptions, 
and allows applications to add or cancel event subscriptions. I like this model 
mostly because it is the one being used by the CIFTS project, which I 
suspect will have an important role to play in MPI application fault 
tolerance. Since we won't be able to list all the possible errors 
that may occur, we'll need to define the possible error types and 
describe the error notification properties of these broad 
types, rather than individual events. Implementations may then put 
each real error into any type that is deemed appropriate.

Every error will have a defined detection set, which is the set of 
processes that by default subscribe to being notified of this event. 
For example, if a given process fails, any process that tries to 
receive a message from this process is definitely within its 
detection set. However, if the failed process is a receiver in a 
broadcast, we may or may not choose to include the other broadcast 
receivers in the detection set (probably not). Each process is 
subscribed to all error events that happen in the process, as long as 
the errors don't cause the process itself to fail.
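
As a rough illustration of the process-failure example, the default 
detection set could be computed along these lines. This is only a sketch 
under my own assumptions: the function name, its arguments, and the 
include_bcast_peers switch are hypothetical, and the "probably not" 
choice for broadcast receivers is encoded as the default.

```python
def detection_set(failed, pending_recvs, bcast_receivers=(),
                  include_bcast_peers=False):
    """Return the ranks subscribed by default to the failure of `failed`.

    pending_recvs: iterable of (receiver_rank, source_rank) pairs.
    bcast_receivers: ranks receiving in an ongoing broadcast.
    """
    # Any process trying to receive from the failed process is included.
    members = {dst for dst, src in pending_recvs
               if src == failed and dst != failed}
    # Other broadcast receivers are included only if we opt in.
    if include_bcast_peers and failed in bcast_receivers:
        members |= set(bcast_receivers) - {failed}
    return members
```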

For each failure event type we will define the latest point in time 
when each process within the event's detection set will be notified. 
For example, if process p fails, all other processes must be notified 
no later than their next receive call that explicitly names p as its 
source (i.e., receives with MPI_ANY_SOURCE don't qualify). For errors that cause 
process state to be corrupted, we may want to inform other processes 
no later than the first point in time when their state becomes 
dependent on the corruption. The MPI implementation may deliver the 
event at this latest point using the synchronous error API or at any 
earlier point in time using the asynchronous API.
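
Here is a small Python sketch of the "latest point" rule for process 
failure. The Process class, the ANY_SOURCE constant (a stand-in for 
MPI_ANY_SOURCE), and the returned strings are all hypothetical. The point 
is just that a receive naming the failed process is the deadline, unless 
the asynchronous path already delivered the event earlier.

```python
ANY_SOURCE = -1  # stand-in for MPI_ANY_SOURCE in this sketch


class Process:
    def __init__(self, rank):
        self.rank = rank
        self.notified = set()  # failed ranks this process already knows about

    def async_notify(self, failed_rank):
        # Early delivery through the asynchronous API.
        self.notified.add(failed_rank)

    def recv(self, source, failed_ranks):
        # Wildcard receives and receives from live ranks set no deadline.
        if source == ANY_SOURCE or source not in failed_ranks:
            return "no_deadline"
        # The event already arrived asynchronously (or at an earlier receive).
        if source in self.notified:
            return "already_notified"
        # Latest allowed point: report through the synchronous error path.
        self.notified.add(source)
        return "sync_error"
```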

The synchronous API will be a direct extension of the current error 
reporting API. The asynchronous API will take the form of an events 
queue that may be explicitly polled by the application to see if 
there are any pending events. Applications will also be able to 
register a callback function that will automatically be called by MPI 
whenever a new event arrives. Furthermore, processes may subscribe to 
events emanating from other processes as they see fit. For example, 
the application may designate one or more processes as error monitors 
and these processes would register themselves to listen to all other 
processes and take appropriate corrective measures if something goes wrong.
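
A minimal sketch of what the asynchronous side might look like, in 
Python for brevity. The EventQueue class and its methods are hypothetical 
names, not a proposed binding. Events are queued for explicit polling 
unless the application has registered a callback, in which case the 
callback is invoked on arrival.

```python
from collections import deque


class EventQueue:
    """Hypothetical per-process queue of pending error events."""

    def __init__(self):
        self._pending = deque()
        self._callback = None

    def register_callback(self, fn):
        # fn is called with each event that arrives from now on.
        self._callback = fn

    def deliver(self, event):
        # Called by the (imagined) MPI runtime when an event arrives.
        if self._callback is not None:
            self._callback(event)
        else:
            self._pending.append(event)

    def poll(self):
        # Return the oldest pending event, or None if the queue is empty.
        return self._pending.popleft() if self._pending else None
```

A monitor process would subscribe to events from all other ranks and 
then either poll its queue periodically or register a callback that 
starts the corrective action.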

Greg Bronevetsky
Post-Doctoral Researcher
1028 Building 451
Lawrence Livermore National Lab
(925) 424-5756
bronevetsky1 at llnl.gov