[Mpi3-ft] A need to detect any failure

Fri Aug 1 16:14:08 CDT 2008

What we're focusing on right now is the most basic notification 
mechanism that needs to be supported by MPI. This is the notification 
mechanism that will be used with legacy applications that use error 
codes to be informed of errors and need sensible default behavior 
from MPI. Although your use-case makes sense, the extra cost that it 
would incur means that applications that need this capability are 
best off using a more complex/customizable API, such as the 
publish-subscribe API we've been tossing around. Also, the idea of 
passing error notifications on top of existing messages is good 
motivation for the piggybacking API. In any case, I think that this 
is a use-case for the extended notification API, not the default 
notification mechanism.

Greg Bronevetsky
Post-Doctoral Researcher
1028 Building 451
Lawrence Livermore National Lab
(925) 424-5756
bronevetsky1 at llnl.gov

At 11:03 AM 8/1/2008, you wrote:
>Hi all,
>I like Erez's idea to associate errors with call sites.  However, I 
>still believe there is a case where processes do not communicate 
>with each other directly, but will need to know when any process in 
>the system dies.  Here is an example.
>
>Let's say an application splits MPI_COMM_WORLD into a 2D Cartesian 
>grid.  Then each process creates a "row" communicator and a "column" 
>communicator.  From here on out, each process only communicates 
>through its row and column communicators.  (I think this is what a 
>number of our codes really do, so this is a very realistic 
>example).  In the case of a failure, assume this application 
>requires a global rollback, and assume that each process writes a 
>checkpoint file at the end of each iteration, which looks like the following:
>
>for (iteration=0;  iteration<numTimesteps;  iteration++) {
>  row_xchange();
>  column_xchange();
>  compute();
>  checkpoint(iteration);
>}
>
>Now consider processes (i, j) and (i+1, j+1) in this 2D Cartesian grid.
>Because these two processes are in different rows and columns, they 
>don't share a row or column communicator, and so they never 
>communicate with each other directly.  Now, assume process (i+1, j+1) fails.
>Process (i, j) needs to roll back, but how will it be notified?
>
>One solution would be to force the application to call 
>MPI_Barrier(MPI_COMM_WORLD) inside of its iteration loop.  While 
>this would work, it seems costly, and it defeats all the effort the 
>application team went to in order to make the code scalable by using 
>just row and column communicators.
>
>Another solution may be to rely on daisy chaining.  That is, in the 
>next iteration after process (i+1, j+1) dies, processes (i+1, j) and 
>(i, j+1) may find out since they each share a communicator with the 
>failed process.  Then in the following iteration, processes (i+1, j) 
>and (i, j+1) could propagate this failure message to process (i, j) 
>since they each share a communicator.  This would also work, but a 
>special error code would be needed since the communication with 
>(i+1, j) and (i, j+1) may have succeeded just fine.
>
>In the current standard, MPI handles this type of failure by 
>invoking the error handler on MPI_COMM_WORLD.  This could be yet 
>another solution.
>-Adam
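
The daisy-chaining idea in the quoted message can be checked with a small
simulation. What follows is only a sketch under stated assumptions, not MPI
code: it assumes that in each iteration a process learns of the failure if it
shares a row or column communicator with the failed process, or with any
process that already knew at the start of that iteration. The function name,
its parameters, and MAX_DIM are hypothetical and appear nowhere in the thread.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_DIM 16  /* hypothetical upper bound on grid rows/columns */

/*
 * Simulate daisy-chained failure notification on a rows x cols process grid
 * in which each process talks only over its row and column communicators.
 * Returns the number of iterations until every surviving process knows that
 * process (fr, fc) has failed, or -1 if the arguments are out of range.
 */
int iterations_until_all_notified(int rows, int cols, int fr, int fc)
{
    if (rows < 1 || rows > MAX_DIM || cols < 1 || cols > MAX_DIM)
        return -1;
    if (fr < 0 || fr >= rows || fc < 0 || fc >= cols)
        return -1;

    bool knows[MAX_DIM][MAX_DIM] = {{false}};
    bool next[MAX_DIM][MAX_DIM];
    int iterations = 0;

    for (;;) {
        /* Stop once every survivor has been notified. */
        bool all_notified = true;
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                if (!(r == fr && c == fc) && !knows[r][c])
                    all_notified = false;
        if (all_notified)
            return iterations;

        /* One application iteration: row exchange plus column exchange. */
        memcpy(next, knows, sizeof knows);
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if ((r == fr && c == fc) || knows[r][c])
                    continue;
                /* Direct detection: shares a communicator with the failed process. */
                bool learns = (r == fr || c == fc);
                /* Daisy chain: a row or column peer already knew. */
                for (int k = 0; k < cols && !learns; k++)
                    learns = knows[r][k];
                for (int k = 0; k < rows && !learns; k++)
                    learns = knows[k][c];
                if (learns)
                    next[r][c] = true;
            }
        }
        memcpy(knows, next, sizeof knows);
        iterations++;
    }
}
```

Under these assumptions, calling iterations_until_all_notified(4, 5, 1, 1)
gives 2: the failed process's row and column peers find out in the first
iteration, and everyone else finds out in the second. That matches the
two-step chain described for process (i, j), although a real application
would still need the special error code mentioned in the quoted message to
carry the news over otherwise successful communication.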