[Mpi3-ft] MPI Fault Tolerance scenarios

Greg Bronevetsky bronevetsky1 at llnl.gov
Tue Mar 3 12:52:14 CST 2009


>Right. The low-level interface may be optional.

That would be an interesting choice for the API: provide several 
levels of support and, for each level, an optional lower-level API 
that can be used to control recovery more finely than is possible 
with the default error handlers. I think that we'll need an explicit 
communicator rejoin API even when using built-in error handlers, but 
at least we won't force users to check error codes manually. Having 
this two-layer stack leaves me worried that the low-level API will 
simply get dropped because the rest of the forum will not see the 
need for something that complicated. However, I think we'll have a 
few very persuasive arguments, such as the fact that checking for 
failures immediately after every collective will incur a large 
performance cost, whereas giving the user control over when to check 
will make it efficient.
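
To make the performance argument concrete, here is a toy model in 
Python. It is only a sketch under stated assumptions, not MPI code and 
not a proposed interface: the function simulate and its parameters are 
hypothetical, and it assumes each failure check costs one global 
detection round regardless of what the collective itself does. It 
counts how many such rounds are paid when checking after every 
collective versus only at points the user picks, and reports the step 
at which an injected failure would be noticed.

```python
def simulate(num_collectives, check_every, fail_at=None):
    """Toy model (hypothetical, not an MPI API).

    Runs `num_collectives` steps and performs one failure-detection
    round after every `check_every` steps. If a failure is injected at
    step `fail_at`, returns as soon as a check notices it.

    Returns (checks_performed, step_at_which_failure_was_detected).
    The second element is None if no failure was detected.
    """
    checks = 0
    for step in range(1, num_collectives + 1):
        # The collective operation itself would run here.
        if step % check_every == 0:
            checks += 1  # assumed cost: one global detection round
            if fail_at is not None and fail_at <= step:
                return checks, step
    return checks, None


# Library checks after every collective: one round per step.
eager = simulate(1000, check_every=1)

# User checks once per 100-collective phase.
deferred = simulate(1000, check_every=100)

# Same two policies with a failure injected at step 250.
eager_fail = simulate(1000, check_every=1, fail_at=250)
deferred_fail = simulate(1000, check_every=100, fail_at=250)

print(eager, deferred)            # (1000, None) (10, None)
print(eager_fail, deferred_fail)  # (250, 250) (3, 300)
```

In this model the user-controlled policy pays 10 detection rounds 
instead of 1000 on a failure-free run. The price is that the failure 
at step 250 is noticed at step 300 rather than immediately, which is 
the kind of trade-off a low-level API would let the application make 
for itself.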

Greg Bronevetsky
Post-Doctoral Researcher
1028 Building 451
Lawrence Livermore National Lab
(925) 424-5756
bronevetsky1 at llnl.gov
http://greg.bronevetsky.com 