[Mpi3-ft] run through stabilization user-guide

Toon Knapen toon.knapen at gmail.com
Wed Feb 9 13:42:09 CST 2011


On Wed, Feb 9, 2011 at 4:22 PM, Bronevetsky, Greg <bronevetsky1 at llnl.gov> wrote:

>
>
> If the workers use communicators that are MPI_ERRORS_FATAL, if there is a
> disconnect with the master, they will be automatically aborted. Meanwhile,
> the master will be informed about their “failure” because of the disconnect
> and when connection to the physical nodes that previously hosted the aborted
> workers is re-established, the master’s MPI library will see that worker
> tasks are dead and will not need to kill the master.
>
>
From the user guide I did not understand that there is this kind of
'interoperability' between the different error handlers. For instance, the
user guide says 'The application must opt-in to the fault tolerance
semantics by replacing the default error handler'.

toon