[Mpi3-ft] run through stabilization user-guide
toon.knapen at gmail.com
Wed Feb 9 13:42:09 CST 2011
On Wed, Feb 9, 2011 at 4:22 PM, Bronevetsky, Greg <bronevetsky1 at llnl.gov> wrote:
> If the workers use communicators that are MPI_ERRORS_FATAL, if there is a
> disconnect with the master, they will be automatically aborted. Meanwhile,
> the master will be informed about their “failure” because of the disconnect
> and when connection to the physical nodes that previously hosted the aborted
> workers is re-established, the master’s MPI library will see that worker
> tasks are dead and will not need to kill the master.
From the user guide I did not understand that there is this kind of
'interoperability' between the different error handlers. For instance, the
user guide says 'The application must opt-in to the fault tolerance
semantics by replacing the default error handler'.