I'd like to start the discussion on the proposal for "defining
communicator state with process loss", that is, the recoverability of
MPI state in the presence of process loss.
As a starting point, I suggest that folks have a look at the
following paper, since I think it touches on quite a number of related
topics:
Fagg, G., et al. "Extending the MPI Specification for Process Fault
Tolerance on High Performance Computing Systems"
http://icl.cs.utk.edu/projectsfiles/ftmpi/pubs/isc2004-FT-MPI.pdf
----
Josh Hursey
Graduate Student, Indiana University