[Mpi3-ft] Proposal Updates

Joshua Hursey jjhursey at open-mpi.org
Tue Mar 29 10:41:57 CDT 2011


Per our discussion at the MPI Forum meeting on Monday, March 28, I have updated the Run-through stabilization and Process recovery proposals.

Run-through stabilization
 - Added a discussion point about a fault-tolerant collective keyword
https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/ft/run_through_stabilization

Process Recovery
 - Added a few discussion points
 - Updated the document to reflect the decision that collectives should not be disabled by rejoining processes, only by newly failed processes.
 - Added some clarification on when the generation number is advanced.
 - Added an example to help illustrate the multiple restoration discussion point.
https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/ft/process_recovery

One important question to think about is the following discussion point regarding layered libraries:
  https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/ft/process_recovery#discuss_library_support

Let me know what you think.

-- Josh

------------------------------------
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey




