[Mpi3-ft] RTS proposal update in MPI Forum SVN

Josh Hursey jjhursey at open-mpi.org
Mon Aug 29 16:23:07 CDT 2011

I updated the RTS proposal in SVN
(trunk/working-groups/mpi-3/ft/trunk). I pulled out the
errhandler_compare and mpi_comm_kill functions into separate tickets.
I also did some of the smallish changes suggested by the MPI Forum in
July. Full change log at bottom.

I think the PDF is ready to be updated with the new state management
functionality. There are some other edits that we need to do, but
until the new state management functionality is in place it does not
make much sense to do them.

Let's wait for the teleconf on Wed. before adding to the RTS proposal.
Then we can make a game plan for updates. That is unless someone is
really eager to go for it before the teleconf (post to the list if

-- Josh

Change Log:
 * Separate Ticket: errhandler_compare
   * https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/291
 * Separate Ticket: MPI_COMM_KILL
   * https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/292

 * 8.2: In table with MPI_ERR_RANK_FAIL_STOP, change 'is failed' to 'has failed'
 * 8.3: 'in the associated communicator' does not account for error
handlers on windows or file handles. So change to something like 'in
the associated communication handle'
 * 8.3: use 'associated' instead of 'specified'

 * 8.7: Rank 0 and MPI_Finalize. Moved advice to users to FT chapter.
Left a forward reference in this chapter. Addressed some of the
concerns with this wording.

 * 17.1: Clarify 'Such applications' in the second paragraph. Should
be 'Process fault tolerant applications'
 * 17.2: 'alive process' remove 'normal' or define as 'not failed'
 * 17.3: 'Rationale' replace 'to deadlock situations' with 'to
deadlocks' in last sentence.
 * 17.3: 'Advice to implementors' replace 'where able' with 'if able'
or 'if possible' in last sentence.
 * 17.4.1: Typo on page 539 line 43 - "indicates the state of process
in the associated..." -> "indicates the state of the process the
 * 17.7: Paragraph 1, sentence 1 conflicts with the last sentence of
paragraph 2. This should be fixed.
 * 17.7: It was mentioned that it would be useful to state explicitly
that point-to-point operations should not hang in the presence of
errors, and that they will eventually return with either success or
some error.

Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
