[Mpi3-ft] New version of the RTS proposal

Josh Hursey jjhursey at open-mpi.org
Mon Nov 7 12:58:09 CST 2011


I have attached a new version of the RTS proposal that reflects the
feedback from the forum, teleconfs, and mailing list. The change log
is at the bottom of this email. The document can be found at:
  https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/ft/rts_proposal_main
Direct Link:
  https://svn.mpi-forum.org/trac/mpi-forum-web/attachment/wiki/ft/rts_proposal_main/FTWG-Process-FT-Draft-2011-11-07.pdf

Note: I created a new wiki page to keep all of the RTS-related
documents in one place (they were starting to become a bit scattered).


There are a couple of discussion points that came up in the current
round of editing.
* We need to review 17.5.6: What if MPI_INIT cannot complete
successfully internally because of a process failure? The text seems
to assume that MPI_INIT will always be able to complete successfully in
the presence of failure. Maybe we should instead state that 'if it is
able to complete successfully, then it should do so even in the
presence of failure'?

* I would like a few more people to look over the process topologies
chapter. In particular, I think we need to fix the wording for
MPI_CART_SUB, perhaps to better match the wording for MPI_COMM_SPLIT.

* Window creation currently allows the user to create a window that
cannot be used immediately. That is, an unrecognized failed process in
the input communicator can be carried over to the window group and
cause the window to be invalid from the start. The user would then
have to either destroy and recreate the window or call
MPI_Win_validate. I believe this was put in for performance reasons,
but it seems cumbersome. Could we instead say something like 'the
window should be valid upon successful creation', or would that be
troublesome?


Let me know if you find any problems with the text, or anything that I missed.

Thanks,
Josh

Change Log:
-----------
* 'communication handle' -> 'communication object' to better match
other MPI text.
* Cleaned up the Process Failure Handler section based on recent conversations.
  - Uses the ErrHandler object, and creation/destruction functions
  - Only allows local operations, with the exception of MPI_Comm_drain
  - Implementations may allow other operations.
* Added MPI_Comm_drain and MPI_Icomm_drain
* Added MPI_Comm_validate_multiple and MPI_Icomm_validate_multiple
* Added an MPI_GROUP_IGNORE constant for MPI_*_validate(_*)
* Added note to MPI_Icomm_validate that the 'failed' field should not
be accessed until completion of the operation.
* Updated the One-Sided semantics to be 'collective' style, per the MPI Forum meeting
* Added "The new communicator should be collectively active upon
successful creation." to some of the object creation operations.
The 'should be' wording may be problematic.
* Adjusted the wording of the object creation functions to be specific
to process failures.
* A bunch of word-smithing.
* Added new error classes to the error list in Chapter 8 and Annex A
* Added MPI_GROUP_NULL to Chapter 2 and Annex A
* Added MPI_FAILHANDLER_NULL to Annex A
* Fixed the one-sided put, get, and accumulate clause.

-- 
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey


