[Mpi3-ft] Updates to ft chapter
jjhursey at open-mpi.org
Fri Sep 16 14:47:28 CDT 2011
I updated the document with the following changes:
* Minor wording updates and a few relocated passages, per the diff sent to the
list (this thread).
* 17.1: Slight rewording of advice to implementors text (thanks to
Sayantan and Darius)
* 17.4.4: Fixed typo in Example 17.4 (thanks to Sayantan)
* 17.4.4: Combined examples 17.2 and 17.3. (thanks to Sayantan)
I did -not- change any of the bigger items in the list below.
On Thu, Sep 15, 2011 at 1:27 PM, Josh Hursey <jjhursey at open-mpi.org> wrote:
> I made some minor-ish edits. I attached the diff to this email for
> review. Feel free to commit it if you think it is good to go.
> Some larger items that I did not want to change/adjust before
> discussing with the group:
> Does the Advice to Implementors buy us anything? Should it be reworded?
> Do we need the definitions of 'error' and 'failure'? We don't rely on
> these definitions in the text beyond their previously implied
> definitions in the standard. If we do not need them, then it might be
> good to drop them to reduce complexity.
> It was suggested that we try to clarify this paragraph with the
> Rationale. Any suggestions?
> Should we go ahead and pull the Advice to implementors regarding the
> return value of mpiexec into a separate ticket? Or should we keep it
> in the document and pull it if we get pushback? (I think on the call
> we decided the latter, but I forget now).
> In the second Rationale paragraph, I moved the first sentence to
> 17.7.1. But I think we can drop the rest of the rationale. I do not
> know if it is terribly instructive.
> I updated the Rationale to account for the MPI_Reduce numerical
> stability recommendation.
> Rationale. The MPI_COMM_VALIDATE and MPI_ICOMM_VALIDATE operations
> provide the MPI implementation an opportunity to restructure
> collective communication patterns before the communicator is used by
> the alive processes. This may allow for improved collective performance
> after process failure. It should be noted that such optimizations might
> change the consistency recommendation for MPI_REDUCE in the advice to
> implementors in Section ??. It is strongly recommended that the
> consistency recommendation hold for MPI_REDUCE between consecutive
> collective activations of a communicator using a collective validation
> operation (e.g., MPI_COMM_VALIDATE). (End of rationale.)
> Note that I moved the Advice to users regarding libraries to here, per
> the teleconf.
> Added back the Advice to users regarding the 'sync-barrier-sync'
> semantic for MPI_File_validate.
> On Wed, Sep 14, 2011 at 1:58 PM, Darius Buntinas <buntinas at mcs.anl.gov> wrote:
>> I've made some changes we discussed on the phone this morning. You can find the latest PDF here (or at the bottom of the "Modified run-through stabilization" page on the wiki):
>> Here's a summary of the changes:
>> * Changed MPI_ERR_RANK_FAIL_STOP to MPI_ERR_PROC_FAIL_STOP (because a "rank" doesn't fail, a "process" does)
>> * Fixed up usage of rank vs process in the chapter.
>> * Removed the MPI_COMM_COLLECTIVES_ENABLED function because it returns a local version of a global state, which is meaningless for applications.
>> * Added MPI_COMM_ANY_SOURCE_ENABLED
>> * Moved the definition of MPI_COMM_VALIDATE et al. earlier in the section, and added a new subsection.
>> Please look over my changes, especially how I rearranged the collectives section for the definition of MPI_COMM_VALIDATE, and let me know if they look OK.
> Joshua Hursey
> Postdoctoral Research Associate
> Oak Ridge National Laboratory
Postdoctoral Research Associate
Oak Ridge National Laboratory