[Mpi3-ft] Stabilization Proposal Updated & MPI_COMM_VALIDATE_ALL_SYNC

Joshua Hursey jjhursey at open-mpi.org
Tue Sep 28 08:35:14 CDT 2010

I updated the Run-Through Stabilization proposal:
 * Cross reference the MPI_ERR_CANNOT_CONTINUE proposal
 * Fix the change for MPI_COMM_SPLIT per a conversation with Jeff Squyres
 * Add a MPI_COMM_VALIDATE_ALL_SYNC function (more below)



The new MPI_COMM_VALIDATE_ALL_SYNC function will likely replace the existing MPI_COMM_VALIDATE_ALL and MPI_COMM_VALIDATE_ALL_CLEAR as the only collective validation function (and will probably be renamed to just MPI_COMM_VALIDATE_ALL). The call is collective over the group/communicator and clears all known failures at the top of the call. It returns a count of the total number of failures (both previously recognized and unrecognized) in the group/communicator. If needed, the user can call the local MPI_COMM_VALIDATE function to access the list of failures.
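To make the intended semantics concrete, here is a toy, single-process model in C. It is only a sketch and does not use MPI or the proposal's actual signatures: every toy_* name is hypothetical, and it assumes that "clearing" a failure means marking it as recognized. The one collective step returns only the agreed count, while the list of failed ranks comes from a separate local accessor.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_RANKS 64

/* Per-rank state in the toy model (hypothetical names, not from the proposal). */
typedef enum {
    TOY_OK,
    TOY_FAILED_UNRECOGNIZED,
    TOY_FAILED_RECOGNIZED
} toy_state;

/* Stand-in for a communicator: just a size and one state per rank. */
typedef struct {
    int size;
    toy_state state[TOY_MAX_RANKS];
} toy_comm;

/* Start with every rank alive. */
void toy_comm_init(toy_comm *c, int size)
{
    assert(c != NULL && size > 0 && size <= TOY_MAX_RANKS);
    c->size = size;
    for (int r = 0; r < size; ++r)
        c->state[r] = TOY_OK;
}

/* Simulate a process failure that has not yet been recognized. */
void toy_comm_fail(toy_comm *c, int rank)
{
    assert(c != NULL && rank >= 0 && rank < c->size);
    if (c->state[rank] == TOY_OK)
        c->state[rank] = TOY_FAILED_UNRECOGNIZED;
}

/* Stand-in for the single collective call: mark every failure as
 * recognized (the assumed meaning of "clear") and return the total
 * number of failed ranks, old and new alike. No list is returned. */
int toy_validate_all_sync(toy_comm *c)
{
    assert(c != NULL);
    int count = 0;
    for (int r = 0; r < c->size; ++r) {
        if (c->state[r] != TOY_OK) {
            c->state[r] = TOY_FAILED_RECOGNIZED;
            ++count;
        }
    }
    return count;
}

/* Stand-in for the local accessor: copy up to `capacity` failed ranks
 * into `ranks` (ascending order) and return how many were written. */
int toy_validate_local(const toy_comm *c, int *ranks, int capacity)
{
    assert(c != NULL && (ranks != NULL || capacity == 0));
    int n = 0;
    for (int r = 0; r < c->size && n < capacity; ++r) {
        if (c->state[r] != TOY_OK)
            ranks[n++] = r;
    }
    return n;
}
```

In this model, a caller that only needs to know how many processes are gone calls toy_validate_all_sync once and never allocates a list; a caller that needs the ranks follows up with the local toy_validate_local.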

This function is similar to the original proposal's validate function, but adds the count argument as an agreed-upon value. After the discussion in Germany and some experimenting with a few application kernels, it became apparent that the MPI_COMM_VALIDATE_ALL and MPI_COMM_VALIDATE_ALL_CLEAR functions are almost always called together when a failure happens (incurring two collective calls for each communicator). Since there is a local accessor to the list, we can create a single collective function that 'fixes' the group/communicator in a single operation. Removing the list of known failures from this collective also reduces the memory footprint needed to call this function.

A few questions for the group:
1) Are there any objections to removing the MPI_COMM_VALIDATE_ALL and MPI_COMM_VALIDATE_ALL_CLEAR functions and replacing them with the MPI_COMM_VALIDATE_ALL_SYNC function (and renaming it MPI_COMM_VALIDATE_ALL)?

2) Group management operations are local and do not require interprocess communication (Section 6.3). In light of this, is there any objection to removing the collective validate functions from the group construction section (they will still be defined for communicators)?

As always, thanks for the feedback.

-- Josh

Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
