[Mpi3-ft] notes from FT plenary and WG sessions for Sept. 2011 mpi forum meeting

Howard Pritchard howardp at cray.com
Fri Sep 23 04:53:42 CDT 2011

Hi Folks,

Here is a summary of my notes from both the plenary session and the
working group.   Note that we did the WG before the plenary session.

Here is a summary of my notes from the working group session -

The fault tolerant collectives ideas were discussed.  Darius questioned
how useful it is to have an option for all ranks involved in a coll op
to return the same error code, given the changes needed to handle this
case, vs just calling MPI_Comm_validate after the coll op regardless of
the error code returned.  Everyone thought it would be good to have more
info from apps people about why this feature - returning a uniform error
code - would be much better than the other option.

There was a lot of discussion of having a 'vector' version of
MPI_Comm_validate.  The WG thought it would be sufficient to just
replace the existing validate functions with vector versions, with the
n=1 case being equivalent to the existing functionality.

Feedback from EuroMPI was discussed.  Darius talked about a point
someone had raised about what happens with MPI_Comm_split, etc. if an
error occurs and the input communicator is still using the
errors-are-fatal error handler.

We also discussed what happens when the system can't start up as many
ranks as the user requested on the mpiexec command line.  Darius pointed
out that the MPI-2 standard already addresses this and said that for
mpich2 now it just tries to start as many ranks as it can.  It was
decided that how this situation is handled should be described in the
"advice to implementers" in the mpiexec section of the spec.

Next, the plenary session -

Although it is not in the current proposal, Adam Moody brought up the
MPI_Kill functionality again.  That was discussed and many objections
were raised.  It is good that we did remove this from the run through
stabilization proposal.

There was some discussion concerning RMA and how it relates to the
current proposal.  It was agreed that the FT group needs to sync up with
what the RMA group is doing to make sure there aren't any show-stoppers.
Brian Barrett also brought up the need to have an enumeration of the
cases that occur for the existing and proposed RMA synchronization
models and the run through stabilization proposal.

There was also discussion of the deprecated C++ bindings and how or if
we expect to support fault tolerance when using the C++ bindings given
the way errors are handled when using these bindings.  It was agreed
that places in the proposal currently reading "return an error" need to
be changed to "raise an error".

There was a lively discussion of the MPI_ANY_SOURCE issue and the
functionality required to fix up a communicator for pt2pt when a process
has MPI_ANY_SOURCE receives posted.  Some argued that it may be very
difficult/impossible to support the cancel-like qualities of

George pointed out that we can't just complete-with-error anysource
receives, we'll also need to complete-with-error any receive posted
after the anysource receive that might match the anysource receive.
Consider the example:

Proc0             Proc1
-----             -----
Recv(AS, TAGX)[A]
Recv( 1, TAGX)[B]
                  Send(0, TAGX)[C]
                  Send(0, TAGX)[D]

Without failure, Recv A will match Send C and Recv B will match Send D.
If an unrelated process fails, and we only complete-with-error the
anysource, then Recv A will complete with error, and Recv B will match
Send C.  So we would need to complete-with-error all recvs posted after
an anysource receive with tags that match the tag of the anysource receive.
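
A toy model of the matching rule can make the example above concrete.
The sketch below is only an illustration in Python, not MPI code:
ANY_SOURCE, the policy names ("no_failure", "anysource_only",
"anysource_and_later") and the functions receives_to_fail and match are
all made up for this note.  It assumes that posted receives are searched
in posting order, that sends from one process arrive in order, and that
tags are compared for plain equality (no wildcard tags).

```python
ANY_SOURCE = -1  # stand-in for MPI_ANY_SOURCE in this toy model (assumption)

# Posted receives on Proc0, in posting order: (name, source, tag)
POSTED = [("A", ANY_SOURCE, "TAGX"), ("B", 1, "TAGX")]
# Sends from Proc1 to Proc0, in arrival order: (name, sender, tag)
SENDS = [("C", 1, "TAGX"), ("D", 1, "TAGX")]


def receives_to_fail(posted, policy):
    """Names of posted receives that are completed with error under `policy`.

    The policy names are hypothetical labels for the options in the notes.
    """
    if policy == "no_failure":
        return set()
    failed = set()
    anysource_tags = set()
    for name, source, tag in posted:  # walk in posting order
        if source == ANY_SOURCE:
            failed.add(name)
            anysource_tags.add(tag)
        elif policy == "anysource_and_later" and tag in anysource_tags:
            # Posted after an anysource receive with the same tag.
            failed.add(name)
    return failed


def match(posted, sends, policy):
    """Return ({recv_name: send_name}, failed_recv_names) for the toy model."""
    failed = receives_to_fail(posted, policy)
    pending = [r for r in posted if r[0] not in failed]
    matches = {}
    for send_name, sender, tag in sends:
        for recv in pending:
            recv_name, source, recv_tag = recv
            if source in (ANY_SOURCE, sender) and recv_tag == tag:
                matches[recv_name] = send_name
                pending.remove(recv)  # safe: we break right after removing
                break
    return matches, failed


if __name__ == "__main__":
    for policy in ("no_failure", "anysource_only", "anysource_and_later"):
        print(policy, match(POSTED, SENDS, policy))
```

With "anysource_only", Recv B ends up matched to Send C, which is the
mismatch described above.  With "anysource_and_later", both A and B
complete with error and neither send is matched by them.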

This aspect of the proposal definitely needs
reinvestigation/clarification.  It may also be necessary to discuss this
in more detail with those interested in hardware based mpi tag matching.

Torsten was not convinced about the feasibility of implementing
MPI_Comm_validate and friends from a theoretical standpoint.  He asked
if someone has shown whether this is impossible.  The existing work by
Josh on the literature, etc. may need to be reviewed the next time this
proposal is presented to the forum.

There was some discussion of the "vector" version of MPI_Comm_validate.
The idea of just having a vector of 'comms' didn't seem to go over very
well.  What seemed to be more palatable would be to have a first comm
argument, followed by a vector of comms which are derived from the first

Somehow we returned to looking at examples like the MPI_Bcast example in
the proposal.  The early breakout from the loop caused very animated
discussion, but no real consensus about what to do about this.

Aspects of communicator creation were also brought up.  A suggestion was
given that for routines that create communicators, and for which the
default errors-are-fatal error handler is associated with the input
comm, rather than immediately returning an error if the operation
fails, the app would be allowed to attach a non-default error handler
which would then raise the error, much like is documented for MPI_Init.

Howard Pritchard
Software Engineering
Cray, Inc.
