[Mpi3-ft] New FT API

Josh Hursey jjhursey at open-mpi.org
Wed Aug 10 16:29:09 CDT 2011

Thanks for getting that new interface going. I have some notes below
for discussion.

-- Josh

better match the MPI_Comm_group() signature?

- Should we add an optional MPI_Info parameter to the
MPI_Comm_get_failed() operation to allow for implementation specific
optimizations - similar to what we had with the 'mask' in the previous

- What should we do for intercommunicators?
 A) Should we expand the signature of MPI_Comm_get_failed to return
the failed sets for both the local and remote groups?
MPI_Comm_get_failed(comm, local_grp, remote_grp)
 B) Should MPI_Comm_get_failed only return the remote group for
intercommunicators, and force the user to call MPI_Comm_group() to get
the local group and then MPI_Group_get_failed() to get the subset of
 C) Add an MPI_Comm_get_failed_remote(comm, grp) that would return the
failures in the remote group, and MPI_Comm_get_failed() would only
ever return the failures in the local group.
 D) Something else?

- MPI_ANY_SOURCE: I think the user should have to pass in a group
containing the list of failed ranks that it is allowing to participate
even though they are failed. If there are other failures on the
communicator that are not contained in this list then the
MPI_ANY_SOURCE will fail as before. This protects the user from
acknowledging more processes than it expects to, and avoids the thread
safety issue mentioned on the wiki.
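
As a rough illustration of the MPI_ANY_SOURCE rule described above,
here is a toy model in plain C. It is only a sketch: rank sets are
bitmasks (so at most 64 ranks), and the names rank_set, rank_bit and
any_source_may_proceed are hypothetical helpers, not part of MPI or of
the proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: a set of ranks is a 64-bit mask. */
typedef uint64_t rank_set;

/* Hypothetical helper: the set containing a single rank. */
static rank_set rank_bit(int rank)
{
    assert(rank >= 0 && rank < 64);
    return (rank_set)1 << rank;
}

/*
 * 'failed'  : every rank known to have failed on the communicator.
 * 'allowed' : the failed ranks the user explicitly passed in.
 * The wildcard receive may proceed only if no failed rank lies
 * outside the user's list; otherwise it fails as before.
 */
static bool any_source_may_proceed(rank_set failed, rank_set allowed)
{
    return (failed & ~allowed) == 0;
}
```

With ranks 2 and 5 failed, passing a group that names only rank 2
still makes the wildcard receive fail, so the user never acknowledges
more processes than it listed.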

- MPI_ANY_SOURCE: I do not have a preference on MPI_Comm_recognize
versus MPI_Comm_enable_any_source. A 'recognized' rank can be defined
as a rank that the application has acknowledged as failed to MPI, and
understands that it will not participate in any group operations like
MPI_ANY_SOURCE or collectives (though a special recognition operation
is provided for collectives).

- Nullify: I like keeping this separate since there is a question of
whether or not it is useful to provide MPI_PROC_NULL semantics for P2P
operations to failed peers. I think the signatures are fine. Maybe
change them to MPI_Comm_group_nullified/MPI_Comm_group_nullify to line
up with MPI_Comm_group - though I could see users getting those mixed
up pretty easily.

- MPI_Comm_validate: For this operation, are the processes identified
in the group 'recognized' for MPI_ANY_SOURCE? In the previous proposal
the collective validate would 'recognize' the failed processes
automatically. I would say that in this version we should -not- do
this. I do not see much benefit in this given the flexibility of the
new interface. Just a point of discussion, since it would be different
semantics than those in the previous proposal.

- Thread safety: We could force the user to specify the group of
processes that it wants to recognize [MPI_Comm_recognize(comm,
input_group, output_group)]. The input_group would let the user name
the failed processes that it wants to acknowledge. The output_group
would represent the full set of recognized ranks for this
communicator. Requiring an input_group prevents the MPI
implementation from adding more failed processes to the recognized
group without the user's knowledge. The user would then need to make
sure that the threads agree on how each of them manages the
recognition status.
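
The input_group/output_group behavior can be sketched with a small
model in plain C. This is not an MPI implementation:
model_comm_recognize is a hypothetical stand-in for the proposed
MPI_Comm_recognize, rank sets are bitmasks limited to 64 ranks, and
ignoring input ranks that have not failed is an assumption made here,
since the proposal does not say what should happen in that case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: a set of ranks is a 64-bit mask. */
typedef uint64_t rank_set;

/* Hypothetical helper: the set containing a single rank. */
static rank_set rank_bit(int rank)
{
    assert(rank >= 0 && rank < 64);
    return (rank_set)1 << rank;
}

/* Hypothetical per-communicator state. */
typedef struct {
    rank_set failed;      /* ranks the library knows have failed   */
    rank_set recognized;  /* failed ranks the user has acknowledged */
} comm_model;

/*
 * Stand-in for MPI_Comm_recognize(comm, input_group, output_group).
 * Only ranks named in input_group are added to the recognized set.
 * Assumption: input ranks that have not failed are ignored.
 * output_group receives the full recognized set for the communicator.
 */
static void model_comm_recognize(comm_model *c, rank_set input_group,
                                 rank_set *output_group)
{
    assert(c != NULL && output_group != NULL);
    c->recognized |= input_group & c->failed;
    *output_group = c->recognized;
}
```

If rank 3 fails after the caller has built an input_group containing
only rank 1, the output_group is still just rank 1, and rank 3 stays
unrecognized until some thread names it explicitly.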

- Thread safety: There is another question of what happens if:
  ThreadA: MPI_Recv(comm, MPI_ANY_SOURCE)
  --- Rank X fails ---
  ThreadB: Notice a failure of Rank X
  ThreadB: MPI_Comm_recognize(comm, {rankX})
There is a race between when the failure of Rank X is reported to
ThreadA and when ThreadB recognizes the failure. If ThreadB
recognizes the failure before ThreadA is put on the run queue, should
ThreadA return an error, or should it keep processing? I think it
should return an error, and we should discourage users from such
constructs, but I could be convinced otherwise.
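
The two possible answers to this race can be compared with a
deterministic, single-threaded model in plain C. It is a sketch under
stated assumptions: no real threads or MPI calls are involved, rank
sets are bitmasks limited to 64 ranks, and every name below
(comm_model, pending_recv, race_policy, recv_survives) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: a set of ranks is a 64-bit mask. */
typedef uint64_t rank_set;

/* Hypothetical helper: the set containing a single rank. */
static rank_set rank_bit(int rank)
{
    assert(rank >= 0 && rank < 64);
    return (rank_set)1 << rank;
}

/* Hypothetical per-communicator state. */
typedef struct {
    rank_set failed;      /* ranks the library knows have failed   */
    rank_set recognized;  /* failed ranks the user has acknowledged */
} comm_model;

/* A posted wildcard receive remembers which failures predate it. */
typedef struct {
    rank_set failed_at_post;
} pending_recv;

/* The two candidate answers to the race. */
enum race_policy {
    ERROR_ON_NEW_FAILURE,   /* a failure after posting always errors */
    CONTINUE_IF_RECOGNIZED  /* keep going if it is already recognized */
};

/* ThreadA: post MPI_Recv(comm, MPI_ANY_SOURCE) in the model. */
static pending_recv post_any_source(const comm_model *c)
{
    assert(c != NULL);
    pending_recv r = { c->failed };
    return r;
}

/* Returns true if the pending receive keeps waiting, false if it
 * returns an error when ThreadA next runs. */
static bool recv_survives(const comm_model *c, const pending_recv *r,
                          enum race_policy policy)
{
    assert(c != NULL && r != NULL);
    rank_set new_failures = c->failed & ~r->failed_at_post;
    if (new_failures == 0)
        return true;
    if (policy == ERROR_ON_NEW_FAILURE)
        return false;
    return (new_failures & ~c->recognized) == 0;
}
```

Replaying the sequence above (post, Rank X fails, ThreadB recognizes
Rank X), the first policy reports an error to ThreadA while the second
lets the receive continue. The first policy is the simpler one to
reason about because the outcome does not depend on thread scheduling.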

- 'notion of thread-specific state in the MPI standard?' From what I
could find, I do not think there is a notion of thread-specific state
in the MPI standard. There is a concept of the 'main thread', but I
think that is as far as the standard goes in this regard.

On Wed, Aug 10, 2011 at 4:42 PM, Darius Buntinas <buntinas at mcs.anl.gov> wrote:
> I've started a wiki page describing a new API based on feedback from the forum and comments during our last meeting.  It's still a work in progress, but please look over it and send me your comments, specifically on the "thread safety" section.
> https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/ft/run_through_stabilization_2
> Thanks,
> -d
> _______________________________________________
> mpi3-ft mailing list
> mpi3-ft at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft

Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
