[Mpi3-ft] radical idea?
Darius Buntinas
buntinas at mcs.anl.gov
Tue Jul 19 15:07:18 CDT 2011
Here are some ideas on how we can change the interface based on the feedback we got. I haven't thought too deeply on all the implications of these, but it's a starting point for discussion.
-d
Let's not call (local) VALIDATE "VALIDATE", and let's split out UP/DOWN from RANK_NULLification. Then, to query for UP/DOWN, get a handle to an explicit "state" object (as opposed to an implicit "snapshot") and query that, e.g.:
MPI_COMM_GET_STATE(comm, state_handle)
IN: MPI_COMM comm
OUT: MPI_PROC_STATE state_handle
and ditto for GROUP, FILE, WIN as necessary
MPI_GET_PROC_STATE_SIZE(state_handle, mask, size)
IN: MPI_PROC_STATE state_handle
IN: int mask
OUT: int size
MPI_GET_PROC_STATE_LIST(state_handle, mask, list)
IN: MPI_PROC_STATE state_handle
IN: int mask
OUT: int list[]
MPI_GET_PROC_STATE_NEW(state_handle1, state_handle2, state_handle_new)
IN: MPI_PROC_STATE state_handle1
IN: MPI_PROC_STATE state_handle2
OUT: MPI_PROC_STATE state_handle_new
This gives the newly failed processes, i.e. those that are failed in state_handle2 but were not failed in state_handle1.
This addresses issues people had with different threads calling VALIDATE and resetting the "new" flag for other threads.
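To make the query semantics concrete, here is a minimal sketch in C that models a state object in plain memory rather than inside an MPI library. Everything in it is hypothetical: proc_state_t stands in for MPI_PROC_STATE, ps_size/ps_list/ps_new stand in for the three proposed calls, and the PS_UP/PS_DOWN bit values and the reading of mask as a bitwise selector are assumptions, since the proposal does not define them.

```c
#include <assert.h>
#include <string.h>

#define PS_MAX_PROCS 64

/* Hypothetical mask bits; the proposal does not fix their values. */
enum { PS_UP = 1, PS_DOWN = 2 };

/* Stand-in for MPI_PROC_STATE: a fixed per-rank record of UP/DOWN. */
typedef struct {
    int nprocs;
    unsigned char flags[PS_MAX_PROCS]; /* PS_UP or PS_DOWN for each rank */
} proc_state_t;

/* Models MPI_GET_PROC_STATE_SIZE: count ranks whose state matches mask. */
int ps_size(const proc_state_t *s, int mask)
{
    int n = 0;
    for (int r = 0; r < s->nprocs; r++)
        if (s->flags[r] & mask)
            n++;
    return n;
}

/* Models MPI_GET_PROC_STATE_LIST: write matching ranks into list[], which
   must hold at least ps_size(s, mask) entries. Returns the count written. */
int ps_list(const proc_state_t *s, int mask, int list[])
{
    int n = 0;
    for (int r = 0; r < s->nprocs; r++)
        if (s->flags[r] & mask)
            list[n++] = r;
    return n;
}

/* Models MPI_GET_PROC_STATE_NEW: ranks that are DOWN in s2 but were not
   DOWN in s1 are marked DOWN in the result; all other ranks are marked UP. */
proc_state_t ps_new(const proc_state_t *s1, const proc_state_t *s2)
{
    proc_state_t out;
    assert(s1->nprocs == s2->nprocs);
    memset(&out, 0, sizeof out);
    out.nprocs = s2->nprocs;
    for (int r = 0; r < out.nprocs; r++) {
        int newly = (s2->flags[r] & PS_DOWN) && !(s1->flags[r] & PS_DOWN);
        out.flags[r] = newly ? PS_DOWN : PS_UP;
    }
    return out;
}
```

In this model each thread keeps its own pair of handles and diffs them itself, so no query mutates shared "new" state that another thread depends on.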
We can then have an MPI_COMM_NULLIFY() function (or whatever we decide to call it) that would effectively set the rank to MPI_RANK_NULL:
MPI_COMM_NULLIFY(comm, rank)
IN: MPI_COMM comm
IN: int rank
MPI_COMM_NULLIFY_STATE(comm, mask, state_handle)
This sets all ranks selected by mask in state_handle to PROC_NULL.
MPI_COMM_NULLIFY_GROUP(comm, group)
Set all procs in group to PROC_NULL in comm. Same as logically doing:
    foreach p in group
        MPI_COMM_NULLIFY(comm, rank-of-p-in-comm)
The operations would be idempotent and could be called on either live or failed processes. Note that the two properties are independent: a process can be in any combination of (UP or DOWN) x (NORMAL or NULL).
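The independence of the two properties can be sketched in C with a toy, in-memory communicator. This is only an illustration under stated assumptions: model_comm_t, mc_nullify and mc_nullify_group are hypothetical names, not part of any MPI implementation, and the group is represented simply as an array of ranks already translated into the communicator's numbering.

```c
#include <assert.h>
#include <stdbool.h>

#define MC_MAX_PROCS 64

/* Hypothetical model of a communicator: two independent flags per rank. */
typedef struct {
    int nprocs;
    bool down[MC_MAX_PROCS];      /* UP (false) or DOWN (true) */
    bool nullified[MC_MAX_PROCS]; /* NORMAL (false) or NULL (true) */
} model_comm_t;

/* Models MPI_COMM_NULLIFY: mark one rank as NULL. Calling it again on the
   same rank changes nothing, and it never touches the UP/DOWN flag. */
void mc_nullify(model_comm_t *c, int rank)
{
    assert(rank >= 0 && rank < c->nprocs);
    c->nullified[rank] = true;
}

/* Models MPI_COMM_NULLIFY_GROUP: nullify every rank in ranks[0..n-1]. */
void mc_nullify_group(model_comm_t *c, const int ranks[], int n)
{
    for (int i = 0; i < n; i++)
        mc_nullify(c, ranks[i]);
}
```

Because nullification only ever sets a flag, repeating a call or listing a rank twice in the group leaves the communicator in the same state, which is the idempotence the proposal asks for.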
VALIDATE_ALL can be renamed to VALIDATE. It returns a state_handle that can be queried for failed processes. Then we can describe it as having the effect of deciding on a common set of failed processes across the comm, setting state_handle to that set, and calling MPI_COMM_NULLIFY() on each failed process in that set.
MPI_COMM_VALIDATE(comm, state_handle)
IN: MPI_COMM comm
OUT: MPI_PROC_STATE state_handle
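The described effect of VALIDATE (agree on a common failed set, return it, nullify each member) can be sketched in C as a single-address-space simulation. The names mv_comm_t, mv_state_t and mv_validate are hypothetical, and modeling the agreement step as the union of every participant's local view is purely an assumption made for illustration; the proposal does not say how the common set is decided.

```c
#include <assert.h>
#include <stdbool.h>

#define MV_MAX_PROCS 64

/* Hypothetical communicator model: only the NORMAL/NULL flag is needed. */
typedef struct {
    int nprocs;
    bool nullified[MV_MAX_PROCS];
} mv_comm_t;

/* Hypothetical stand-in for MPI_PROC_STATE: which ranks are DOWN. */
typedef struct {
    int nprocs;
    bool down[MV_MAX_PROCS];
} mv_state_t;

/* Models MPI_COMM_VALIDATE. views[i] is what participant i currently
   believes has failed. Agreement is modeled as the union of all views
   (an assumption). The agreed set is returned as the state object, and
   every rank in it is nullified on the communicator. */
mv_state_t mv_validate(mv_comm_t *c, const mv_state_t views[], int nviews)
{
    mv_state_t agreed = { .nprocs = c->nprocs };
    for (int i = 0; i < nviews; i++) {
        assert(views[i].nprocs == c->nprocs);
        for (int r = 0; r < c->nprocs; r++)
            if (views[i].down[r])
                agreed.down[r] = true;
    }
    for (int r = 0; r < c->nprocs; r++)
        if (agreed.down[r])
            c->nullified[r] = true;
    return agreed;
}
```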
We may also want a function to "link" one comm's PROC_NULLified state with another's, so that if comm_A and comm_B are linked, calling MPI_COMM_NULLIFY on a process in comm_A also NULLifies that process in comm_B. We can have a restriction that comm_B is a subset of comm_A or vice versa.
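One possible shape for linking, sketched in C on a toy model: the link stores a rank translation from comm_A to comm_B, and nullifying a rank in comm_A propagates through that map. All names (lk_comm_t, lk_link, lk_nullify) are hypothetical, the explicit translation table is an assumption, and only the A-to-B direction stated above is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LK_MAX_PROCS 64

/* Hypothetical communicator model with an optional link to another comm. */
typedef struct lk_comm {
    int nprocs;
    bool nullified[LK_MAX_PROCS];
    struct lk_comm *linked;           /* NULL if this comm is not linked */
    int rank_in_linked[LK_MAX_PROCS]; /* rank in linked comm, or -1 */
} lk_comm_t;

/* Hypothetical link call: to_b[r] is the rank in b of the process that
   has rank r in a, or -1 if that process is not a member of b. */
void lk_link(lk_comm_t *a, lk_comm_t *b, const int to_b[])
{
    a->linked = b;
    for (int r = 0; r < a->nprocs; r++)
        a->rank_in_linked[r] = to_b[r];
}

/* Nullify a rank in c and, if c is linked, the same process in the
   linked comm when it is a member there. */
void lk_nullify(lk_comm_t *c, int rank)
{
    assert(rank >= 0 && rank < c->nprocs);
    c->nullified[rank] = true;
    if (c->linked != NULL && c->rank_in_linked[rank] >= 0)
        c->linked->nullified[c->rank_in_linked[rank]] = true;
}
```

The subset restriction keeps the translation simple: every process in the smaller comm has exactly one rank in the larger one, so the map is well defined.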