[Mpi3-ft] MPI_Comm_validate parameters

Wed Mar 2 11:11:00 CST 2011

On Mar 2, 2011, at 8:51 AM, Joshua Hursey wrote:

> So the MPI implementation must keep another list of failed processes known to the MPI implementation (call it "H_i" for Hidden at Process i), but not yet made available in "L_i". So the MPI implementation checks "H_i" to determine if the MPI_Send() should fail. "H_i" represents the set of additional failures not in "G" or "L_i" at some time T for Process i. We can use list projections (similar to how "L_i" can be a physically smaller list than "G") for representing "H_i" to reduce the memory impact, but this does mean that there is slightly more work on the MPI internals side of things.

The way I would implement it is to represent H_i internally by setting the state of the VCs in H_i to FAILED, and then keep lists to represent G and L_i.  As an optimization, so that we wouldn't have to scan the list of VCs when we do a comm_validate, I would probably also keep a list representing L_i-H_i.
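
To make the bookkeeping concrete, here is a rough sketch in C. It is not code from any MPI implementation: every name in it (ft_comm_t, ft_note_failure, ft_send_should_fail, ft_validate_local, FT_MAX_PROCS) is made up for illustration. It also assumes one particular reading of the extra list, namely the failures that have been detected but not yet moved into L_i. G is left out, since updating it needs agreement across processes.

```c
#include <assert.h>
#include <stdbool.h>

#define FT_MAX_PROCS 64              /* hypothetical fixed upper bound on ranks */

typedef enum { VC_ACTIVE, VC_FAILED } vc_state_t;

/* Hypothetical per-communicator bookkeeping at process i. */
typedef struct {
    int        size;                    /* number of ranks in the communicator */
    vc_state_t vc_state[FT_MAX_PROCS];  /* FAILED covers every known failure, incl. H_i */
    int        pending[FT_MAX_PROCS];   /* assumed extra list: detected, not yet in L_i */
    int        num_pending;
    int        local[FT_MAX_PROCS];     /* L_i: failures the application has been shown */
    int        num_local;
} ft_comm_t;

void ft_comm_init(ft_comm_t *c, int size)
{
    assert(size > 0 && size <= FT_MAX_PROCS);
    c->size = size;
    c->num_pending = 0;
    c->num_local = 0;
    for (int r = 0; r < size; r++)
        c->vc_state[r] = VC_ACTIVE;
}

/* Called when the implementation detects that `rank` has failed. */
void ft_note_failure(ft_comm_t *c, int rank)
{
    assert(rank >= 0 && rank < c->size);
    if (c->vc_state[rank] == VC_FAILED)
        return;                              /* already recorded */
    c->vc_state[rank] = VC_FAILED;           /* this alone represents H_i */
    c->pending[c->num_pending++] = rank;     /* saves a VC scan at validate time */
}

/* A send checks only the VC state; no list lookup is needed. */
bool ft_send_should_fail(const ft_comm_t *c, int rank)
{
    assert(rank >= 0 && rank < c->size);
    return c->vc_state[rank] == VC_FAILED;
}

/* Local validate: move pending failures into L_i and return |L_i|. */
int ft_validate_local(ft_comm_t *c)
{
    for (int i = 0; i < c->num_pending; i++)
        c->local[c->num_local++] = c->pending[i];
    c->num_pending = 0;
    return c->num_local;
}
```

With this layout the cost of a validate is proportional to the number of new failures rather than to the number of VCs.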

> So the programming practice that we are advocating is that before any get_state() operation that the user call validate_local() - or more precisely synchronize their local/global view of the state of the processes on the communicator. Right?

Right.  I agree it seems awkward, but I think something is needed to avoid the race in the get_num; malloc; get_state sequence when threads are involved.  Are there other solutions?
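
For illustration, here is a single-threaded sketch of why the explicit validate step helps. All names in it (snap_view_t, snap_detect, snap_validate_local, snap_get_num, snap_get_state, snap_query) are hypothetical and are not the proposed MPI interface. The assumption is that the visible list changes only inside the validate call, so a count obtained after validate still matches the list when get_state copies it, even if a new failure is detected in between. It further assumes that no other thread calls validate during that window; a real implementation would also need locking.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SNAP_MAX 64                  /* hypothetical upper bound on failures */

typedef struct {
    int local[SNAP_MAX];  int num_local;   /* visible list, stable between validates */
    int hidden[SNAP_MAX]; int num_hidden;  /* detected but not yet visible */
} snap_view_t;

/* Stand-in for the progress engine or another thread noticing a failure. */
void snap_detect(snap_view_t *v, int rank)
{
    assert(rank >= 0);
    if (v->num_hidden < SNAP_MAX)
        v->hidden[v->num_hidden++] = rank;
}

/* The only place where the visible list is allowed to grow. */
int snap_validate_local(snap_view_t *v)
{
    for (int i = 0; i < v->num_hidden && v->num_local < SNAP_MAX; i++)
        v->local[v->num_local++] = v->hidden[i];
    v->num_hidden = 0;
    return v->num_local;
}

int snap_get_num(const snap_view_t *v)
{
    return v->num_local;
}

/* Copy at most `cap` entries of the visible list; return the number copied. */
int snap_get_state(const snap_view_t *v, int *out, int cap)
{
    int n = v->num_local < cap ? v->num_local : cap;
    if (n > 0)
        memcpy(out, v->local, (size_t)n * sizeof *out);
    return n;
}

/* validate; get_num; malloc; get_state as one helper. Caller frees the result. */
int *snap_query(snap_view_t *v, int *count)
{
    int n = snap_validate_local(v);
    int *buf = malloc((n > 0 ? (size_t)n : 1) * sizeof *buf);
    if (buf == NULL) {
        *count = 0;
        return NULL;
    }
    *count = snap_get_state(v, buf, n);
    return buf;
}
```

If get_state instead reflected failures as soon as they were detected, the buffer sized from get_num could be too small by the time it is filled, which is the race in question.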

-d