[Mpi3-ft] MPI_Comm_validate parameters

Joshua Hursey jjhursey at open-mpi.org
Wed Mar 2 08:51:11 CST 2011


On Mar 1, 2011, at 12:45 PM, Darius Buntinas wrote:

> 
> On Mar 1, 2011, at 8:30 AM, Joshua Hursey wrote:
> 
>> One side note that we should make explicit is that "L_i" can be updated in two ways. First, when the application calls MPI_Comm_validate_local() to update "L_i" with all locally known failures. Second, when a failure is detected during any MPI operation (except the validate accessor functions), it also updates "L_i". So if we do an MPI_Send(rank=2), and rank 2 fails during the send, then we want to make sure that if the user asks for the state of rank 2 it is identified as failed and not active. The application implicitly notices the update to "L_i" from the return code of the MPI_Send operation.
>> 
>> Do folks see a problem with this additional semantic?
> 
> This could present a problem with get_num_state and get_state in a multithreaded environment. Of course, if one has only one thread per communicator, then it's OK, but is that realistic? The idea of L_i and G being constant between validate calls was to avoid races like this.
> 
> The user should understand that the way to get the state of a process is 
>    validate_local();get_state_rank()
> or 
>    validate_local();get_num_state();malloc();get_state()
> And if the process has multiple threads using the same communicator, it can synchronize access to validate_local as appropriate.
> 
> I think the _local functions can be considered a convenience for the user, since the user could keep track of L_i herself using the _global values and noticing failed sends/receives. So if we look at it that way, the fact that get_state doesn't report rank 2 (in your example above) as having failed immediately after the send might be OK.

It just seems a bit odd as a semantic for the interfaces. But I understand your point.

So the situation is (in shorthand):
------------------------
validate_local(comm);
get_state_rank(comm, 2, state); /* state = OK */

/*** 2 fails ***/

ret = MPI_Send(comm, 2); /* Error */
if( ERR_FAIL_STOP == ret ) {
  get_state_rank(comm, 2, state); /* state = OK */
  validate_local(comm);
  get_state_rank(comm, 2, state); /* state = FAILED */
}
------------------------

instead of:
------------------------
validate_local(comm);
get_state_rank(comm, 2, state); /* state = OK */

/*** 2 fails ***/

ret = MPI_Send(comm, 2); /* Error */
if( ERR_FAIL_STOP == ret ) {
  get_state_rank(comm, 2, state); /* state = FAILED */
}
------------------------

So the MPI implementation must keep another list of failed processes known to the implementation (call it "H_i", for Hidden at process i) but not yet made available in "L_i". The implementation then checks "H_i" to determine whether the MPI_Send() should fail. "H_i" represents the set of additional failures, not in "G" or "L_i", known at some time T to process i. We can use list projections (similar to how "L_i" can be a physically smaller list than "G") to represent "H_i" and reduce the memory impact, but this does mean slightly more work in the MPI internals.
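
To make that bookkeeping concrete, here is a rough sketch of the per-communicator state an implementation might keep. All of the names here are hypothetical (the proposal does not prescribe an internal representation), and a real implementation would use the projection/compression tricks mentioned above rather than full per-rank bitmaps:
------------------------
#include <stdbool.h>

typedef struct {
    bool *failed;           /* failed[r] == true => rank r known failed */
    int   size;             /* communicator size */
} failed_set_t;

typedef struct {
    failed_set_t G;         /* global set: last MPI_Comm_validate()        */
    failed_set_t L;         /* local set: last validate_local()            */
    failed_set_t H;         /* hidden set: detected, not yet folded into L */
} comm_fault_state_t;

/* Failure detector callback: record only in H_i, so that L_i stays
 * constant between validate calls (preserving the thread-safety
 * argument above). */
static void detector_report_failure(comm_fault_state_t *s, int rank)
{
    s->H.failed[rank] = true;
}

/* Communication operations (e.g. MPI_Send) consult L_i *and* H_i when
 * deciding to raise ERR_FAIL_STOP, even though the get_state accessors
 * read only L_i. */
static bool peer_known_failed(const comm_fault_state_t *s, int rank)
{
    return s->L.failed[rank] || s->H.failed[rank];
}

/* validate_local(): fold H_i into L_i and clear H_i. */
static void fold_hidden_into_local(comm_fault_state_t *s)
{
    for (int r = 0; r < s->H.size; r++) {
        if (s->H.failed[r]) {
            s->L.failed[r] = true;
            s->H.failed[r] = false;
        }
    }
}
------------------------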

So the programming practice we are advocating is that, before any get_state() operation, the user call validate_local(), or, more precisely, synchronize their local/global view of the state of the processes on the communicator. Right?
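
In the same shorthand as above, that practice would look something like this (the parameter lists are my guesses at the proposed interfaces, not settled signatures, and the lock is only needed when multiple threads share the communicator):
------------------------
pthread_mutex_lock(&comm_state_lock); /* app-level lock, multithreaded case */

validate_local(comm);                 /* L_i <- L_i U H_i */
get_num_state(comm, FAILED, &n);      /* count failed ranks in L_i */
ranks = malloc(n * sizeof(int));
get_state(comm, FAILED, n, ranks);    /* stable snapshot: L_i cannot
                                         change until the next validate */

pthread_mutex_unlock(&comm_state_lock);
------------------------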

-- Josh


> 
> -d
> 

------------------------------------
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey




