[Mpi3-ft] MPI_Comm_validate parameters

Thu Mar 3 11:20:42 CST 2011

On Mar 2, 2011, at 12:11 PM, Darius Buntinas wrote:

> 
> On Mar 2, 2011, at 8:51 AM, Joshua Hursey wrote:
> 
>> So the MPI implementation must keep another list of failed processes known to the MPI implementation (call it "H_i" for Hidden at Process i), but not yet made available in "L_i". So the MPI implementation checks "H_i" to determine if the MPI_Send() should fail. "H_i" represents the set of additional failures not in "G" or "L_i" at some time T for Process i. We can use list projections (similar to how "L_i" can be a physically smaller list than "G") for representing "H_i" to reduce the memory impact, but this does mean that there is slightly more work on the MPI internals side of things.
> 
> The way I would implement it is to represent H_i internally by setting the state of VCs in H_i to FAILED.  Then have lists to represent G and L_i.  For optimization so that we wouldn't have to scan the list of VCs when we do a comm_validate, I would probably also have a list representing L_i-H_i.
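
A rough sketch of that bookkeeping, to make the proposal concrete. This is only my reading of it, not actual implementation code: the type and function names (ft_state_t, note_failure, send_would_fail, validate_local) and the fixed NPROCS are hypothetical, and I am assuming the extra list holds the failures the library has detected but not yet exposed in L_i.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPROCS 8 /* hypothetical communicator size */

typedef enum { VC_ACTIVE = 0, VC_FAILED } vc_state_t;

typedef struct {
    vc_state_t vc[NPROCS]; /* per-connection state: marks every known failure */
    int pending[NPROCS];   /* known to the library, not yet exposed to the user */
    size_t n_pending;
    int local[NPROCS];     /* L_i: failures the user has already been shown */
    size_t n_local;
} ft_state_t;

/* Called when the library detects that `rank` has failed. */
void note_failure(ft_state_t *s, int rank)
{
    assert(rank >= 0 && rank < NPROCS);
    if (s->vc[rank] == VC_FAILED)
        return; /* already recorded */
    s->vc[rank] = VC_FAILED;
    s->pending[s->n_pending++] = rank;
}

/* The send path looks only at the VC state, so a hidden failure still
 * causes the send to fail even though L_i does not list it yet. */
bool send_would_fail(const ft_state_t *s, int dest)
{
    return s->vc[dest] == VC_FAILED;
}

/* Move the pending failures into L_i. Only the pending list is walked,
 * so there is no scan over all VCs. Returns the new size of L_i. */
size_t validate_local(ft_state_t *s)
{
    for (size_t k = 0; k < s->n_pending; k++)
        s->local[s->n_local++] = s->pending[k];
    s->n_pending = 0;
    return s->n_local;
}
```

With a zero-initialized ft_state_t, calling note_failure(&s, 2) makes send_would_fail(&s, 2) true right away, while L_i stays empty until the next validate_local(&s).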
> 
>> So the programming practice we are advocating is that, before any get_state() operation, the user calls validate_local(), or more precisely synchronizes their local/global view of the state of the processes on the communicator. Right?
> 
> Right.  I agree it seems awkward, but I think something is needed to avoid the get_num;malloc;get_state race with threads.  Are there other solutions?

The only other solution would be an mprobe-like approach, but that has us introducing yet another object that the user has to manage. So I like that a bit less.

So I think this is fine. Since it is a bit awkward, we should have a brief 'rationale' explaining why we went with these semantics.
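
As a starting point for that rationale, here is a minimal sketch of why the snapshot matters. It is not the proposed MPI interface: comm_ft_t, detect_failure and MAX_PROCS are hypothetical, and validate_local/get_num/get_state are simplified stand-ins for the operations discussed in this thread. The model is single-threaded; a failure detected "concurrently" is simulated by calling detect_failure() between the steps.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PROCS 16 /* hypothetical upper bound on communicator size */

typedef struct {
    int known[MAX_PROCS]; /* every failure the library has detected so far */
    size_t n_known;
    int snap[MAX_PROCS];  /* L_i as frozen by the last validate_local() */
    size_t n_snap;
} comm_ft_t;

/* The library learns about a failure; the user-visible snapshot is untouched. */
void detect_failure(comm_ft_t *c, int rank)
{
    assert(c->n_known < MAX_PROCS);
    c->known[c->n_known++] = rank;
}

/* Synchronize the user-visible view with what the library knows. */
void validate_local(comm_ft_t *c)
{
    memcpy(c->snap, c->known, c->n_known * sizeof(int));
    c->n_snap = c->n_known;
}

/* Both queries read only the frozen snapshot, so they always agree. */
size_t get_num(const comm_ft_t *c)
{
    return c->n_snap;
}

/* Copies at most `cap` entries into `out`; returns the number written. */
size_t get_state(const comm_ft_t *c, int *out, size_t cap)
{
    size_t n = c->n_snap < cap ? c->n_snap : cap;
    memcpy(out, c->snap, n * sizeof(int));
    return n;
}
```

The sequence validate_local(); n = get_num(); buf = malloc(n * sizeof *buf); get_state(buf, n) then stays consistent even if detect_failure() runs after get_num(): the new failure is not visible until the next validate_local(), so the buffer is never too small. If get_num and get_state read the live list instead, the count could grow between the two calls.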

-- Josh

> 
> -d

------------------------------------
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey