[Mpi3-ft] MPI_Comm_validate parameters
jjhursey at open-mpi.org
Thu Mar 3 11:20:42 CST 2011
On Mar 2, 2011, at 12:11 PM, Darius Buntinas wrote:
> On Mar 2, 2011, at 8:51 AM, Joshua Hursey wrote:
>> So the MPI implementation must keep another list of failed processes known to the MPI implementation (call it "H_i" for Hidden at Process i), but not yet made available in "L_i". So the MPI implementation checks "H_i" to determine if the MPI_Send() should fail. "H_i" represents the set of additional failures not in "G" or "L_i" at some time T for Process i. We can use list projections (similar to how "L_i" can be a physically smaller list than "G") for representing "H_i" to reduce the memory impact, but this does mean that there is slightly more work on the MPI internals side of things.
> The way I would implement it is to represent H_i internally by setting the state of VCs in H_i to FAILED. Then have lists to represent G and L_i. As an optimization, so that we wouldn't have to scan the list of VCs when we do a comm_validate, I would probably also have a list representing L_i-H_i.
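[Illustration of the bookkeeping described above. This is a minimal sketch, not code from any MPI implementation. Every name in it (ft_view_t, ft_detect_failure, ft_send_allowed, ft_validate_local, NPROCS) is hypothetical. It assumes that H_i is encoded purely as a FAILED connection state, that L_i is a per-rank membership flag, and that the extra "L_i-H_i" list is a pending list of failures detected but not yet exposed, which is only one reading of that remark. The global list G is not modeled.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPROCS 8 /* hypothetical communicator size */

typedef enum { VC_ACTIVE, VC_FAILED } vc_state_t;

/* Per-process view of failures (all names are illustrative). */
typedef struct {
    vc_state_t vc[NPROCS];       /* connection state; FAILED covers H_i and L_i */
    bool       in_local[NPROCS]; /* membership in the user-visible list L_i */
    int        pending[NPROCS];  /* detected but not yet exposed in L_i */
    size_t     npending;
    size_t     nlocal;           /* number of entries in L_i */
} ft_view_t;

void ft_init(ft_view_t *v) {
    for (int r = 0; r < NPROCS; r++) {
        v->vc[r] = VC_ACTIVE;
        v->in_local[r] = false;
    }
    v->npending = 0;
    v->nlocal = 0;
}

/* The library notices a failure: it becomes "hidden" until the next validate. */
void ft_detect_failure(ft_view_t *v, int rank) {
    assert(rank >= 0 && rank < NPROCS);
    if (v->vc[rank] == VC_FAILED)
        return; /* already known */
    v->vc[rank] = VC_FAILED;
    v->pending[v->npending++] = rank;
}

/* A send consults the connection state, so hidden failures already make it fail. */
bool ft_send_allowed(const ft_view_t *v, int rank) {
    assert(rank >= 0 && rank < NPROCS);
    return v->vc[rank] == VC_ACTIVE;
}

/* Local validate: fold pending failures into L_i without scanning every VC. */
size_t ft_validate_local(ft_view_t *v) {
    for (size_t i = 0; i < v->npending; i++) {
        v->in_local[v->pending[i]] = true;
        v->nlocal++;
    }
    v->npending = 0;
    return v->nlocal;
}

size_t ft_local_count(const ft_view_t *v) {
    return v->nlocal;
}
```

[In this sketch a send to a newly failed rank is rejected immediately, while the user-visible count stays unchanged until ft_validate_local() runs.]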
>> So the programming practice that we are advocating is that, before any get_state() operation, the user calls validate_local() - or, more precisely, synchronizes their local/global view of the state of the processes on the communicator. Right?
> Right. I agree it seems awkward, but I think something is needed to avoid the get_num;malloc;get_state race with threads. Are there other solutions?
The only other solution would be an mprobe-like approach, but that has us introducing yet another object that the user has to manage. So I like that a bit less.
So I think this is fine. Since it is a bit awkward, we should have a brief 'rationale' talking about why we went with this semantic.
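
[Illustration of the get_num;malloc;get_state pattern discussed in this thread. This is a minimal sketch under stated assumptions, not the proposed MPI interface. The names comm_ft_t, notice_failure, validate_local, get_num_state and get_state are hypothetical stand-ins. The assumption is that the user-visible list changes only inside validate_local(), so a failure noticed between the count query and the state query cannot change the size the buffer was allocated for.]

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PROCS 16 /* hypothetical upper bound on communicator size */

/* Illustrative per-communicator failure record. */
typedef struct {
    int known[MAX_PROCS]; /* failures known internally; may grow at any time */
    int nknown;
    int snap[MAX_PROCS];  /* user-visible list; changes only in validate_local */
    int nsnap;
} comm_ft_t;

/* The library learns of a failure asynchronously. */
void notice_failure(comm_ft_t *c, int rank) {
    assert(c->nknown < MAX_PROCS);
    c->known[c->nknown++] = rank;
}

/* Publish everything known so far to the user-visible list. */
int validate_local(comm_ft_t *c) {
    memcpy(c->snap, c->known, (size_t)c->nknown * sizeof(int));
    c->nsnap = c->nknown;
    return c->nsnap;
}

/* Number of entries in the user-visible list. */
int get_num_state(const comm_ft_t *c) {
    return c->nsnap;
}

/* Copy the user-visible list into out; returns the count, or -1 if cap is too small. */
int get_state(const comm_ft_t *c, int *out, int cap) {
    if (cap < c->nsnap)
        return -1;
    memcpy(out, c->snap, (size_t)c->nsnap * sizeof(int));
    return c->nsnap;
}

/* The advocated sequence: validate, get the count, allocate, get the state.
 * Returns a malloc'd array that the caller frees, or NULL on failure. */
int *query_failed(comm_ft_t *c, int *count) {
    validate_local(c);
    int n = get_num_state(c);
    int *ranks = malloc((size_t)(n > 0 ? n : 1) * sizeof *ranks);
    if (ranks == NULL) {
        *count = -1;
        return NULL;
    }
    *count = get_state(c, ranks, n);
    return ranks;
}
```

[If the visible list could instead grow asynchronously, get_state() in this sketch could return -1 because the buffer sized from get_num_state() would already be too small. The sketch is single-threaded. An application with several threads would still have to coordinate its own calls to validate_local().]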
Postdoctoral Research Associate
Oak Ridge National Laboratory