[Mpi3-ft] MPI_Comm_validate() protection

Darius Buntinas buntinas at mcs.anl.gov
Tue Dec 20 10:16:50 CST 2011


Hmm.  There was a reason we added that.

I think it had to do with separating the collectives of one "epoch" from those of another, so that no one does anything like this:

P0                     P1
-------------------    -------------------
MPI_Ibcast()
MPI_Comm_validate()    MPI_Comm_validate()
.                      MPI_Ibcast()
MPI_Wait()             MPI_Wait()

Maybe we should say that validate completes-with-error all outstanding collective operations.  But in that case, if the user spans a collective across a validate, the collective may complete successfully at one process (e.g., the root of a bcast) and complete-with-error at another, while validate() reports that no processes have failed.  That is confusing, because no process failures would normally imply that the bcast completed successfully everywhere.  I guess we can call this a user error, and just say don't do that.
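
To make the two cases concrete, here is a toy model in C. It is not MPI and it is not the proposal's semantics; it is only a sketch that assumes validate completes-with-error whatever is still pending and then opens a new epoch. All toy_* names are hypothetical, each process tracks a single collective, and there are no real process failures in the model.

```c
#include <assert.h>
#include <stdbool.h>

/* State of the one nonblocking collective a toy process tracks. */
enum toy_status { TOY_NONE, TOY_PENDING, TOY_OK, TOY_ERR };

struct toy_proc {
    int epoch;             /* incremented by every toy_validate() call */
    enum toy_status coll;  /* state of the tracked collective */
    int coll_epoch;        /* epoch in which that collective was started */
};

/* Start a collective and tag it with the caller's current epoch. */
void toy_ibcast(struct toy_proc *p)
{
    assert(p->coll == TOY_NONE);  /* the model tracks one collective only */
    p->coll = TOY_PENDING;
    p->coll_epoch = p->epoch;
}

/* Local progress, e.g. the root has already handed off its buffer. */
void toy_progress(struct toy_proc *p)
{
    if (p->coll == TOY_PENDING)
        p->coll = TOY_OK;
}

/* Assumed rule: validate completes-with-error anything still pending and
 * opens a new epoch. Returns the number of failed processes it knows of,
 * which is always 0 because this model has no failures. */
int toy_validate(struct toy_proc *p)
{
    if (p->coll == TOY_PENDING)
        p->coll = TOY_ERR;
    p->epoch++;
    return 0;
}

/* Two collectives can only match if they were started in the same epoch. */
bool toy_same_epoch(const struct toy_proc *a, const struct toy_proc *b)
{
    return a->coll != TOY_NONE && b->coll != TOY_NONE &&
           a->coll_epoch == b->coll_epoch;
}

/* The diagram: P0 starts its Ibcast before validate, P1 after.
 * Returns true if the two calls land in the same epoch. */
bool toy_diagram_matches(void)
{
    struct toy_proc p0 = {0, TOY_NONE, 0}, p1 = {0, TOY_NONE, 0};
    toy_ibcast(&p0);
    toy_validate(&p0);
    toy_validate(&p1);
    toy_ibcast(&p1);
    return toy_same_epoch(&p0, &p1);
}

/* Spanning case: both start the bcast before validate, but only the root
 * has completed locally when validate runs. Returns true if the two
 * processes see different outcomes; *failures_out gets the total number
 * of failures reported by the two validate calls. */
bool toy_spanning_outcomes_differ(int *failures_out)
{
    struct toy_proc root = {0, TOY_NONE, 0}, leaf = {0, TOY_NONE, 0};
    toy_ibcast(&root);
    toy_ibcast(&leaf);
    toy_progress(&root);
    *failures_out = toy_validate(&root) + toy_validate(&leaf);
    return root.coll == TOY_OK && leaf.coll == TOY_ERR;
}
```

In this model toy_diagram_matches() returns false, since the two Ibcast calls carry different epoch tags, and toy_spanning_outcomes_differ() returns true while reporting zero failures, which is the mismatch described above.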

-d

On Dec 20, 2011, at 9:03 AM, Josh Hursey wrote:

> In 17.7.1 the proposal states:
>  "All collective communication operations initiated before the call
> to MPI_COMM_VALIDATE must also complete before it is called, and no
> collective calls may be initiated until it has completed."
> 
> Consider the case where FailHandlers are used in the 'ALL'
> operating mode. In this mode, a user may want to call validate in all
> of the FailHandlers to synchronize them. But if the FailHandler was
> triggered out of a collective operation over a communicator that does
> -not- include the failed process, then the user cannot write a correct
> program (since they cannot cancel, and may not be able to complete, the
> collective operation).
> 
> My suggestion is that we remove this sentence. Do folks have a problem
> with this?
> 
> Thanks,
> Josh
> 
> -- 
> Joshua Hursey
> Postdoctoral Research Associate
> Oak Ridge National Laboratory
> http://users.nccs.gov/~jjhursey
> _______________________________________________
> mpi3-ft mailing list
> mpi3-ft at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
