[Mpi3-ft] MPI_ANY_SOURCE
Josh Hursey
jjhursey at open-mpi.org
Thu Oct 13 09:36:13 CDT 2011
Yeah, you're right. We only need to lock around the wait in the
nonblocking case. One thing that slipped my mind is that MPI_Irecv
returns regardless of the state of other processes, since it is
nonblocking. So even though it is ANY_SOURCE, it will still return
MPI_SUCCESS once a request is generated. The request might be marked
as failed, but the call to MPI_Irecv will not return the error; the
subsequent MPI_Wait will.
So it is only really necessary to lock around the Wait.
To tweak your example a bit:
--------------------------
/* Assume the user will cancel the request outside of this function,
 * so we do not need to cancel it for them inside the function.
 * comm, failed_group, my_cnt and recognize_cnt are assumed to be
 * declared elsewhere.
 */
int My_AS_MPI_Wait(MPI_Request *req, MPI_Status *status)
{
    int err;

    while (1) {
        reader_lock();
        if (my_cnt != recognize_cnt) {
            /* New failures were detected */
            /* check failed_group and decide if ok to continue */
            if (ok_to_continue(req, failed_group) == FALSE) {
                reader_unlock();
                /* caller responsible for canceling the request */
                return MPI_ERR_PROC_FAIL_STOP;
            }
            /* remember that this thread has seen the latest failures */
            my_cnt = recognize_cnt;
        }

        err = MPI_Wait(req, status);
        if (err == MPI_WARN_PROC_FAIL_STOP) {
            /* Failure case */
            reader_unlock();
            writer_lock();
            if (my_cnt != recognize_cnt) {
                /* another thread has already re-enabled wildcards */
                writer_unlock();
                continue;
            }
            MPI_Comm_reenable_any_source(comm, &failed_group);
            ++recognize_cnt;
            writer_unlock();
            continue;
        } else if (MPI_SUCCESS != err) {
            reader_unlock();
            /* caller responsible for canceling the request */
            return MPI_ERR_PROC_FAIL_STOP;
        }

        /* Successful completion of the MPI_Wait */
        reader_unlock();
        return MPI_SUCCESS;
    }
}
--------------------------
On Tue, Oct 11, 2011 at 4:16 PM, Darius Buntinas <buntinas at mcs.anl.gov> wrote:
>
> I think you're right Josh, the blocking version wouldn't need to be changed.
>
> For the nonblocking version, wouldn't we only need to lock around the Wait, not between the Recv and Wait? If we're worried about hanging in a blocking Wait, I think we just need to check for all-clients-failed before calling Wait. If anysources are reenabled by another thread before this thread calls Wait, that's OK, so long as the thread checks first.
>
> Here's a function a user could implement to use whenever waiting on an anysource:
>
> int My_AS_MPI_Wait(MPI_Request *req, MPI_Status *status)
> {
> while(1) {
> reader_lock();
> if (my_cnt != recognize_cnt) {
> /* New failures were detected */
> /* check failed_group and decide if ok to continue */
> if (ok_to_continue(req, failed_group) == FALSE) {
> reader_unlock();
> return MPI_ERR_PROC_FAIL_STOP;
> }
> my_cnt = recognize_cnt;
> }
> err = MPI_Wait(req, status);
> if (err == MPI_WARN_PROC_FAIL_STOP) {
> /* Failure case */
> reader_unlock();
> writer_lock();
> if (my_cnt != recognize_cnt) {
> /* another thread has already re-enabled wildcards */
> writer_unlock();
> continue;
> }
> MPI_Comm_reenable_any_source(comm, &failed_group);
> ++recognize_cnt;
> writer_unlock();
> continue;
> } else {
> reader_unlock();
> return MPI_ERR_PROC_FAIL_STOP;
> }
> reader_unlock();
> }
> }
>
> -d
--
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey