[Mpi3-ft] MPI_ANY_SOURCE

Thu Oct 13 09:36:13 CDT 2011

Yeah, you're right. We only need to lock around the wait in the
nonblocking case. One thing that slipped my mind was that MPI_Irecv
will return regardless of the state of other processes, since it is
nonblocking. So even though it is ANY_SOURCE, it will still return
MPI_SUCCESS once a request is generated. The request might be marked
as failed, but the call to MPI_Irecv will not return the error; the
subsequent MPI_Wait will.

So it is only really necessary to lock around the Wait.

To tweak your example a bit:
--------------------------
/* Assume the user will cancel the request outside of this function,
 * so we do not need to cancel it for them inside the function.
 */
int My_AS_MPI_Wait(MPI_Request *req, MPI_Status *status)
{
   int err;

   while(1) {
       reader_lock();
       if (my_cnt != recognize_cnt) {
           /* New failures were detected */
           /* check failed_group and decide if ok to continue */
           if (ok_to_continue(req, failed_group) == FALSE) {
               reader_unlock();
               /* caller responsible for canceling the request */
               return MPI_ERR_PROC_FAIL_STOP;
           }
           my_cnt = recognize_cnt;
       }
       err = MPI_Wait(req, status);
       if (err == MPI_WARN_PROC_FAIL_STOP) {
           /* Failure case */
           reader_unlock();
           writer_lock();
           if (my_cnt != recognize_cnt) {
               /* another thread has already re-enabled wildcards */
               writer_unlock();
               continue;
           }
           MPI_Comm_reenable_any_source(comm, &failed_group);
           ++recognize_cnt;
           writer_unlock();
           continue;
       } else if (MPI_SUCCESS != err) {
           reader_unlock();
           /* caller responsible for canceling the request */
           return MPI_ERR_PROC_FAIL_STOP;
       }

       /* Successful completion of the MPI_Wait */
       reader_unlock();
       return MPI_SUCCESS;
   }
}
--------------------------

On Tue, Oct 11, 2011 at 4:16 PM, Darius Buntinas <buntinas at mcs.anl.gov> wrote:
>
> I think you're right Josh, the blocking version wouldn't need to be changed.
>
> For the nonblocking version, wouldn't we only need to lock around the Wait, not between the Recv and Wait?  If we're worried about hanging in a blocking Wait, I think we just need to check for all-clients-failed before calling Wait.  If anysources are reenabled by another thread before this thread calls Wait, that's OK, so long as the thread checks first.
>
> Here's a function a user could implement to use whenever waiting on an anysource:
>
> int My_AS_MPI_Wait(MPI_Request *req, MPI_Status *status)
> {
>    while(1) {
>        reader_lock();
>        if (my_cnt != recognize_cnt) {
>            /* New failures were detected */
>            /* check failed_group and decide if ok to continue */
>            if (ok_to_continue(req, failed_group) == FALSE) {
>                reader_unlock();
>                return MPI_ERR_PROC_FAIL_STOP;
>            }
>            my_cnt = recognize_cnt;
>        }
>        err = MPI_Wait(req, status);
>        if (err == MPI_WARN_PROC_FAIL_STOP) {
>            /* Failure case */
>            reader_unlock();
>            writer_lock();
>            if (my_cnt != recognize_cnt) {
>                /* another thread has already re-enabled wildcards */
>                writer_unlock();
>                continue;
>            }
>            MPI_Comm_reenable_any_source(comm, &failed_group);
>            ++recognize_cnt;
>            writer_unlock();
>            continue;
>        } else {
>            reader_unlock();
>            return MPI_ERR_PROC_FAIL_STOP;
>        }
>        reader_unlock();
>    }
> }
>
> -d

-- 
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey