[Mpi3-ft] MPI_Recv + MPI_Comm_failure_ack

Wesley Bland wbland at icl.utk.edu
Fri Mar 15 12:37:33 CDT 2013


I think you are correct in your evaluation, though I also think that wasn't
our intent. The intent (unless I'm forgetting a discussion) was to allow
MPI_ERR_PENDING to be returned by MPI_RECV and to let MPI_COMM_FAILURE_ACK
cover both cases. Can anyone else confirm that this was the goal?

If that's the case, it's something we'll need to fix in the text.
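
To illustrate what that would mean in practice, here is a minimal sketch,
assuming the intended semantics described above (a blocking MPI_RECV on
MPI_ANY_SOURCE raises MPI_ERR_PENDING for an unacknowledged process failure
and proceeds normally once the failure has been acknowledged). Function and
variable names, the buffer, and the termination test are illustrative only,
not part of the proposal text:

#include <mpi.h>

/* Sketch of the intended behavior, not of the current proposal text. */
void recv_with_ack(MPI_Comm comm)
{
    int size, failures = 0, buf, err;
    MPI_Status status;
    MPI_Group acked;

    MPI_Comm_set_errhandler(comm, MPI_ERRORS_RETURN);
    MPI_Comm_size(comm, &size);
    while (failures < size) {
        err = MPI_Recv(&buf, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                       comm, &status);
        if (err == MPI_ERR_PENDING) {
            /* Acknowledge every failure noticed so far; subsequent
               receives are no longer interrupted by them. */
            MPI_Comm_failure_ack(comm);
            MPI_Comm_failure_get_acked(comm, &acked);
            MPI_Group_size(acked, &failures);
            MPI_Group_free(&acked);
        } else if (err == MPI_SUCCESS) {
            /* process received data */
        }
    }
}
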

Thanks,
Wesley


On Fri, Mar 15, 2013 at 12:32 PM, David Solt <dsolt at us.ibm.com> wrote:

> Based on the proposal:
>
> MPI_Comm_failure_ack(blah, blah)
>
> This local operation gives the users a way to acknowledge all locally
> noticed failures on
> comm. After the call, unmatched MPI_ANY_SOURCE receptions that would have
> raised an
> error code MPI_ERR_PENDING due to process failure (see Section 17.2.2)
> proceed without
> further reporting of errors due to those acknowledged failures.
>
> I think this clearly indicates that MPI_Recv is unaffected by calls to
> MPI_Comm_failure_ack.  Therefore, there is no way to call
> MPI_Recv(MPI_ANY_SOURCE) and have it ignore failures that were already
> acknowledged with MPI_Comm_failure_ack.
>
> I believe the following code will NOT work (i.e. after the first failure,
> the MPI_Recv will continuously fail):
>
>
> MPI_Comm_size(intercomm, &size);
> while (failures < size) {
>         err = MPI_Recv(blah, blah, MPI_ANY_SOURCE, intercomm, &status);
>         if (err == MPI_ERR_PROC_FAILED) {
>                 MPI_Comm_failure_ack(intercomm);
>                 MPI_Comm_failure_get_acked(intercomm, &group);
>                 MPI_Group_size(group, &failures);
>         } else {
>                 /* process received data */
>         }
> }
>
> and has to be written as:
>
> MPI_Request request = MPI_REQUEST_NULL;  /* so the first iteration posts a receive */
>
> MPI_Comm_size(intercomm, &size);
> while (failures < size) {
>
>         /* repost only if the previous receive completed; after
>            MPI_ERR_PENDING the request is still active and is waited on again */
>         if (request == MPI_REQUEST_NULL) {
>                 err = MPI_Irecv(blah, blah, MPI_ANY_SOURCE, intercomm,
>                                 &request);
>         }
>         err = MPI_Wait(&request, &status);
>
>         if (err == MPI_ERR_PENDING) {
>                 MPI_Comm_failure_ack(intercomm);
>                 MPI_Comm_failure_get_acked(intercomm, &group);
>                 MPI_Group_size(group, &failures);
>         } else {
>                 /* process received data */
>         }
> }
>
> Am I correct in my thinking?
> If so, was there a reason why MPI_Recv could not also "obey"
> MPI_Comm_failure_ack calls?
>
> Thanks,
> Dave