[Mpi3-ft] MPI_Recv + MPI_Comm_failure_ack

David Solt dsolt at us.ibm.com
Fri Mar 15 16:19:45 CDT 2013


Oh good. 

I have another interesting issue:

Rank 0                                             Rank 1         Rank 2

MPI_Irecv(.., MPI_ANY_SOURCE, MPI_ANY_TAG, comm=C)
                                                   !!!!FAIL!!!!
MPI_Wait(req, status) --> returns MPI_ERR_PENDING
                                                                  MPI_Send(..., 0, 0, comm=C)
MPI_Recv(..., 2, MPI_ANY_TAG, comm=C);


Should the MPI_Recv return, or should it block because the pending Irecv 
could match the MPI_Send from rank 2 and therefore "queue" ahead of it?

I can see arguments for both cases.
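
For concreteness, here is a minimal C sketch of rank 0's side of the 
timeline above, using the proposal's MPI_ERR_PENDING semantics; the 
buffers, counts, and datatypes are illustrative placeholders, not part 
of the original example:

#include <mpi.h>

/* Rank 0's side of the exchange.  comm is the communicator C from the
 * diagram, containing ranks 0, 1, and 2. */
void rank0_side(MPI_Comm comm)
{
    int any_buf = 0, from2_buf = 0, err;
    MPI_Request req;
    MPI_Status status;

    /* Errors must be returned rather than fatal for this pattern. */
    MPI_Comm_set_errhandler(comm, MPI_ERRORS_RETURN);

    /* Wildcard receive: it could match a send from rank 1 or rank 2. */
    MPI_Irecv(&any_buf, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
              comm, &req);

    /* Rank 1 fails here.  Per the proposal, the wait raises
     * MPI_ERR_PENDING and leaves req active and still matchable. */
    err = MPI_Wait(&req, &status);

    /* The open question: does this named-source receive match rank 2's
     * MPI_Send directly and return, or does it block behind the still
     * pending wildcard Irecv, which could also match that send? */
    err = MPI_Recv(&from2_buf, 1, MPI_INT, 2, MPI_ANY_TAG, comm, &status);
}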

Thanks,
Dave

From:   Aurélien Bouteiller <bouteill at icl.utk.edu>
To:     "MPI 3.0 Fault Tolerance and Dynamic Process Control working 
Group" <mpi3-ft at lists.mpi-forum.org>, 
Date:   03/15/2013 01:07 PM
Subject:        Re: [Mpi3-ft] MPI_Recv + MPI_Comm_failure_ack
Sent by:        mpi3-ft-bounces at lists.mpi-forum.org



The intent was to return MPI_ERR_PROC_FAILED (nothing is pending), and 
the ack should also suppress this return of MPI_ERR_PROC_FAILED for 
blocking ANY_SOURCE receives. 
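
In code, the intended blocking-case behavior would look something like 
this sketch (illustrative placeholders again; MPI_ERR_PROC_FAILED and 
MPI_Comm_failure_ack are the proposal's names, not settled API):

    int buf, err;
    MPI_Status status;

    /* Blocking wildcard receive; a failure in comm interrupts it. */
    err = MPI_Recv(&buf, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                   comm, &status);
    if (err == MPI_ERR_PROC_FAILED) {
        /* No request is left behind in the blocking case, so the error
         * class is MPI_ERR_PROC_FAILED rather than MPI_ERR_PENDING. */
        MPI_Comm_failure_ack(comm);

        /* After the ack, retrying should proceed without re-reporting
         * the acknowledged failures. */
        err = MPI_Recv(&buf, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                       comm, &status);
    }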

Good catch, Dave! 

Aurelien

On 15 Mar 2013, at 14:01, Wesley Bland <wbland at icl.utk.edu> wrote:

> You're right. A blocking call shouldn't return MPI_ERR_PENDING when 
> there is no request to be pending. I did think we'd covered this some 
> other way. It's definitely the intent for both versions of receive to 
> be able to ignore acknowledged failures.
> 
> 
> On Fri, Mar 15, 2013 at 1:53 PM, David Solt <dsolt at us.ibm.com> wrote:
> I'm pretty sure the intent was that MPI_Recv should NOT return 
> MPI_ERR_PENDING, as there is no request on which the error can be 
> pending, but I don't know if much thought was given to allowing 
> MPI_Recv to ignore acknowledged ranks. 
> Dave 
> 
> 
> 
> From:        Wesley Bland <wbland at icl.utk.edu> 
> To:        "MPI 3.0 Fault Tolerance and Dynamic Process Control working 
> Group" <mpi3-ft at lists.mpi-forum.org>, 
> Date:        03/15/2013 12:45 PM 
> Subject:        Re: [Mpi3-ft] MPI_Recv + MPI_Comm_failure_ack 
> Sent by:        mpi3-ft-bounces at lists.mpi-forum.org 
> 
> 
> 
> I think you are correct in your evaluation, though I also think that 
> wasn't our intent. I think the intent (unless I'm forgetting a 
> discussion) was to allow MPI_ERR_PENDING to be returned by MPI_RECV and 
> let MPI_COMM_FAILURE_ACK cover both cases. Can anyone else confirm that 
> this was the goal? 
> 
> If that's the case, it's something we'll need to fix in the text. 
> 
> Thanks, 
> Wesley 
> 
> 
> On Fri, Mar 15, 2013 at 12:32 PM, David Solt <dsolt at us.ibm.com> wrote: 
> Based on the proposal: 
> 
> MPI_Comm_failure_ack(blah, blah) 
> 
> This local operation gives the users a way to acknowledge all locally 
> noticed failures on comm. After the call, unmatched MPI_ANY_SOURCE 
> receptions that would have raised an error code MPI_ERR_PENDING due to 
> process failure (see Section 17.2.2) proceed without further reporting 
> of errors due to those acknowledged failures. 
> 
> I think this clearly indicates that MPI_Recv is uninfluenced by calls 
> to MPI_Comm_failure_ack.  Therefore, there is no way to call 
> MPI_Recv(MPI_ANY_SOURCE) and ignore failures acknowledged via 
> MPI_Comm_failure_ack. 
> 
> I believe the following code will NOT work (i.e. after the first 
> failure, the MPI_Recv will continuously fail): 
> 
> 
> MPI_Comm_size(intercomm, &size); 
> while (failures < size) { 
>         err = MPI_Recv(buf, count, type, MPI_ANY_SOURCE, MPI_ANY_TAG, 
>                        intercomm, &status); 
>         if (err == MPI_ERR_PROC_FAILED) { 
>                 MPI_Comm_failure_ack(intercomm); 
>                 MPI_Comm_failure_get_acked(intercomm, &group); 
>                 MPI_Group_size(group, &failures); 
>         } else { 
>                 /* process received data */ 
>         } 
> } 
> 
> and has to be written as: 
> 
> MPI_Comm_size(intercomm, &size); 
> request = MPI_REQUEST_NULL;  /* so the first iteration posts the Irecv */ 
> while (failures < size) { 
> 
>         if (request == MPI_REQUEST_NULL) { 
>                 err = MPI_Irecv(buf, count, type, MPI_ANY_SOURCE, 
>                                 MPI_ANY_TAG, intercomm, &request); 
>         } 
>         err = MPI_Wait(&request, &status); 
> 
>         if (err == MPI_ERR_PENDING) { 
>                 MPI_Comm_failure_ack(intercomm); 
>                 MPI_Comm_failure_get_acked(intercomm, &group); 
>                 MPI_Group_size(group, &failures); 
>         } else { 
>                 /* process received data */ 
>         } 
> } 
> 
> Am I correct in my thinking? 
> If so, was there a reason why MPI_Recv could not also "obey" 
> MPI_Comm_failure_ack calls? 
> 
> Thanks, 
> Dave

--
* Dr. Aurélien Bouteiller
* Researcher at Innovative Computing Laboratory
* University of Tennessee
* 1122 Volunteer Boulevard, suite 309b
* Knoxville, TN 37996
* 865 974 9375

_______________________________________________
mpi3-ft mailing list
mpi3-ft at lists.mpi-forum.org
http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft

