[Mpi3-ft] MPI_Recv + MPI_Comm_failure_ack
Wesley Bland
wbland at icl.utk.edu
Fri Mar 15 13:01:25 CDT 2013
You're right. A blocking call shouldn't return MPI_ERR_PENDING when there
is no request to be pending. I did think we'd covered this some other way.
It's definitely the intent for both versions of receive to be able to
ignore acknowledged failures.
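
To make the intended semantics concrete, here is a hypothetical sketch of how a blocking receive loop would look once the text is fixed. This assumes a ULFM-enabled MPI implementing the proposal; MPI_Comm_failure_ack, MPI_Comm_failure_get_acked, and MPI_ERR_PROC_FAILED are proposal functions, not standard MPI, and the buffer/tag details are illustrative only.

```c
/* Sketch of the intended behavior: a blocking MPI_Recv(MPI_ANY_SOURCE)
 * that, like the nonblocking version, stops reporting failures the
 * application has already acknowledged. Requires a ULFM-enabled MPI. */
#include <mpi.h>

static void recv_until_all_failed(MPI_Comm comm)
{
    int size, failures = 0, buf, err;
    MPI_Status status;
    MPI_Group group;

    MPI_Comm_size(comm, &size);
    while (failures < size) {
        err = MPI_Recv(&buf, 1, MPI_INT, MPI_ANY_SOURCE, 0, comm, &status);
        if (err == MPI_ERR_PROC_FAILED) {
            /* Acknowledge the failures noticed so far; under the intended
             * semantics, later MPI_Recv calls would no longer raise errors
             * for these acknowledged ranks. */
            MPI_Comm_failure_ack(comm);
            MPI_Comm_failure_get_acked(comm, &group);
            MPI_Group_size(group, &failures);
            MPI_Group_free(&group);
        } else {
            /* process buf */
        }
    }
}
```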
On Fri, Mar 15, 2013 at 1:53 PM, David Solt <dsolt at us.ibm.com> wrote:
> I'm pretty sure the intent was that MPI_Recv should NOT return
> MPI_ERR_PENDING as there is no request on which the error can be pending,
> but I don't know if much thought was given to allowing MPI_Recv to ignore
> acknowledge ranks.
> Dave
>
>
>
> From: Wesley Bland <wbland at icl.utk.edu>
> To: "MPI 3.0 Fault Tolerance and Dynamic Process Control working
> Group" <mpi3-ft at lists.mpi-forum.org>,
> Date: 03/15/2013 12:45 PM
> Subject: Re: [Mpi3-ft] MPI_Recv + MPI_Comm_failure_ack
> Sent by: mpi3-ft-bounces at lists.mpi-forum.org
> ------------------------------
>
>
>
> I think you are correct in your evaluation, though I also think that
> wasn't our intent. I think the intent (unless I'm forgetting a discussion)
> was to allow MPI_ERR_PENDING to be returned by MPI_RECV and let
> MPI_COMM_FAILURE_ACK cover both cases. Can anyone else confirm that this
> was the goal?
>
> If that's the case, it's something we'll need to fix in the text.
>
> Thanks,
> Wesley
>
>
> On Fri, Mar 15, 2013 at 12:32 PM, David Solt <dsolt at us.ibm.com> wrote:
> Based on the proposal:
>
> MPI_Comm_failure_ack(blah, blah)
>
> This local operation gives the users a way to acknowledge all locally
> noticed failures on comm. After the call, unmatched MPI_ANY_SOURCE
> receptions that would have raised an error code MPI_ERR_PENDING due to
> process failure (see Section 17.2.2) proceed without further reporting
> of errors due to those acknowledged failures.
>
> I think this clearly indicates that MPI_Recv is unaffected by calls to
> MPI_Comm_failure_ack. Therefore, there is no way to call
> MPI_Recv(MPI_ANY_SOURCE) and ignore failures that were already
> acknowledged via MPI_Comm_failure_ack.
>
> I believe the following code will NOT work (i.e. after the first failure,
> the MPI_Recv will continuously fail):
>
>
> int size, failures = 0, err;
> MPI_Status status;
> MPI_Group group;
>
> MPI_Comm_size(intercomm, &size);
> while (failures < size) {
>     err = MPI_Recv(blah, blah, MPI_ANY_SOURCE, intercomm, &status);
>     if (err == MPI_ERR_PROC_FAILED) {
>         MPI_Comm_failure_ack(intercomm);
>         MPI_Comm_failure_get_acked(intercomm, &group);
>         MPI_Group_size(group, &failures);
>     } else {
>         /* process received data */
>     }
> }
>
> and has to be written as:
>
> MPI_Request request = MPI_REQUEST_NULL;
> int size, failures = 0, err;
> MPI_Status status;
> MPI_Group group;
>
> MPI_Comm_size(intercomm, &size);
> while (failures < size) {
>     if (request == MPI_REQUEST_NULL) {
>         err = MPI_Irecv(blah, blah, MPI_ANY_SOURCE, intercomm, &request);
>     }
>     err = MPI_Wait(&request, &status);
>     if (err == MPI_ERR_PENDING) {
>         MPI_Comm_failure_ack(intercomm);
>         MPI_Comm_failure_get_acked(intercomm, &group);
>         MPI_Group_size(group, &failures);
>     } else {
>         /* process received data */
>     }
> }
>
> Am I correct in my thinking?
> If so, was there a reason why MPI_Recv could not also "obey"
> MPI_Comm_failure_ack calls?
>
> Thanks,
> Dave
> _______________________________________________
> mpi3-ft mailing list
> mpi3-ft at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft