[Mpi3-ft] Clarification Ticket 323: MPI_ANY_SOURCE and MPI_Test(any)

David Solt dsolt at us.ibm.com
Fri Mar 16 17:15:45 CDT 2012


Since the second paragraph seems to cover MPI_ANY_SOURCE as a special
case, I doubt that a reader would attempt to draw many conclusions about
MPI_ANY_SOURCE from the first paragraph.  If they did, the first
sentence could easily be taken to indicate that a blocking
MPI_ANY_SOURCE receive will not return an error due to a single process
failure, because that failure is not guaranteed to prevent the MPI
implementation from successfully completing the communication.  We might
intend it to mean that a blocking recv on MPI_ANY_SOURCE returns an error
when a process fails, but I don't think the text is clear about that. The
old text wasn't terribly clear either, but the catchall "In all other
cases, the operation must return MPI_ERR_PROC_FAILED" sort of covered it.
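
To make the ambiguity concrete, a minimal sketch (the buffer and tag are
illustrative, and comm is assumed to use MPI_ERRORS_RETURN so errors are
returned rather than aborting):
---------------------
int buf;
MPI_Status status;
/* Some other process in comm fails while we are blocked here.
   Reading the first paragraph alone, rc could plausibly be
   MPI_SUCCESS (the failure need not prevent a later match) or
   MPI_ERR_PROC_FAILED; the text does not clearly pick one. */
int rc = MPI_Recv(&buf, 1, MPI_INT, MPI_ANY_SOURCE, 0, comm, &status);
---------------------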
Dave



From:   Josh Hursey <jjhursey at open-mpi.org>
To:     "MPI 3.0 Fault Tolerance and Dynamic Process Control working 
Group" <mpi3-ft at lists.mpi-forum.org>
Date:   03/16/2012 01:44 PM
Subject:        Re: [Mpi3-ft] Clarification Ticket 323: MPI_ANY_SOURCE and 
MPI_Test(any)
Sent by:        mpi3-ft-bounces at lists.mpi-forum.org



Blocking MPI_ANY_SOURCE receives are covered by the previous paragraph
in the proposal:
 "Future point-to-point communication with the same process on this
communicator must also return MPI_ERR_PROC_FAILED."

The paragraph that follows it is just a clarification for a nonblocking
receive on MPI_ANY_SOURCE, so I do not think that is an issue.

I was mostly trying to figure out whether my interpretation of the
MPI_Test* functionality was in sync with how others read the text. Based
on my reading of the standard, I think my interpretation is correct.

I do not know if these semantics are a problem for users as long as
they are aware of them (that the MPI_Test* functions will not return
an error while the nonblocking MPI_ANY_SOURCE receive operation is
'pending'), since the pending operation can still be matched and
completed at some point in the future without the need to clear the
error (via failure_ack). It does seem odd that MPI_Test* would behave
this way, though, and users should be aware of it. If a program just
polls MPI_Test for completion and never calls MPI_Wait, it will never
be notified of failures that may affect completion, so users will need
to be aware of this situation and find a way to work around it. Seems
icky though ('icky' in the technical sense, of course).
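
To spell out the pattern in question (a minimal sketch; assumes an
initialized MPI program with MPI_ERRORS_RETURN set on comm so errors
are returned rather than aborting):
---------------------
MPI_Request req;
MPI_Status status;
int buf, flag = 0;

MPI_Irecv(&buf, 1, MPI_INT, MPI_ANY_SOURCE, 0, comm, &req);

/* If another process in comm fails, the request becomes 'pending'
   but not completed: every pass through this loop returns
   MPI_SUCCESS with flag == false, so the failure is never reported
   here. Only MPI_Wait(&req, &status) would surface MPI_ERR_PENDING,
   and that call blocks. */
while (!flag) {
    MPI_Test(&req, &flag, &status);
    /* ... do other work between polls ... */
}
---------------------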

UTK can you comment on your interpretation of this?

-- Josh


On Fri, Mar 16, 2012 at 12:06 PM, David Solt <dsolt at us.ibm.com> wrote:
> I believe I agree with everything you wrote. I also believe that our
> previous draft stated what happens for a blocking MPI_ANY_SOURCE recv
> ("In all other cases, the operation must return MPI_ERR_PROC_FAILED"),
> but during our rework of that text we no longer state what happens when
> a blocking MPI_ANY_SOURCE recv sees a failure.
>
> I'd like to avoid making many changes between our last reading and the
> first vote because late changes don't inspire confidence, but I think
> Josh's issue is valid.
>
> Personally I was always in favor of blocking and non-blocking recvs on
> MPI_ANY_SOURCE failing if any process fails. The recv can complete as
> failed with the status pointing to the failed process. The user can
> still call MPI_Comm_failure_ack to exclude failed ranks from triggering
> further failures in MPI_ANY_SOURCE. I don't see the value in
> MPI_ERR_PENDING. Reposting the recv is not a big deal, and we don't
> care that much about performance in the failure case. Still, I'd prefer
> any change that can address Josh's issue with the least change to the
> proposal.
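>
> Concretely, the recovery is just a repost (a sketch;
> MPI_Comm_failure_ack is the name from the proposal, and comm is
> assumed to use MPI_ERRORS_RETURN):
> ---------------------
> rc = MPI_Recv(&buf, 1, MPI_INT, MPI_ANY_SOURCE, 0, comm, &status);
> if (rc == MPI_ERR_PROC_FAILED) {
>     /* status identifies the failed process; acknowledge the failure
>        so it stops triggering errors on MPI_ANY_SOURCE, then repost. */
>     MPI_Comm_failure_ack(comm);
>     rc = MPI_Recv(&buf, 1, MPI_INT, MPI_ANY_SOURCE, 0, comm, &status);
> }
> ---------------------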
>
> Thanks,
> Dave
>
>
>
> From:        Josh Hursey <jjhursey at open-mpi.org>
> To:        "MPI 3.0 Fault Tolerance and Dynamic Process Control working
> Group" <mpi3-ft at lists.mpi-forum.org>
> Date:        03/16/2012 08:54 AM
> Subject:        [Mpi3-ft] Clarification Ticket 323: MPI_ANY_SOURCE and
> MPI_Test(any)
> Sent by:        mpi3-ft-bounces at lists.mpi-forum.org
> ________________________________
>
>
>
> I believe my reasoning is correct below, but thought I would ask the
> group to confirm.
>
> Consider the following code snippet:
> ---------------------
> MPI_Irecv(..., MPI_ANY_SOURCE, ..., &req);
> /* Some other process in the communicator fails */
> MPI_Test(&req, &flag, &status);
> ---------------------
>
> The proposal in #323 says that the request should be marked as
> MPI_ERR_PENDING and not complete. So what should the value of 'flag'
> and 'status' be when returning from MPI_Test?
>
> According to the standard, 'flag = true' indicates two things:
> 1) the operation is completed, and
> 2) the 'status' object is set.
>
> For the MPI_ANY_SOURCE case above, the operation is -not- completed,
> so (1) is violated; therefore I think MPI_Test should set 'flag' equal
> to 'false'. However, is the 'status' also not set? Should MPI_Test
> return MPI_SUCCESS or MPI_ERR_PENDING?
>
> If MPI_Test is to return MPI_ERR_PENDING directly, then there is no
> need to inspect 'status'. However, if we replace MPI_Test with
> MPI_Testany(1, &req, &index, &flag, &status), then the operation would
> return MPI_ERR_IN_STATUS, and the user must inspect the 'status' field
> for the true error value. So we would still set 'flag = false', but
> would also need to set the 'status'. That is, if we want MPI_Test* to
> return an error code indicating that the request has 'failed, but not
> completed'.
>
> According to the standard, if no operation is completed then
> MPI_Testany "returns flag = false, returns a value of MPI_UNDEFINED in
> index and status is undefined." So according to the MPI_Testany logic,
> in this case 'flag = false', 'status is undefined', and the operation
> should return MPI_SUCCESS. Is that the expected behavior for the code
> snippet above?
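>
> In code form (a sketch of the same snippet; assumes MPI_ERRORS_RETURN
> on comm):
> ---------------------
> MPI_Irecv(&buf, 1, MPI_INT, MPI_ANY_SOURCE, 0, comm, &req);
> /* Some other process in the communicator fails */
> rc = MPI_Testany(1, &req, &index, &flag, &status);
> /* Per the MPI_Testany rules just quoted:
>      rc    == MPI_SUCCESS   (no operation completed)
>      flag  == false
>      index == MPI_UNDEFINED
>      status is undefined
>    The MPI_ERR_PENDING state of req is invisible here; it only
>    becomes visible through an MPI_Wait* call. */
> ---------------------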
>
> I think so, but I thought I would double check with the group.
>
> This means that the user can only 'see' the MPI_ERR_PENDING state of
> the request when they call an MPI_Wait* operation, which might not be
> what they would normally want to do (because they do not want to
> block).
>
> -- Josh
>
> --
> Joshua Hursey
> Postdoctoral Research Associate
> Oak Ridge National Laboratory
> http://users.nccs.gov/~jjhursey
>



-- 
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey


