[Mpi3-ft] Clarification Ticket 323: MPI_ANY_SOURCE and MPI_Test(any)

Josh Hursey jjhursey at open-mpi.org
Mon Mar 19 08:44:15 CDT 2012


Just to make this point clear, maybe we can add the following sentence
to the end of the MPI_ANY_SOURCE paragraph:
  In contrast to the nonblocking case, a blocking receive from
MPI_ANY_SOURCE will complete and raise an error in the class of
MPI_ERR_PROC_FAILED when a process failure occurs in the associated
communicator; it will not raise an error in the class of
MPI_ERR_PENDING in this situation.
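
For illustration, here is a minimal sketch of the semantics that
sentence describes (hypothetical code; it assumes the proposed
MPI_ERR_PROC_FAILED error class, an existing communicator 'comm', and
that the communicator's error handler is set to MPI_ERRORS_RETURN so
errors are returned rather than aborting):
---------------------
int buf, rc, err_class;
MPI_Status status;

MPI_Comm_set_errhandler(comm, MPI_ERRORS_RETURN);

/* Blocking receive from any source: if a process in 'comm' fails
 * while we are blocked here, the call completes and returns an
 * error of class MPI_ERR_PROC_FAILED (not MPI_ERR_PENDING, which
 * is reserved for the nonblocking case). */
rc = MPI_Recv(&buf, 1, MPI_INT, MPI_ANY_SOURCE, 0, comm, &status);
if (rc != MPI_SUCCESS) {
    MPI_Error_class(rc, &err_class);
    /* err_class == MPI_ERR_PROC_FAILED under the proposed wording */
}
---------------------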

I agree that, since the point is somewhat muddled in the discussion,
we should explicitly call it out in the paragraph. What do you all
think of the wording above?

I think this would be a good errata item for 3.1.

-- Josh

On Fri, Mar 16, 2012 at 6:15 PM, David Solt <dsolt at us.ibm.com> wrote:
> Since the 2nd paragraph seems to be covering MPI_ANY_SOURCE as a special
> case I doubt that a reader would attempt to draw many conclusions about
> MPI_ANY_SOURCE from the first paragraph.  If they did, then the first
> sentence could easily be taken as an indication that blocking MPI_ANY_SOURCE
> receives will not return an error due to a single process failure because
> that failure is not guaranteed to prevent the MPI implementation from
> successfully completing the communication.   We might intend it to mean that
> a blocking recv on MPI_ANY_SOURCE returns an error when a process fails, but
> I don't think the text is clear about that.   The old text wasn't terribly
> clear either, but the catchall "In all other cases, the operation must
> return MPI_ERR_PROC_FAILED"  sort of covered it.
> Dave
>
>
>
> From:        Josh Hursey <jjhursey at open-mpi.org>
> To:        "MPI 3.0 Fault Tolerance and Dynamic Process Control working
> Group" <mpi3-ft at lists.mpi-forum.org>
> Date:        03/16/2012 01:44 PM
> Subject:        Re: [Mpi3-ft] Clarification Ticket 323: MPI_ANY_SOURCE and
>      MPI_Test(any)
> Sent by:        mpi3-ft-bounces at lists.mpi-forum.org
> ________________________________
>
>
>
> Blocking MPI_ANY_SOURCE receives are covered by the previous paragraph
> in the proposal:
> "Future point-to-point communication with the same process on this
> communicator must also return MPI_ERR_PROC_FAILED."
>
> The paragraph that follows is just a clarification for a
> nonblocking receive of MPI_ANY_SOURCE. So I do not think that is an
> issue.
>
> I was mostly trying to figure out if my interpretation of the
> MPI_Test* functionality was in sync with how others interpreted the
> text. I think that my interpretation is correct based on my reading of
> the standard.
>
> I do not know if these semantics are a problem for users as long as
> they are aware of them (that the MPI_Test* functions will not return
> an error while the nonblocking MPI_ANY_SOURCE receive operation is
> 'pending'), since the pending operation can still be matched and
> completed at some point in the future without the need to clear the
> error (via MPI_Comm_failure_ack). Still, it seems odd that MPI_Test*
> would behave this way, and users should be aware of it. If a program
> just polls on MPI_Test for completion and never calls MPI_Wait, it
> will not get notification of failures that may affect its completion,
> so it will need some way to work around this situation. Seems icky
> though ('icky' in the technical sense, of course).
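>
> To sketch the pitfall (hypothetical code, assuming the semantics
> above), a pure polling loop would spin indefinitely once the request
> is in the 'pending' state:
> ---------------------
> /* 'flag' stays false and MPI_Test keeps returning MPI_SUCCESS even
>  * after a failure has marked the request with MPI_ERR_PENDING, so
>  * this loop never observes the failure and only terminates if a
>  * matching send eventually arrives. */
> do {
>     MPI_Test(&req, &flag, &status);
> } while (!flag);
> ---------------------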
>
> UTK can you comment on your interpretation of this?
>
> -- Josh
>
>
> On Fri, Mar 16, 2012 at 12:06 PM, David Solt <dsolt at us.ibm.com> wrote:
>> I believe I agree with everything you wrote.    I also believe that our
>> previous draft stated what happens for a blocking MPI_ANY_SOURCE recv ("In
>> all other cases, the operation must return MPI_ERR_PROC_FAILED"), but
>> during
>> our rework of that text we no longer state what happens when a blocking
>> MPI_ANY_SOURCE recv sees a failure.
>>
>> I'd like to avoid making many changes between our last reading and the
>> first
>> vote because late changes don't inspire confidence, but I think Josh's
>> issue
>> is valid.
>>
>> Personally I was always in favor of blocking and non-blocking recv on
>> MPI_ANY_SOURCE failing if any process fails.   The recv can complete as
>> failed with the status pointing to the failed process.   The user can
>> still
>> call MPI_Comm_failure_ack to exclude failed ranks from triggering further
>> failures in MPI_ANY_SOURCE.   I don't see the value in MPI_ERR_PENDING.
>>  Reposting the recv is not a big deal and we don't care that much about
>> performance in the failure case.   Still, I'd prefer any change that can
>> address Josh's issue with the least change to the proposal.
>>
>> Thanks,
>> Dave
>>
>>
>>
>> From:        Josh Hursey <jjhursey at open-mpi.org>
>> To:        "MPI 3.0 Fault Tolerance and Dynamic Process Control working
>> Group" <mpi3-ft at lists.mpi-forum.org>
>> Date:        03/16/2012 08:54 AM
>> Subject:        [Mpi3-ft] Clarification Ticket 323: MPI_ANY_SOURCE and
>> MPI_Test(any)
>> Sent by:        mpi3-ft-bounces at lists.mpi-forum.org
>> ________________________________
>>
>>
>>
>> I believe my reasoning is correct below, but thought I would ask the
>> group to confirm.
>>
>> Consider the following code snippet:
>> ---------------------
>> MPI_Irecv(..., MPI_ANY_SOURCE, ..., &req);
>> /* Some other process in the communicator fails */
>> MPI_Test(&req, &flag, &status);
>> ---------------------
>>
>> The proposal in #323 says that the request should be marked as
>> MPI_ERR_PENDING and not complete. So what should the value of 'flag'
>> and 'status' be when returning from MPI_Test?
>>
>> According to the standard, 'flag = true' indicates two things:
>> 1) the operation is completed
>> 2) the 'status' object is set
>>
>> For the MPI_ANY_SOURCE case above, the operation is -not- completed,
>> so returning 'flag = true' would violate (1); therefore I think
>> MPI_Test should set 'flag' to 'false'. However, is the 'status' also
>> not set? Should MPI_Test return MPI_SUCCESS or MPI_ERR_PENDING?
>>
>> If MPI_Test is to return MPI_ERR_PENDING directly, then there is no
>> need to inspect 'status'. However, if we replace MPI_Test with
>> MPI_Testany(1, &req, &index, &flag, &status), then the operation would
>> return MPI_ERR_IN_STATUS, and the user must inspect the 'status' field
>> for the true error value. So we would still set 'flag = false', but
>> would also need to set the 'status'. That is, if we want MPI_Test* to
>> return an error code indicating that the request has 'failed, but not
>> completed'.
>>
>> According to the standard, if no operation is completed then
>> MPI_Testany "returns flag = false, returns a value of MPI_UNDEFINED in
>> index and status is undefined." So according to the MPI_Testany logic,
>> in this case 'flag = false', 'status is undefined', and the operation
>> should return MPI_SUCCESS. Is that the expected behavior for the code
>> snippet above?
>>
>> I think so, but I thought I would double check with the group.
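>>
>> In code form, that interpretation would look like this (hypothetical
>> sketch for the snippet above):
>> ---------------------
>> /* One failed-but-pending MPI_ANY_SOURCE request: no operation
>>  * completes, so by the MPI_Testany rules we expect:
>>  *   rc    == MPI_SUCCESS
>>  *   flag  == false
>>  *   index == MPI_UNDEFINED
>>  *   status undefined (must not be inspected)
>>  */
>> rc = MPI_Testany(1, &req, &index, &flag, &status);
>> ---------------------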
>>
>> This means that the user can only 'see' the MPI_ERR_PENDING state of
>> the request when they call an MPI_Wait* operation, which might not be
>> what they would normally want to do (because they do not want to
>> block).
>>
>> -- Josh
>>
>> --
>> Joshua Hursey
>> Postdoctoral Research Associate
>> Oak Ridge National Laboratory
>> http://users.nccs.gov/~jjhursey
>>
>> _______________________________________________
>> mpi3-ft mailing list
>> mpi3-ft at lists.mpi-forum.org
>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
>>
>
>
>
> --
> Joshua Hursey
> Postdoctoral Research Associate
> Oak Ridge National Laboratory
> http://users.nccs.gov/~jjhursey
>
> _______________________________________________
> mpi3-ft mailing list
> mpi3-ft at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
>



-- 
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey



