[Mpi3-ft] A few notes from UTK about the RTS proposal

Josh Hursey jjhursey at open-mpi.org
Wed Dec 7 09:54:33 CST 2011

(Sorry for the lag, I got pulled away by another deadline)

On Mon, Dec 5, 2011 at 4:50 PM, Darius Buntinas <buntinas at mcs.anl.gov> wrote:
> Re-reading what I wrote, I think I wasn't clear.  I was referring to the suggestion that reenable_anysource was not needed because the user was already notified of the failure when the first IRecv(ANYSOURCE) returned an error.  I don't think that's sufficient when dealing with threads (see my example).
> If I misunderstood what was suggested at the meeting, then ignore what I said.
> I do agree with Rich that both should return an error.  And any subsequent anysources should as well until reenable_anysource (or whatever we name it) gets called.
> Is there another way to deal with the deadlock?  We can, of course, punt and leave it up to the user as I suggested later.
> -d


I think this is a good example of when we would need the
reenable_anysource function. If I may, just to illustrate your point,
consider the following scenario:

Thread A          Thread B
MPI_Recv(AS)
 *** Process X fails ****
 -> Error
                  MPI_Recv(AS)
 -> Check(Err)

There is a race between when the error is returned to Thread A, and
when Thread B posts the blocking MPI_Recv(AS).

In the model proposed by UTK, the MPI_Recv(AS) in Thread B would be in
a separate epoch since the error was returned to the user already. So
this receive would block, even if the user did not want it to. The
user would have to lock around the MPI_Recv(AS) to allow only one
thread to call it at a time to avoid this race condition.
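
To make the race concrete, here is a minimal toy sketch of the epoch behaviour as I understand it. This is not MPI: EpochComm, process_failed, and recv_any_source are hypothetical names made up for illustration, and the two "threads" are run one after the other so the interleaving is deterministic.

```python
class EpochComm:
    """Toy stand-in for a communicator under the epoch-based model (not MPI)."""

    def __init__(self):
        self.failure_epoch = 0    # bumped each time a process failure is detected
        self.reported_epoch = 0   # last epoch whose failure was returned to the user

    def process_failed(self):
        self.failure_epoch += 1

    def recv_any_source(self):
        # An unreported failure interrupts the receive with an error, and
        # returning that error counts as the user's notification.
        if self.failure_epoch > self.reported_epoch:
            self.reported_epoch = self.failure_epoch
            return "error"
        # Nothing new to report: the receive is posted and would block.
        return "posted"


comm = EpochComm()
comm.process_failed()                 # *** Process X fails ***
thread_a = comm.recv_any_source()     # Thread A: -> Error
thread_b = comm.recv_any_source()     # Thread B posts before A has checked the error
print(thread_a, thread_b)
```

In this sketch Thread B's receive is posted (and in a real library would block) even though Thread A has not yet had a chance to react to the error.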

In the RTS model, the MPI_Recv(AS) in Thread B would return an error
until the user reenables the ANY_SOURCE receives. So the user does not
need to lock around the MPI_Recv(), but only around the
reenable_anysource (per the example in the RTS proposal).
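
A similar toy sketch for the RTS behaviour, again not MPI: RtsComm and its methods are hypothetical names, and the internal lock only stands in for whatever thread safety the library would provide. The only user-side lock is the one around the re-enable call.

```python
import threading


class RtsComm:
    """Toy stand-in for a communicator under the RTS model (not MPI)."""

    def __init__(self):
        self._state_lock = threading.Lock()   # models library-internal thread safety
        self.any_source_enabled = True

    def process_failed(self):
        with self._state_lock:
            self.any_source_enabled = False

    def recv_any_source(self):
        # While ANY_SOURCE is disabled, every such receive fails immediately.
        with self._state_lock:
            return "posted" if self.any_source_enabled else "error"

    def reenable_any_source(self):
        with self._state_lock:
            self.any_source_enabled = True


comm = RtsComm()
comm.process_failed()                         # *** Process X fails ***

results = []
results_lock = threading.Lock()

def worker():
    outcome = comm.recv_any_source()          # no user lock around the receive
    with results_lock:
        results.append(outcome)

threads = [threading.Thread(target=worker) for _ in range(2)]   # Thread A, Thread B
for t in threads:
    t.start()
for t in threads:
    t.join()

reenable_lock = threading.Lock()              # user-side lock, only around re-enable
with reenable_lock:
    comm.reenable_any_source()
after_reenable = comm.recv_any_source()
print(results, after_reenable)
```

Whichever thread runs first, both receives fail until the application explicitly re-enables ANY_SOURCE, so neither thread can end up blocked by accident.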

Maybe this would be a good example to include as support for a rationale?

-- Josh

> On Dec 5, 2011, at 3:20 PM, Graham, Richard L. wrote:
>>>> 17.6: MPI_ANY_SOURCE
>>>> -------------------------
>>>> Problem: The model is too complex and can be simplified.
>>>> A blocking MPI_Recv(ANY) will return an error when a new failure
>>>> occurs. A nonblocking MPI_Irecv(ANY) request, during the
>>>> MPI_Wait/Test, will return a 'warning' (MPI_ERR_ANY_SOURCE_DISABLED)
>>>> and not complete when a new failure occurs.
>>>> There was a debate if the user needs to call another function (like
>>>> MPI_Comm_reenable_any_source) to reactivate the ANY_SOURCE receives,
>>>> or if the fact that the error was returned to the user is sufficient
>>>> to reactivate ANY_SOURCE. The requests that returned a 'warning' error
>>>> can still be matched and completed if a matching send arrives while the
>>>> user is handling the error, so the warning works much like an interrupt.
>>>> Additionally, once an error is returned, should we remove the
>>>> protection that "Any new MPI_ANY_SOURCE receive operations using a
>>>> communicator with MPI_ANY_SOURCE receive operations disabled will not
>>>> be posted, and marked with an error code of the class
>>>> MPI_ERR_PROC_FAIL_STOP"? Then a new MPI_Recv(ANY) would post successfully.
>>>> The sentiment is that the user is notified of the process failure via
>>>> the return code from the MPI_Wait/Test, and if they do another
>>>> MPI_Wait/Test or post a new ANY_SOURCE receive they implicitly
>>>> acknowledge the failed process. To avoid a race condition where
>>>> another process fails between the error return and the MPI_Wait/Test,
>>>> the library needs to associate an 'epoch' value with the communicator
>>>> to make sure that the next MPI_Wait/Test returns another error.
>>>> I like the idea of the user doing an explicit second operation, like
>>>> MPI_Comm_reenable_any_source, since then we know that the user is
>>>> aware of the failed group (and seems more MPI-like to me). However, I
>>>> like the more flexible semantics. So I'm a bit split on this one.
>>> There's still a race with threads:  A thread posts a blocking AS receive.  The receive completes with an error because some process in the comm failed.  Another thread posts a blocking AS receive immediately after the receive on the first thread returns.
>>> Even though, from the MPI library's point of view, the application has been "informed" of the failure when the first receive returned with an error, there wasn't enough time for the application to react to the error and prevent the second thread from posting the receive.
>> [rich] I expect the user would get an error in both cases.  Is there a problem with this?  As long as the user does not get a response that would put it into a deadlock state, I think we are OK.  If we say that we are in an error state and are not, this is OK; the user can "fix" this.  If we enter the call and think we are not in an error state, but really are, I think we are also OK, provided we keep track of epochs.
>> Rich

Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
