[Mpi3-ft] A few notes from UTK about the RTS proposal

Mon Dec 5 15:50:57 CST 2011

Re-reading what I wrote, I think I wasn't clear.  I was referring to the suggestion that reenable_anysource is not needed because the user has already been notified of the failure when the first IRecv(ANYSOURCE) returns an error.  I don't think that's sufficient when dealing with threads (see my example).

If I misunderstood what was suggested at the meeting, then ignore what I said.

I do agree with Rich that both should return an error.  Any subsequent ANY_SOURCE receives should as well, until reenable_anysource (or whatever we name it) is called.

Is there another way to deal with the deadlock?  We can, of course, punt and leave it up to the user as I suggested later.

-d

On Dec 5, 2011, at 3:20 PM, Graham, Richard L. wrote:

>>> 17.6: MPI_ANY_SOURCE
>>> -------------------------
>>> Problem: The model is too complex and can be simplified.
>>> 
>>> A blocking MPI_Recv(ANY) will return an error when a new failure
>>> occurs. A nonblocking MPI_Irecv(ANY) request, during the
>>> MPI_Wait/Test, will return a 'warning' (MPI_ERR_ANY_SOURCE_DISABLED)
>>> and not complete when a new failure occurs.
>>> 
>>> There was a debate about whether the user needs to call another function
>>> (like MPI_Comm_reenable_any_source) to reactivate ANY_SOURCE receives,
>>> or whether the fact that the error was returned to the user is sufficient
>>> to reactivate ANY_SOURCE. Receives that returned a 'warning' error can
>>> still be matched/completed if a matching send arrives while the user is
>>> handling the error. So this works somewhat like an interrupt.
>>> 
>>> Additionally, once an error is returned, should we remove the
>>> protection that "Any new MPI_ANY_SOURCE receive operations using a
>>> communicator with MPI_ANY_SOURCE receive operations disabled will not
>>> be posted, and marked with an error code of the class
>>> MPI_ERR_PROC_FAIL_STOP"? If so, new MPI_Recv(ANY) calls would post successfully.
>>> 
>>> The sentiment is that the user is notified of the process failure via
>>> the return code from the MPI_Wait/Test, and if they do another
>>> MPI_Wait/Test or post a new ANY_SOURCE receive, they implicitly
>>> acknowledge the failed process. To avoid a race condition where
>>> another process fails between the error return and the MPI_Wait/Test,
>>> the library needs to associate an 'epoch' value with the communicator
>>> to make sure that the next MPI_Wait/Test returns another error.
>>> 
>>> I like the idea of the user doing an explicit second operation, like
>>> MPI_Comm_reenable_any_source, since then we know that the user is
>>> aware of the failed group (and it seems more MPI-like to me). However, I
>>> also like the more flexible semantics. So I'm a bit split on this one.
>>> 
>> 
>> There's still a race with threads: a thread posts a blocking ANY_SOURCE (AS) receive.  The receive completes with an error because some process in the comm failed.  Another thread posts a blocking AS receive immediately after the receive on the first thread returns.
>> 
>> Even though, from the MPI library's point of view, the application has been "informed" of the failure when the first receive returned with an error, there wasn't enough time for the application to react to the error and prevent the second thread from posting the receive.
> 
> [rich] I expect the user would get an error in both cases.  Is there a problem with this?  As long as the user does not get a response that would lead it into a deadlock state, I think we are OK.  If we say that we are in an error state when we are not, that is OK; the user can "fix" this.  If we enter the call thinking we are not in an error state when we really are, I think we are also OK, as long as we keep track of epochs.
> 
> Rich