[Mpi3-ft] MPI_ANY_SOURCE ... again...

Wed Jan 25 17:06:32 CST 2012

Hi Josh,

Cray is okay with the semantics described in the current
FTWG proposal attached to the ticket.

We plan to just leverage the out-of-band system fault
detector software that currently kills a job if
a node it was running on goes down.

Howard

Josh Hursey wrote:
> We really need to make a decision on semantics for MPI_ANY_SOURCE.
> 
> During the plenary session the MPI Forum had a problem with the current
> proposed semantics. The current proposal states (roughly) that a receive
> on MPI_ANY_SOURCE returns when a failure emerges in the communicator. The
> MPI Forum read this as a strong requirement for -progress- (something
> the MPI standard tries to stay away from).
> 
> The alternative proposal is that a receive on MPI_ANY_SOURCE will block
> until completed with a message. This means that it will -not- return
> when a new failure has been encountered (even if the calling process is
> the only process left alive in the communicator). This does get around
> the concern about progress, but puts a large burden on the end user.
> 
> 
> There are a couple of good use cases for MPI_ANY_SOURCE (grumble, grumble):
> Manager/Worker applications, and easy load balancing when
> multiple incoming messages are expected. This blocking behavior makes
> the use of MPI_ANY_SOURCE dangerous for fault tolerant applications, and
> opens up another opportunity for deadlock.
> 
> Applications that want to use MPI_ANY_SOURCE and be fault tolerant
> will need to build their own failure detector on top of MPI using
> directed point-to-point messages. A basic implementation might post
> an MPI_Irecv() from each worker process with an unused tag, then poll on
> MPI_Testany(). If any of these requests completes in error
> (MPI_ERR_PROC_FAIL_STOP) then the target has failed and the application
> can take action. This user-level failure detector can (should) be
> implemented in a third-party library since failure detectors can be
> difficult to implement in a scalable manner.
> 
> In reality, the MPI library or the runtime system that supports MPI will
> already be doing something similar. Even for MPI_ERRORS_ARE_FATAL on
> MPI_COMM_WORLD, the underlying system must detect the process failure,
> and terminate all other processes in MPI_COMM_WORLD. So this represents
> a -detection- of the failure, and a -notification- of the failure
> throughout the system (though the notification is an order to
> terminate). For MPI_ERRORS_RETURN, the MPI library will use this
> detection/notification functionality to reason about the state of the
> message traffic in the system. So it seems silly to force the user to
> duplicate this (nontrivial) detection/notification functionality on top
> of MPI, just to avoid the progress discussion.
> 
> 
> So that is a rough summary of the debate. If we are going to move
> forward, we need to make a decision on MPI_ANY_SOURCE. I would like to
> make such a decision before/during the next teleconf (Feb. 1).
> 
> I'm torn on this one, so I look forward to your comments.
> 
> -- Josh
> 
> -- 
> Joshua Hursey
> Postdoctoral Research Associate
> Oak Ridge National Laboratory
> http://users.nccs.gov/~jjhursey
> 

-- 
Howard Pritchard
Software Engineering
Cray, Inc.