[Mpi3-ft] MPI_ANY_SOURCE ... again...

Josh Hursey jjhursey at open-mpi.org
Wed Jan 25 15:42:23 CST 2012


We really need to make a decision on semantics for MPI_ANY_SOURCE.

During the plenary session the MPI Forum had a problem with the current
proposed semantics. The current proposal states (roughly) that a receive on
MPI_ANY_SOURCE returns when a failure emerges in the communicator. The MPI
Forum read this as a strong requirement for -progress- (something the MPI
standard tries to stay away from).

The alternative proposal is that a receive on MPI_ANY_SOURCE will block
until completed with a message. This means that it will -not- return when a
new failure has been encountered (even if the calling process is the only
process left alive in the communicator). This does get around the concern
about progress, but puts a large burden on the end user.
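
To make the difference concrete, here is a toy model of what a posted
MPI_ANY_SOURCE receive does under each proposal. It is not MPI code: all
names are hypothetical, and the inputs are just "has a matching message
arrived" and "has a new failure emerged in the communicator". It also
assumes that a matched message takes precedence over a concurrent failure,
a detail neither proposal, as summarized here, settles.

```c
#include <assert.h>
#include <stdbool.h>

/* The two candidate semantics for a receive on MPI_ANY_SOURCE. */
typedef enum {
    RETURN_ON_NEW_FAILURE,   /* current proposal */
    BLOCK_UNTIL_MESSAGE      /* alternative proposal */
} any_source_semantics;

typedef enum {
    RECV_COMPLETED,          /* a matching message arrived */
    RECV_RETURNED_ERROR,     /* returned early because a failure emerged */
    RECV_STILL_BLOCKED       /* the call has not returned */
} recv_outcome;

/* Hypothetical model of a pending MPI_ANY_SOURCE receive.
 * ASSUMPTION: a matched message wins over a concurrent failure. */
static recv_outcome any_source_outcome(any_source_semantics s,
                                       bool message_matched,
                                       bool new_failure_in_comm) {
    if (message_matched)
        return RECV_COMPLETED;
    switch (s) {
    case RETURN_ON_NEW_FAILURE:
        return new_failure_in_comm ? RECV_RETURNED_ERROR : RECV_STILL_BLOCKED;
    case BLOCK_UNTIL_MESSAGE:
        /* Failures are ignored; only a message can complete the call. */
        return RECV_STILL_BLOCKED;
    default:
        assert(0 && "unknown semantics");
        return RECV_STILL_BLOCKED;
    }
}
```

The interesting case is the last survivor: no message will ever match and a
failure has emerged, so any_source_outcome(BLOCK_UNTIL_MESSAGE, false, true)
stays RECV_STILL_BLOCKED indefinitely, while the current proposal returns an
error.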


There are a couple of good use cases for MPI_ANY_SOURCE (grumble, grumble):
Manager/Worker applications, and easy load balancing when
multiple incoming messages are expected. The blocking behavior of the
alternative makes the use of MPI_ANY_SOURCE dangerous for fault-tolerant
applications, and opens up another opportunity for deadlock.

Applications that want to use MPI_ANY_SOURCE and still be fault tolerant
will need to build their own failure detector on top of MPI using directed
point-to-point messages. A basic implementation might post an MPI_Irecv()
from each worker process on an otherwise unused tag, then poll on
MPI_Testany(). If any of these requests completes in error
(MPI_ERR_PROC_FAIL_STOP), the corresponding worker has failed and the
application can take action. This user-level failure detector can (should)
be implemented in a third-party library, since failure detectors can be
difficult to implement in a scalable manner.

In reality, the MPI library or the runtime system that supports MPI will
already be doing something similar. Even for MPI_ERRORS_ARE_FATAL on
MPI_COMM_WORLD, the underlying system must detect the process failure, and
terminate all other processes in MPI_COMM_WORLD. So this represents a
-detection- of the failure, and a -notification- of the failure throughout
the system (though the notification is an order to terminate). For
MPI_ERRORS_RETURN, the MPI library will use this detection/notification
functionality to reason about the state of the message traffic in the
system. So it seems silly to force the user to duplicate this (nontrivial)
detection/notification functionality on top of MPI, just to avoid the
progress discussion.


So that is a rough summary of the debate. If we are going to move forward,
we need to make a decision on MPI_ANY_SOURCE. I would like to make such a
decision before/during the next teleconf (Feb. 1).

I'm torn on this one, so I look forward to your comments.

-- Josh

-- 
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey