[Mpi3-ft] Weekly con-calls
Greg Bronevetsky
bronevetsky1 at llnl.gov
Mon Sep 22 12:51:54 CDT 2008
>Let's resume these meetings on Wed next week, 9/24/2008, noon-2pm Eastern
>time; moving this an hour later does not work for the people from Japan.
>What we want to talk about next time are specific user use-cases. We have
>talked so far in generalities, but now need to get to specifics in terms of
>how apps actually want to use this functionality. As an example, take the
>case of client/server apps, with a failure in one of the "clients". If the
>client is a member of the remote group in the intercommunicator, fully
>defining error scenarios in the current MPI-2 dynamics should be sufficient
Let's start talking about this scenario. The first
thing to discuss is the functionality that the
application may want. There are two
cases to consider. When the client is not
expecting a response from the server, its failure
is irrelevant to the server, and the server
does not need to be informed of the failure.
However, if the client fails with pending
communication from the server, the server does
want to be informed of the failure, if only to
free the internal state associated with this
client. As such, the application may desire one
of two things. First, it may wish MPI to inform
it every time it tries to communicate with a
failed process. This way the server is notified
during its next send or receive operation to/from
the client and is thus able to perform the
appropriate cleanup operations. Second, if the
server wants to respond as quickly as possible,
it can instead ask MPI to notify it via a
callback as soon as the client fails.
discussed both types of interfaces in the group.
More specifically, the MPI implementation should
either inform the application during the
application's next call that uses the above
intercommunicator or invoke the callback associated with this intercommunicator.
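
To make these two options concrete, here is a rough sketch in MPI-style
C of how a server might use each one. Only MPI_ERRORS_RETURN,
MPI_Comm_set_errhandler, MPI_Comm_create_errhandler, and
MPI_Error_class are existing MPI-2 machinery; the error class
MPIX_ERR_PROC_FAILED and the helper names are placeholders for whatever
interface we eventually define, not a proposal in themselves.

#include <mpi.h>
#include <stdio.h>

/* Hypothetical error class for "peer process failed"; not in MPI-2. */
#define MPIX_ERR_PROC_FAILED (MPI_ERR_LASTCODE + 1)

/* Option 1: the server learns of the failure from the return code of
 * its next communication call on the client intercommunicator. */
void serve_client(MPI_Comm clients, int client_rank)
{
    int req, rc, eclass;

    /* Have MPI return error codes instead of aborting. */
    MPI_Comm_set_errhandler(clients, MPI_ERRORS_RETURN);

    rc = MPI_Recv(&req, 1, MPI_INT, client_rank, 0, clients,
                  MPI_STATUS_IGNORE);
    if (rc != MPI_SUCCESS) {
        MPI_Error_class(rc, &eclass);
        if (eclass == MPIX_ERR_PROC_FAILED) {
            /* Client is gone: free the server-side state tied to it. */
            printf("client %d failed; cleaning up\n", client_rank);
            return;
        }
    }
    /* ... otherwise handle the request and send the reply ... */
}

/* Option 2: the server asks to be told via a callback attached to the
 * intercommunicator. */
void client_failed_cb(MPI_Comm *comm, int *errcode, ...)
{
    /* How the failed rank is reported (through errcode or a separate
     * query call) is exactly the interface still to be defined. */
    printf("failure reported on the client intercommunicator\n");
}

void install_callback(MPI_Comm clients)
{
    MPI_Errhandler eh;
    MPI_Comm_create_errhandler(client_failed_cb, &eh);
    /* With standard MPI-2 semantics this handler runs only when a call
     * on 'clients' raises an error; invoking it asynchronously at the
     * moment of failure is the extension under discussion. */
    MPI_Comm_set_errhandler(clients, eh);
}
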
Looking at the problem at a lower level, we can
consider the various low-level problems that may
cause client "failure" and what MPI must do to
support the application's recovery efforts. The
first possibility is that the client node fails.
MPI can detect this event by using heart-beat
messages or by noticing that a message sent to
the client has timed out. In this case MPI should
use one of the above APIs to inform the other
processes that further communication to/from the
client will fail. Another possibility is that
some network link fails, disabling communication
between this client and this server but allowing
communication between other rank pairs. In this
case MPI may inform the application about the
pairs of ranks that can no longer communicate.
Another option would be for MPI to kill one endpoint
of each broken link, i.e., half of the processes
affected by this problem, ensuring that all the
remaining processes can communicate freely with
each other. It would then
inform the application of these process deaths as
above. The final possibility is a network
partition where a subset of processes cannot
communicate with the remaining processes. MPI
would treat this case as one of the subsets
failing and kill all the processes in this
subset. It would then inform the processes in the other subset of these deaths.
The above is essentially a summary of what we've
talked about thus far, as applied to this
example. That is good, since it suggests that
we understand the problem well enough, at least
for this purpose. A good question, then, is: can
anybody think of additional details that we need to consider?
Greg Bronevetsky
Post-Doctoral Researcher
1028 Building 451
Lawrence Livermore National Lab
(925) 424-5756
bronevetsky1 at llnl.gov