[Mpi3-ft] Transactional Messages
Greg Bronevetsky
bronevetsky1 at llnl.gov
Sat Feb 23 00:24:25 CST 2008
If MPI exports dropped messages to the
application, no recovery mechanism in the
application needs to place the lost message
back into its original position in the sequence
order. The application will know that the message
was dropped, and none of the APIs in the
transactional messages proposal allow the
application to put a message into a specific spot
in the message order. The application is informed
of the drop and may do whatever it wants.
Importantly, if MPI notices that a message is
corrupted and tells the application that it was
dropped, MPI will also be able to perform correct
matching across that gap in the sequence numbers.
In fact, that requirement must be included in the
spec to make sure that we don't have any
ambiguities: if a message is dropped, it should be
counted as delivered once the application is
informed of the drop.
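To make the matching rule concrete, here is a minimal sketch of a receiver that treats a drop notification as a delivery for sequencing purposes. It is only an illustration, not an API from the proposal: the names `Packet`, `Receiver`, and `on_packet`, the `corrupted` flag, and the single per-sender sequence counter are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    seq: int                 # hypothetical per-sender sequence number
    payload: str
    corrupted: bool = False  # hypothetical flag set when corruption is detected


@dataclass
class Receiver:
    next_seq: int = 0
    pending: dict = field(default_factory=dict)    # packets waiting for their turn
    delivered: list = field(default_factory=list)  # events handed to the application

    def on_packet(self, pkt: Packet) -> None:
        self.pending[pkt.seq] = pkt
        # Hand over every packet that is now in sequence order.
        while self.next_seq in self.pending:
            p = self.pending.pop(self.next_seq)
            if p.corrupted:
                # The application is told about the drop ...
                self.delivered.append(("DROPPED", p.seq))
            else:
                self.delivered.append(("MSG", p.payload))
            # ... and in both cases the slot counts as delivered,
            # so later messages can still be matched.
            self.next_seq += 1


r = Receiver()
r.on_packet(Packet(0, "a"))
r.on_packet(Packet(2, "c"))                  # held back: slot 1 is still open
r.on_packet(Packet(1, "", corrupted=True))   # reported as a drop, closes the gap
print(r.delivered)
```

Because the drop advances `next_seq`, the message with sequence number 2 is matched right after the drop is reported instead of waiting forever on the gap.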
As for high-level recovery techniques, I can see
a number of uses for message drop information. It
may be that the message is some sort of periodic
notification from the master to the slaves and
dropped messages are irrelevant. It may also be
that the application is in fact using some sort
of protocol to overcome such failures. The point
is that such notifications can be useful and the
main question for us is the appropriate balance
between clean semantics and the performance hit.
The fully transactional option is going to be
expensive but very convenient. However, we have a
number of other options that are quite cheap and
will be quite useful with higher-level recovery
protocols. I think that we should gravitate
towards those, but I thought it important to
include a variety of options for people to toss around.
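As a rough illustration of the two uses mentioned above, the sketch below shows one application that ignores drops of a periodic notification and another that collects the dropped sequence numbers for a higher-level protocol to act on. The event format (`("MSG", payload)` and `("DROPPED", seq)` tuples) and both function names are assumptions made for this example, not part of any proposed interface.

```python
def latest_status(events):
    """Periodic-notification case: drops are irrelevant, keep the newest update."""
    latest = None
    for kind, value in events:
        if kind == "DROPPED":
            continue  # a later notification supersedes the lost one
        latest = value
    return latest


def resend_requests(events):
    """Recovery-protocol case: sequence numbers the application would ask for again."""
    return [value for kind, value in events if kind == "DROPPED"]


# Hypothetical event stream as the application might see it.
events = [("MSG", "load=3"), ("DROPPED", 1), ("MSG", "load=5")]
print(latest_status(events))
print(resend_requests(events))
```

Neither policy needs the lost message to be reinserted into the message order; each only needs to know that a drop happened.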
Greg Bronevetsky
Post-Doctoral Researcher
1028 Building 451
Lawrence Livermore National Lab
(925) 424-5756
bronevetsky1 at llnl.gov
> Actually, it is most likely that MPI implementations that don't try to deal
> with dropped messages can't even detect that such events have occurred. I
> would expect such implementations to be able to detect a problem with
> failed communications only if the low-level library they use to implement
> the communications, such as some OS bypass library, returns an error when
> trying to post some sort of communication, or if the run-time used by MPI
> detects a failed process and propagates this information to the rest of the
> processes in the application.
>
> The ONLY layer that can handle any sort of recovery from a live
> communications failure, i.e. w/o some sort of checkpoint-restart with or
> without message logging, is the MPI implementation itself. The application
> reposting a send can't get around the lost data, because of the MPI message
> ordering requirements, unless the implementation totally relies on another
> library to satisfy the MPI ordering requirements (i.e. it does not generate
> some sort of message sequence number) and the lost message is the last one
> that was sent. MPI is not allowed to attempt any matching if there is a gap
> in the sequence number.