[Mpi3-ft] Transactional Messages

Sat Feb 23 00:24:25 CST 2008

If MPI reports dropped messages to the
application, any recovery mechanism in the
application has no need to put the lost message
back into its original place in the sequence.
The application will know that the message was
dropped, and none of the APIs in the transactional
messages proposal let the application insert a
message at a specific point in the message order.
It is informed of the drop and may do whatever it
wants. Importantly, if MPI notices that a message
is corrupted and tells the application that it was
dropped, MPI must also be able to perform correct
matching despite the resulting gap in the sequence
numbers. In fact, that requirement must be
included in the spec to avoid any ambiguity: a
dropped message should be counted as delivered
once the application has been informed of the drop.
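A minimal sketch of the matching rule described above, under stated assumptions. It models a single sender-to-receiver channel in Python, and `OrderedChannel`, `arrive` and `report_drop` are hypothetical names, not part of MPI or of the transactional messages proposal. Matching advances strictly in sequence-number order, so a missing number stalls it. Once the application has been told that message was dropped, the gap counts as delivered and later messages can be matched.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical model of one sender->receiver channel; this is NOT an MPI API.
@dataclass
class OrderedChannel:
    next_seq: int = 0                                        # next sequence number eligible for matching
    pending: Dict[int, str] = field(default_factory=dict)    # messages that arrived ahead of next_seq
    dropped: Set[int] = field(default_factory=set)           # seqs the application was told were dropped
    events: List[Tuple[str, int, Optional[str]]] = field(default_factory=list)

    def arrive(self, seq: int, payload: str) -> None:
        """A message with sequence number `seq` arrives intact."""
        self.pending[seq] = payload
        self._advance()

    def report_drop(self, seq: int) -> None:
        """The implementation detected that `seq` was lost or corrupted and informs the application."""
        self.dropped.add(seq)
        self._advance()

    def _advance(self) -> None:
        # Matching may only move past next_seq once that message is either
        # delivered or reported as dropped; a silent gap blocks everything after it.
        while True:
            if self.next_seq in self.pending:
                self.events.append(("deliver", self.next_seq, self.pending.pop(self.next_seq)))
            elif self.next_seq in self.dropped:
                # The reported drop counts as a delivery, closing the gap.
                self.dropped.discard(self.next_seq)
                self.events.append(("dropped", self.next_seq, None))
            else:
                return
            self.next_seq += 1


# Usage: message 1 is lost, so message 2 waits until the drop is reported.
ch = OrderedChannel()
ch.arrive(0, "a")
ch.arrive(2, "c")      # stalls: seq 1 has neither arrived nor been reported dropped
ch.report_drop(1)      # gap counted as delivered; seq 2 can now be matched
```

Without `report_drop`, the channel never advances past sequence number 1. This mirrors the point that MPI may not match across a gap unless the gap itself is accounted for.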

As for high-level recovery techniques, I can see
a number of uses for message-drop information. The
message may be a periodic notification from the
master to the slaves, in which case dropped
messages are irrelevant. Alternatively, the
application may already be using its own protocol
to overcome such failures. The point is that drop
notifications can be useful, and the main question
for us is the right balance between clean semantics
and the performance cost. The fully transactional
option will be expensive but very convenient.
However, we have several other options that are
quite cheap and still useful in combination with
higher-level recovery protocols. I think we should
gravitate towards those, but I thought it important
to include a variety of options for people to toss around.

Greg Bronevetsky
Post-Doctoral Researcher
1028 Building 451
Lawrence Livermore National Lab
(925) 424-5756
bronevetsky1 at llnl.gov

>Actually, it is most likely that MPI
>implementations that don't try to deal with
>dropped messages can't even detect that such an
>event has occurred. I would expect such
>implementations to detect a communication failure
>only if the low-level library they use to implement
>communication (such as an OS-bypass library)
>returns an error when posting a communication, or
>if the run-time used by MPI detects a failed
>process and propagates this information to the
>rest of the processes in the application.
>
>The ONLY layer that can handle any sort of
>recovery from a live communication failure (i.e.
>without some sort of checkpoint/restart, with or
>without message logging) is the MPI implementation
>itself. The application reposting a send can't get
>around the lost data, because of the MPI message
>ordering requirements, unless the implementation
>relies entirely on another library to satisfy the
>MPI ordering requirements (i.e. it does not
>generate its own message sequence numbers) and the
>lost message is the last one that was sent. MPI is
>not allowed to attempt any matching if there is a
>gap in the sequence numbers.