[Mpi3-ft] Transactional Messages
Terry.Dontje at Sun.COM
Mon Feb 25 07:30:38 CST 2008
Greg Bronevetsky wrote:
> At 09:38 AM 2/23/2008, Richard Graham wrote:
>> So I think we are somewhat talking past each other. I think that what
>> you really care about with respect to communications errors is
>> information on messages that have not completed and, more importantly,
>> can’t complete. Is this correct?
>> I have focused more on errors that occur but that the low level can
>> handle, without needing to pass information about them back up to the
>> user. I believe this is where we said that we may want to be able to
>> give the app some indication of performance degradation, at their
>> request. Is this correct?
> Exactly. My point was that the former belongs in the transactional
> memory API, while the latter belongs in the QoS API. Kannan and I are
> tasked with drafting something for the latter. One thing to note
> though is that identical low-level events may fall under either API.
> In particular, a single bit flip may result in a message drop in one
> MPI implementation and a seamless recovery with minor performance
> degradation in another. The MPI implementation gets to choose the
> mapping between low-level events and their high-level manifestations.
> Right now we are just trying to define a reasonable interface for the
> high-level manifestations.
So are you really talking about adding a whole new set of pt2pt and
collective APIs specifically to handle transactional communications? Or
are you looking to extend the current APIs with a mode in which a user
could set the error detection? I have to admit I still find the two
modes that Rich describes not that far apart from each other.