[Mpi3-ft] Transactional Messages

Mon Feb 25 07:30:38 CST 2008

Greg Bronevetsky wrote:
> At 09:38 AM 2/23/2008, Richard Graham wrote:
>> So I think we are somewhat talking past each other. I think that
>> what you really care about with respect to communication errors is
>> information on messages that have not completed and, more
>> importantly, can’t complete. Is this correct?
>> I have focused more on errors that occur but that the low level can
>> handle, and so does not need to pass information about them back up
>> to the user. I believe this is where we said that we may want to be
>> able to give the app some indication of performance degradation, on
>> request. Is this correct?
>
> Exactly. My point was that the former belongs in the transactional 
> memory API, while the latter belongs in the QoS API. Kannan and I are 
> tasked with drafting something for the latter. One thing to note 
> though is that identical low-level events may fall under either API. 
> In particular, a single bit flip may result in a message drop in one 
> MPI implementation and a seamless recovery with minor performance 
> degradation in another. The MPI implementation gets to choose the 
> mapping between low-level events and their high-level manifestations. 
> Right now we are just trying to define a reasonable interface for the 
> high-level manifestations.
So are you really talking about adding a whole new set of point-to-point
and collective APIs specifically to handle transactional communications?
Or are you looking to extend the current APIs with a mode that a user
could set to enable error detection? I have to admit I still find the
two modes that Rich describes not that far apart from each other.

--td