[Mpi3-ft] Con Call on 1/4/2009

Josh Hursey jjhursey at open-mpi.org
Wed Jan 21 09:04:34 CST 2009


I am not going to be able to make it to today's call due to travel.

My primary concern is that the proposal relies too heavily on some
flavor of checkpointing or message logging to make the interface
useful. There should be a set of guidelines that make the interface
useful without any form of checkpointing or message logging on the
system. The door should always remain open to these types of
additional functionality, but I think the base specification should
be usable without them.

Best,
Josh

P.S. I should have a revised interface for the following proposal in  
the next week or so:
   https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/Quiescence

On Jan 20, 2009, at 6:54 PM, Greg Bronevetsky wrote:

> Here's my quick writeup of the major problems that we discussed with
> writing modular apps on top of our proposed MPI fault tolerance spec
> and an approach for making it relatively easy to write
> module-specific error recovery algorithms without worrying about other
> modules. I've attached a pdf version as well as a txt version that
> will be easier to edit.
>
> Greg Bronevetsky
> Post-Doctoral Researcher
> 1028 Building 451
> Lawrence Livermore National Lab
> (925) 424-5756
> bronevetsky1 at llnl.gov
>
> At 06:58 PM 1/13/2009, Richard Graham wrote:
>> OK, we will resume the calls next week, 1/21/2009.
>>
>> Rich
>>
>>
>> On 1/13/09 11:42 AM, "Greg Bronevetsky" <bronevetsky1 at llnl.gov> wrote:
>>
>> >
>> >> Unfortunately, for reasons out of [my] control, I did not manage to
>> >> get the time to update the wiki and I doubt I will find any time
>> >> before the call tomorrow. I'll have time to get back to this starting
>> >> from tomorrow morning.
>> >>
>> >> I second your idea to cancel the call tomorrow.
>> >
>> > I have a protocol worked out to do micro-rollbacks that will work
>> > well if we add to the API some kind of asynchronous event
>> > notification mechanism like active messages. It will not work as well
>> > without the extension. I'll update George's document once it's posted
>> > so that we have a unified document that describes the problem and the
>> > proposed solutions.
>> >
>> > Greg Bronevetsky
>> > Post-Doctoral Researcher
>> > 1028 Building 451
>> > Lawrence Livermore National Lab
>> > (925) 424-5756
>> > bronevetsky1 at llnl.gov
>> >
>> > _______________________________________________
>> > mpi3-ft mailing list
>> > mpi3-ft at lists.mpi-forum.org
>> > http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
>>
> <Support for Developing Fault Tolerant Modular MPI Applications.pdf>
> <Support for Developing Fault Tolerant Modular MPI Applications.txt>



