[Mpi3-ft] Con Call on 1/4/2009
jjhursey at open-mpi.org
Wed Jan 21 09:04:34 CST 2009
I am not going to be able to make it to today's call due to travel.
My primary concern is that the proposal relies a bit too heavily on
some flavor of checkpointing or message logging in order to make the
interface useful. There should be a set of guidelines that make the
interface useful without a form of checkpointing or message logging on
the system. I think the door should always be open to these
types of additional functionality, but the base specification
should be usable without them.
P.S. I should have a revised interface for the following proposal in
the next week or so:
On Jan 20, 2009, at 6:54 PM, Greg Bronevetsky wrote:
> Here's my quick writeup of the major problems that we discussed with
> writing modular apps on top of our proposed MPI fault tolerance spec
> and an approach for making it relatively easy to write module-
> specific error recovery algorithms without worrying about other
> modules. I've attached a pdf version as well as a txt version that
> will be easier to edit.
> Greg Bronevetsky
> Post-Doctoral Researcher
> 1028 Building 451
> Lawrence Livermore National Lab
> (925) 424-5756
> bronevetsky1 at llnl.gov
> At 06:58 PM 1/13/2009, Richard Graham wrote:
>> OK, we will resume the calls next week, 1/21/2009.
>> On 1/13/09 11:42 AM, "Greg Bronevetsky" <bronevetsky1 at llnl.gov>
>> >> Unfortunately, for reasons out of [my] control, I did not manage to
>> >> get the time to update the wiki and I doubt I will find any time
>> >> before the call tomorrow. I'll have time to get back to this
>> >> from tomorrow morning.
>> >> I second your idea to cancel the call tomorrow.
>> > I have a protocol worked out to do micro-rollbacks that will work
>> > well if we add to the API some kind of asynchronous event
>> > notification mechanism like active messages. It will not work as well
>> > without the extension. I'll update George's document once its
>> > so that we have a unified document that describes the problem and
>> > proposed solutions.
>> > Greg Bronevetsky
>> > _______________________________________________
>> > mpi3-ft mailing list
>> > mpi3-ft at lists.mpi-forum.org
>> > http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
> <Support for Developing Fault Tolerant Modular MPI
> Applications.pdf><Support for Developing Fault Tolerant Modular MPI