[Mpi3-ft] error vs fault

Josh Hursey jjhursey at open-mpi.org
Fri Jun 24 14:08:17 CDT 2011


It sounds ok to try out. A clear distinction between what the MPI
standard means when it refers to faults, errors, and erroneous
programs might be a useful clarification (since even with the RTS
proposal an application can still be erroneous if it does not heed the
semantic requirements of MPI).

I'm a little concerned about adding something that significant to the
RTS proposal this close to the deadline. In particular, I think we
might need to discuss the changes that you propose before committing
to them.

So I would suggest that you fork the ft-wg RTS proposal in the SVN
repository and make the changes there (I think you have permissions to
do that, if not let me know and we'll sort it out). Then once it is
ready we can review the changes and decide if we want to merge them in
now, later, or push to the future. Merging them back in shouldn't be
too difficult if we like what we see, but pulling them out if we
cannot agree before the deadline might be more difficult.

The absolute final day for the RTS proposal is July 4. Since that is a
US holiday, I intend to send out the RTS proposal next Thursday
(June 30). So my hope is that on the Wed. June 29 teleconf we will be
making the final edits before it goes out the door.

-- Josh


On Thu, Jun 23, 2011 at 7:40 PM, Bronevetsky, Greg
<bronevetsky1 at llnl.gov> wrote:
> I don't know if we need to. It sounds like this is too much detail for the type of change that Darius is proposing. It sounds good to me.
>
> Greg Bronevetsky
> Lawrence Livermore National Lab
> (925) 424-5756
> bronevetsky at llnl.gov
> http://greg.bronevetsky.com
>
>
>> -----Original Message-----
>> From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-
>> bounces at lists.mpi-forum.org] On Behalf Of Graham, Richard L.
>> Sent: Thursday, June 23, 2011 4:20 PM
>> To: 'MPI 3.0 Fault Tolerance and Dynamic Process Control working Group';
>> 'MPI 3.0 Fault Tolerance and Dynamic Process Control working Group'
>> Subject: Re: [Mpi3-ft] error vs fault
>>
>> How will you distinguish between an error that is the result of a bad
>> parameter and a "detected fault" ?
>>
>> Rich
>>
>>
>>
>>  -----Original Message-----
>> From:         Darius Buntinas [mailto:buntinas at mcs.anl.gov]
>> Sent: Thursday, June 23, 2011 05:30 PM Eastern Standard Time
>> To:   MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
>> Subject:      [Mpi3-ft] error vs fault
>>
>>
>> I feel it would be easier to explain/understand FT related things if we
>> distinguished between errors and faults, i.e., an error is a detected fault.  I'd
>> like to take a stab at running through the whole standard to make these
>> changes.  But before I spend time on this I'd like to make sure people are OK
>> with it.
>>
>> I believe I'd have to be done with this by Wednesday.  (right Josh?)
>>
>> Does anyone have objections?
>>
>> -d
>> _______________________________________________
>> mpi3-ft mailing list
>> mpi3-ft at lists.mpi-forum.org
>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
>>
>>
>
>
>



-- 
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey



