[Mpi3-ft] Defining the state of MPI after an error
terry.dontje at oracle.com
Wed Sep 22 08:43:01 CDT 2010
Richard Treumann wrote:
> This proposal is not a minor change.
> Please do not make this hole in the standard and assume you can later
> add language to standardize everything that comes through the hole.
> If the standard is to introduce the notion of a recoverable error it
> must be as part of a full description of what "recovery" means.
> I think it is dangerous and ultimately useless to have implementors
> mark a failure as "recoverable" when the post error state of the
> distributed MPI has gone from "fully standards compliant" to "mostly
> standards compliant, read my user doc, read my legal disclaimer, cross
> your fingers".
> See comment below for why I do not think the new hole is needed to
> allow people to do implementation specific recoverability.
> There is not even anything to prevent an implementation from deciding
> to add a function MPXX_WHAT_STILL_WORKS(err_code, answer) and
> documenting 5 or 5000 enumerated values for "answer" ranging from
> NOTHING through TAKE_A_CHANCE_IF_YOU_LIKE to EVERYTHING.
> IBM would probably return TAKE_A_CHANCE_IF_YOU_LIKE because I cannot
> imagine how we would promise exactly what will work and what will not
> but in practice most things will still work as expected.
I think I agree with Dick on the above. Another way of putting the
disagreement is that Josh's proposal is too general, in that not all
error codes can be cleanly marked as "MPI state is broken" or "not
broken". When Sun implemented fault-tolerant client/server, we came up
with a new error class that, when returned, told the user that a
condition had occurred on a communicator that rendered the communicator
useless, and that one should clean it up before continuing. The point is
that there was a concrete understanding of the error and of what could
be done to recover, as opposed to a general class that says everything
is borked or not, which essentially doesn't give you much, because
you'll eventually end up having to define a more specific class of
error, IMO.
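
To make the contrast concrete, here is a minimal sketch in C of dispatching
on a specific error class rather than on a single "broken or not" flag. All
names in it are hypothetical and chosen for illustration; they are not Sun's
actual error class and not part of the MPI standard. It deliberately avoids
MPI calls so it stands alone; in a real program the class would presumably
come from calling MPI_Error_class on the returned error code.

```c
#include <assert.h>

/* Hypothetical error classes (assumption, not from MPI or Sun's code). */
typedef enum {
    ERRCLASS_NONE = 0,        /* no error */
    ERRCLASS_COMM_UNUSABLE,   /* specific: this communicator is now useless */
    ERRCLASS_OTHER            /* anything the implementation cannot classify */
} err_class_t;

/* What the caller decides to do next. */
typedef enum {
    ACTION_CONTINUE = 0,
    ACTION_CLEANUP_COMM,      /* free the dead communicator, then carry on */
    ACTION_ABORT
} recovery_action_t;

/* A specific class maps to a concrete recovery step; an unclassified
 * error gives the caller nothing to act on, so the sketch aborts. */
recovery_action_t choose_action(err_class_t cls)
{
    switch (cls) {
    case ERRCLASS_NONE:
        return ACTION_CONTINUE;
    case ERRCLASS_COMM_UNUSABLE:
        return ACTION_CLEANUP_COMM;
    default:
        return ACTION_ABORT;
    }
}

/* Human-readable label for logging. */
const char *action_name(recovery_action_t a)
{
    static const char *names[] = {
        "continue", "clean up communicator", "abort"
    };
    assert(a >= ACTION_CONTINUE && a <= ACTION_ABORT);
    return names[a];
}
```

The only case where the caller knows what to do is the one where the class
says something specific about which object is affected.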
> Dick Treumann - MPI Team
> IBM Systems & Technology Group
> Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
> Tele (845) 433-7846 Fax (845) 433-8363
> mpi3-ft-bounces at lists.mpi-forum.org wrote on 09/21/2010 04:54:08 PM:
> > Re: [Mpi3-ft] Defining the state of MPI after an error
> > Bronis R. de Supinski
> > to:
> > MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
> > 09/21/2010 04:59 PM
> > Sent by:
> > mpi3-ft-bounces at lists.mpi-forum.org
> > Please respond to "Bronis R. de Supinski", "MPI 3.0 Fault Tolerance
> > and Dynamic Process Control working Group"
> > Dick:
> > Re:
> > > The current MPI standard does not say the MPI implementation is
> > > broken once there is an error. Saying MPI state is undefined after an
> > > error simply says that the detailed semantic of the MPI standard can no
> > > longer be promised. In other words, after an error you leave behind the
> > > security of a portable standard semantic. You are operating at your own
> > > risk. You do not need to read more than that into it.
> > Perhaps my problem with this position is that I come from the
> > background of language definitions for compilers. When you
> > read "undefined" in the OpenMP specification then you are
> > being told that things are broken and the implementation does
> > not need to do anything or even tell you what they actually do (and
> > I believe the same is true for the C and C++ standards). An
> > alternative is "implementation defined", which requires the
> > implementer to document what they actually do. Without that,
> > you cannot even rely on actions with a specific implementation
> > (unless you believe "My tests so far have not failed so I am OK").
> When a standard says behavior is "undefined" in some situation, it
> cannot mean behavior is "broken". It cannot mean the implementor is
> prohibited from making it still work. It cannot mean the implementor
> is prohibited from making certain things work and documenting them.
> Any statement like this in a standard would be definition of behavior
> and the behavior would no longer be "undefined".
> The only thing a standard can logically mean by "undefined" is that
> the STANDARD no longer mandates the definition.
> Bronis says:
> > I strongly feel "undefined" should be reserved for situations that
> > mean "your program is irrevocably broken and the implementer does
> > not need to worry about what happens to it after encountering them."
> I would say this as:
> I strongly feel "undefined" should be reserved for situations that
> mean "The standard no longer guarantees your program is not
> irrevocably broken. The implementer is not required by the standard to
> worry about what happens to it after encountering them. An
> Implementation is free to provide any "better" behavior that may be of
> value but users cannot assume another implementation provides similar
> behavior so cannot assume standards defined portability."
> I do not see how the use of the word "undefined" in a standard can be
> interpreted as a prohibition of any behavior an implementation might
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.dontje at oracle.com