<br><font size=2 face="sans-serif">An incident that trashes MPI internal
data structures will seldom be recognizable as a trigger for a non_SUCCESS
rc on a particular call. If you do an MPI_Recv and the buffer pointer happens
to drop the data all over MPI internal state, there is about zero chance
the MPI_Recv call will be able to detect that. It will just return
MPI_SUCCESS and depending on what you trashed, things may run fine or may
break in unpredictable ways.</font>
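<br>
<br><font size=2 face="sans-serif">To make that concrete, here is a toy model in Python rather than real MPI. The names Arena and toy_recv, and the layout that puts library state directly after the user buffer, are assumptions made only for illustration; no real MPI library is claimed to be laid out this way. It sketches why the receive has no basis for returning anything other than success.</font>
<pre>
```python
SUCCESS = 0


class Arena:
    """One flat block of memory shared by a user buffer and library state."""

    def __init__(self, buf_len, state):
        self.buf_len = buf_len
        # Hypothetical layout: user buffer first, library state right after it.
        self.mem = bytearray(buf_len) + bytearray(state)

    def state(self):
        return bytes(self.mem[self.buf_len:])


def toy_recv(arena, offset, payload):
    """Copy an incoming message to the address the caller supplied.

    Like a real receive, it trusts the pointer: nothing here checks whether
    the destination overlaps the library's own state.
    """
    arena.mem[offset:offset + len(payload)] = payload
    return SUCCESS


arena = Arena(buf_len=8, state=b"QUEUE-OK")
good_rc = toy_recv(arena, 0, b"12345678")   # lands inside the user buffer
state_after_good = arena.state()
bad_rc = toy_recv(arena, 6, b"XXXXXX")      # bad pointer: runs into library state
state_after_bad = arena.state()
print(good_rc, state_after_good)
print(bad_rc, state_after_bad)
```
</pre>
<font size=2 face="sans-serif">Both calls return the same code; only the second one leaves the (hypothetical) library state altered, and nothing in the call itself can tell the difference.</font>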
<br>
<br><font size=2 face="sans-serif">I have absolutely no problem with a
community agreement to try some prototyping of ideas that will be proposed
for the standard if they prove out.</font>
<br>
<br>
<br><font size=2 face="sans-serif">Dick Treumann - MPI Team
<br>
IBM Systems & Technology Group<br>
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601<br>
Tele (845) 433-7846 Fax (845) 433-8363<br>
</font>
<br>
<br>
<br>
<table width=100%>
<tr valign=top>
<td><font size=1 color=#5f5f5f face="sans-serif">From:</font>
<td><font size=1 face="sans-serif">"Bronevetsky, Greg" <bronevetsky1@llnl.gov></font>
<tr valign=top>
<td><font size=1 color=#5f5f5f face="sans-serif">To:</font>
<td><font size=1 face="sans-serif">"MPI 3.0 Fault Tolerance and Dynamic
Process Control working Group" <mpi3-ft@lists.mpi-forum.org></font>
<tr valign=top>
<td><font size=1 color=#5f5f5f face="sans-serif">Date:</font>
<td><font size=1 face="sans-serif">09/22/2010 03:37 PM</font>
<tr valign=top>
<td><font size=1 color=#5f5f5f face="sans-serif">Subject:</font>
<td><font size=1 face="sans-serif">Re: [Mpi3-ft] Defining the state of
MPI after an error</font>
<tr valign=top>
<td><font size=1 color=#5f5f5f face="sans-serif">Sent by:</font>
<td><font size=1 face="sans-serif">mpi3-ft-bounces@lists.mpi-forum.org</font></table>
<br>
<hr noshade>
<br>
<br>
<br><font size=2 color=#424282 face="Calibri">One candidate for CANNOT_CONTINUE
would be a data corruption in MPI memory or some data structure inconsistency
due to a bug. This could have zero effect or could corrupt application
results or system state. It would be exceedingly difficult for MPI to do
anything meaningful here and continued operation is potentially very dangerous.
As such, I would consider this to be a bad enough error to return CANNOT_CONTINUE.
</font>
<br><font size=2 color=#424282 face="Calibri"> </font>
<br><font size=2 color=#424282 face="Calibri">I think the point of this
proposal is not that CANNOT_CONTINUE is going to be a common error but
to lay the groundwork for a more useful error reporting scheme. Today we’re
quite sure that CANNOT_CONTINUE will be the worst thing that an MPI implementation
will want to return but we’re not really sure about what the other errors
will look like. For example, RANK_DEAD and LINK_DEAD sound like plausible
error messages but we won’t know until individual implementations have
had a chance to experiment with them. This proposal allows such experimentation
to happen within the same basic error reporting framework.</font>
<br><font size=2 color=#424282 face="Calibri"> </font>
<br><font size=2 color=#424282 face="Calibri">Having said that, I’m not
completely convinced that we need to include this in the spec yet or whether
this can be more like a community agreement until we understand the problem
better.</font>
<br><font size=2 color=#004080 face="Calibri"> </font>
<br><font size=2 color=#004080 face="Calibri">Greg Bronevetsky</font>
<br><font size=2 color=#004080 face="Calibri">Lawrence Livermore National
Lab</font>
<br><font size=2 color=#004080 face="Calibri">(925) 424-5756</font>
<br><font size=2 color=#004080 face="Calibri">bronevetsky@llnl.gov</font>
<br><a href=http://greg.bronevetsky.com/><font size=2 color=#004080 face="Calibri">http://greg.bronevetsky.com</font></a><font size=2 color=#004080 face="Calibri">
</font>
<br><font size=2 color=#004080 face="Calibri"> </font>
<br><font size=2 face="Tahoma"><b>From:</b> mpi3-ft-bounces@lists.mpi-forum.org
[</font><a href="mailto:mpi3-ft-bounces@lists.mpi-forum.org"><font size=2 face="Tahoma">mailto:mpi3-ft-bounces@lists.mpi-forum.org</font></a><font size=2 face="Tahoma">]
<b>On Behalf Of </b>Richard Treumann<b><br>
Sent:</b> Wednesday, September 22, 2010 11:17 AM<b><br>
To:</b> MPI 3.0 Fault Tolerance and Dynamic Process Control working Group<b><br>
Subject:</b> Re: [Mpi3-ft] Defining the state of MPI after an error</font>
<br><font size=3 face="Times New Roman"> </font>
<br><font size=2 face="Arial"><br>
Darius </font><font size=3 face="Times New Roman"><br>
</font><font size=2 face="Arial"><br>
I can imagine a few errors that I know will be harmless to MPI state. I
can make sure nobody can do any harm by passing an invalid communicator
to MPI_COMM_SIZE.</font><font size=3 face="Times New Roman"> <br>
</font><font size=2 face="Arial"><br>
I cannot think of a detectable error that would return and leave that thread
of that process so totally broken that nothing in MPI will work from then
on. In a collective, there may be processes in which the thread that called
the CC never returns and that thread of the process is no longer usable
because it is hung. Other threads using other communicators in the
process with a hung thread may work perfectly.</font><font size=3 face="Times New Roman">
<br>
</font><font size=2 face="Arial"><br>
Except for the very few cases where I know there was no damage (like a
bad comm on MPI_COMM_SIZE) the situation, 99.99% of the time, will be that
everything still works but sometimes the outcome is a surprise to the user.
Say you do:</font><font size=3 face="Times New Roman"> <br>
</font><font size=2 face="Arial"><br>
1 MPI_Barrier (on world)</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
2 MPI_Barrier (on world)</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
3 other stuff <br>
4 MPI_Barrier (on world)</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
5 if (my rank is even)</font><font size=3 face="Times New Roman"> </font><font size=2 face="Arial"><br>
6 sendrecv(with odd neighbor)</font><font size=3 face="Times New Roman">
</font><font size=2 face="Arial"><br>
7 else <br>
8 sendrecv(with even neighbor)</font><font size=3 face="Times New Roman">
<br>
<br>
</font><font size=2 face="Arial"><br>
but get back an error at all even numbered ranks from the line 1
barrier call. The line 2 MPI_Barrier may still "work" but the
line 2 barrier at even numbered ranks will match the line 1 barrier at
odd ranks. Even ranks will begin "other stuff" and odd ranks
will sit in the line 2 barrier until even ranks finish "other
stuff" and reach the line 4 barrier. The odd ranks now get through
their line 2 barrier and begin other stuff.</font><font size=3 face="Times New Roman">
<br>
</font><font size=2 face="Arial"><br>
If "other stuff" involves communication among the even
ranks and communication among the odd ranks, that will work too. The even
ranks will all send/recv among themselves; later the odd ranks will all
send/recv among themselves.</font><font size=3 face="Times New Roman">
<br>
</font><font size=2 face="Arial"><br>
The even ranks will reach line 6 and hang there because the odd tasks are
still stuck at line 4. </font><font size=3 face="Times New Roman">
<br>
</font><font size=2 face="Arial"><br>
In this entire example, libmpi has continued working "correctly"
but the behavior you get from correct behavior is not what you planned.
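<br>
<br>
The walk-through above can be replayed with a small simulation. This is a sketch in Python, not MPI: the functions program and simulate are hypothetical, "other stuff" is treated as purely local work, neighbors are assumed to be paired as (0,1), (2,3), and an errored barrier is assumed to return without taking part in the collective.
<pre>
```python
def program(rank):
    """The numbered listing from the message as (line, op, partner) steps."""
    even = rank % 2 == 0
    # Pairing ranks as (0,1), (2,3), ... is an assumption; the message only says "neighbor".
    partner = rank + 1 if even else rank - 1
    return [(1, "barrier", None), (2, "barrier", None), (3, "other", None),
            (4, "barrier", None), (6 if even else 8, "sendrecv", partner)]


def simulate(nranks, failed_line=None):
    """Return, per rank, the listing line it is stuck on (None if it finished)."""
    progs = [program(r) for r in range(nranks)]
    pc = [0] * nranks             # current step of each rank
    arrivals = [0] * nranks       # barriers each rank has really taken part in
    entered = [False] * nranks    # already counted in the barrier at the current step?
    at_sendrecv = [False] * nranks
    progress = True
    while progress:
        progress = False
        for r in range(nranks):
            if pc[r] == len(progs[r]):
                continue          # this rank has run off the end of the listing
            line, op, partner = progs[r][pc[r]]
            done = False
            if op == "other":
                done = True
            elif op == "barrier" and line == failed_line and r % 2 == 0:
                done = True       # error return: comes back without taking part
            elif op == "barrier":
                if not entered[r]:
                    entered[r] = True
                    arrivals[r] += 1
                    progress = True
                # Released once every rank has taken part in this many barriers.
                done = min(arrivals) == arrivals[r]
            elif op == "sendrecv":
                if not at_sendrecv[r]:
                    at_sendrecv[r] = True
                    progress = True
                done = at_sendrecv[partner]
            if done:
                pc[r] += 1
                entered[r] = False
                progress = True
    return [None if pc[r] == len(progs[r]) else progs[r][pc[r]][0]
            for r in range(nranks)]


clean_run = simulate(4)
error_run = simulate(4, failed_line=1)
print(clean_run)   # every rank finishes
print(error_run)   # even ranks stuck at line 6, odd ranks at line 4
```
</pre>
Every individual call in the model behaves as specified; the hang comes only from the barriers pairing up one position off after the error.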
</font><font size=3 face="Times New Roman"><br>
</font><font size=2 face="Arial"><br>
The situation of MPI state being totally trashed by an error that returns
a return code barely exists. The case where it is subtly discombobulated
is the norm.</font><font size=3 face="Times New Roman"> <br>
<br>
<br>
<br>
<br>
</font><font size=2 face="Arial"><br>
Dick Treumann - MPI Team
<br>
IBM Systems & Technology Group<br>
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601<br>
Tele (845) 433-7846 Fax (845) 433-8363</font><font size=3 face="Times New Roman"><br>
<br>
</font>
<p>
<table width=100%>
<tr valign=top>
<td width=8%><font size=1 color=#5f5f5f face="Arial">From:</font><font size=3 face="Times New Roman">
</font>
<td width=91%><font size=1 face="Arial">Darius Buntinas <buntinas@mcs.anl.gov></font><font size=3 face="Times New Roman">
</font>
<tr valign=top>
<td><font size=1 color=#5f5f5f face="Arial">To:</font><font size=3 face="Times New Roman">
</font>
<td><font size=1 face="Arial">"MPI 3.0 Fault Tolerance and Dynamic
Process Control working Group" <mpi3-ft@lists.mpi-forum.org></font><font size=3 face="Times New Roman">
</font>
<tr valign=top>
<td><font size=1 color=#5f5f5f face="Arial">Date:</font><font size=3 face="Times New Roman">
</font>
<td><font size=1 face="Arial">09/22/2010 12:24 PM</font><font size=3 face="Times New Roman">
</font>
<tr valign=top>
<td><font size=1 color=#5f5f5f face="Arial">Subject:</font><font size=3 face="Times New Roman">
</font>
<td><font size=1 face="Arial">Re: [Mpi3-ft] Defining the state of MPI after
an error</font><font size=3 face="Times New Roman"> </font>
<tr valign=top>
<td><font size=1 color=#5f5f5f face="Arial">Sent by:</font><font size=3 face="Times New Roman">
</font>
<td><font size=1 face="Arial">mpi3-ft-bounces@lists.mpi-forum.org</font></table>
<br><font size=3 face="Times New Roman"> </font>
<div align=center>
<br>
<hr noshade></div>
<br><font size=3 face="Times New Roman"><br>
<br>
</font><font size=2 face="Courier New"><br>
<br>
OK, I (think I) see what you guys are saying, so maybe we should look at
it this way. The CANNOT_CONTINUE proposal should not define the operation
of the MPI implementation after errors other than CANNOT_CONTINUE. Instead,
it defines that after the implementation gives a CANNOT_CONTINUE error,
the app knows that the implementation is fatally wedged, and that the user
should definitely not expect correct operation after this. I.e.,
we're not labeling other errors as "recoverable," we're just
marking CANNOT_CONTINUE as "unrecoverable."<br>
<br>
Note that an implementation can still be standard compliant even if it
never returns a CANNOT_CONTINUE error even when it is fatally wedged (because
operation after any other error is still undefined).<br>
<br>
This just defines a way for the implementation to let the user know that
it has given up. So that if the implementation provides best-effort
functionality after an error, and the user has "read the disclaimer"
and is comfortable with proceeding, this is a way to differentiate between
an error as a result of a failure that hosed everything, and one that may
allow things to continue.<br>
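<br>
From the application side, the distinction might be used roughly as follows. This is a Python sketch with hypothetical names: the Err values and the next_step policy are illustrative assumptions, since the proposal does not fix any numeric codes and RANK_DEAD and LINK_DEAD are only floated elsewhere in this thread.
<pre>
```python
from enum import Enum


class Err(Enum):
    # Hypothetical codes; only the names CANNOT_CONTINUE, RANK_DEAD and
    # LINK_DEAD appear in this thread, and none of them is standardized.
    SUCCESS = 0
    RANK_DEAD = 1
    LINK_DEAD = 2
    CANNOT_CONTINUE = 3


def next_step(err, user_accepts_best_effort):
    """Decide what the application does after an MPI-like call returns."""
    if err is Err.SUCCESS:
        return "continue"
    if err is Err.CANNOT_CONTINUE:
        # The implementation has said it has given up; nothing else is promised.
        return "abort"
    # Any other error: behaviour is still undefined by the standard, so going on
    # is a best-effort gamble the user has to opt into explicitly.
    return "continue-at-own-risk" if user_accepts_best_effort else "abort"


decisions = {e.name: next_step(e, user_accepts_best_effort=True) for e in Err}
print(decisions)
```
</pre>
Note that the sketch never labels an error "recoverable"; it only singles out the one code after which continuing is pointless.<br>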
<br>
We still would like to define what happens to a bcast after a process in
the communicator fails. But we leave that for future proposals.<br>
<br>
Does this make sense?<br>
-d<br>
<br>
On Sep 22, 2010, at 8:43 AM, Terry Dontje wrote:<br>
<br>
> Richard Treumann wrote:<br>
>> <br>
>> This proposal is not a minor change. <br>
>> <br>
>> Please do not make this hole in the standard and assume you can
later add language to standardize everything that comes through the hole.
<br>
>> <br>
>> If the standard is to introduce the notion of a recoverable error
it must be as part of a full description of what "recovery" means.
<br>
>> <br>
>> I think it is dangerous and ultimately useless to have implementors
mark a failure as "recoverable" when the post error state of
the distributed MPI has gone from "fully standards compliant"
to "mostly standards compliant, read my user doc read my legal disclaimer,
cross your fingers". <br>
>> <br>
>> See comment below for why I do not think the new hole is needed
to allow people to do implementation specific recoverability. <br>
>> <br>
>> There is not even anything to prevent an implementation from deciding
to add a function MPXX_WHAT_STILL_WORKS(err_code, answer) and documenting
5 or 5000 enumerated values for "answer" ranging from NOTHING
through TAKE_A_CHANCE_IF_YOU_LIKE to EVERYTHING. <br>
>> <br>
>> IBM would probably return TAKE_A_CHANCE_IF_YOU_LIKE because I
cannot imagine how we would promise exactly what will work and what will
not but in practice most things will still work as expected. <br>
>> <br>
> I think I agree with Dick on the above. Another way of putting
the disagreement is that Josh's proposal is too general in that not all
errorcodes can be completely marked as MPI state is broken or not. When
Sun implemented fault tolerant client/server we came up with a new error
class that when returned gave the user the understanding that a condition
occurred on a communicator that has rendered the communicator useless and
one should clean it up before continuing on. The point is there was
a concrete understanding of the error and what could be done to recover.
As opposed to a general class that says everything is borked or
not which essentially doesn't give you much because you'll end up eventually
having to define a more specific class of error IMO.<br>
> <br>
> --td<br>
>> <br>
>> <br>
>> <br>
>> Dick Treumann - MPI Team
<br>
>> IBM Systems & Technology Group<br>
>> Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601<br>
>> Tele (845) 433-7846 Fax (845) 433-8363<br>
>> <br>
>> <br>
>> mpi3-ft-bounces@lists.mpi-forum.org wrote on 09/21/2010 04:54:08
PM:<br>
>> <br>
>> > Re: [Mpi3-ft] Defining the state of MPI after an error <br>
>> > <br>
>> > Bronis R. de Supinski <br>
>> > <br>
>> > to: <br>
>> > <br>
>> > MPI 3.0 Fault Tolerance and Dynamic Process Control working
Group <br>
>> > <br>
>> > 09/21/2010 04:59 PM <br>
>> > <br>
>> > Sent by: <br>
>> > <br>
>> > mpi3-ft-bounces@lists.mpi-forum.org <br>
>> > <br>
>> > Please respond to "Bronis R. de Supinski", "MPI
3.0 Fault Tolerance <br>
>> > and Dynamic Process Control working Group" <br>
>> > <br>
>> > <br>
>> > Dick:<br>
>> > <br>
>> > Re:<br>
>> > > The current MPI standard does not say the MPI implementation
is totally <br>
>> > > broken once there is an error. Saying MPI state
is undefined after an <br>
>> > > error simply says that the detailed semantic of the
MPI standard can no <br>
>> > > longer be promised. In other words, after an error you
leave behind the <br>
>> > > security of a portable standard semantic. You
are operating at your own <br>
>> > > risk. You do not need to read more than that into it.<br>
>> > <br>
>> > Perhaps my problem with this position is that I come from
the<br>
>> > background of language definitions for compilers. When you<br>
>> > read "undefined" in the OpenMP specification then
you are<br>
>> > being told that things are broken and the implementation
does not<br>
>> > need to do anything or even tell you what they actually do
(and<br>
>> > I believe the same is true for the C and C++ standards).
An<br>
>> > alternative is "implementation defined", which
requires the<br>
>> > implementer to document what they actually do. Without that,<br>
>> > you cannot even rely on actions with a specific implementation<br>
>> > (unless you believe "My tests so far have not failed
so I am OK").<br>
>> <br>
>> <br>
>> When a standard says behavior is "undefined" in some
situation, it cannot mean behavior is "broken". It cannot mean
the implementor is prohibited from making it still work. It cannot mean
the implementor is prohibited from making certain things work and documenting
them. Any statement like this in a standard would be definition of behavior
and the behavior would no longer be "undefined". <br>
>> <br>
>> The only thing a standard can logically mean by "undefined"
is that the STANDARD no longer mandates the definition. <br>
>> <br>
>> Bronis says: <br>
>> <br>
>> > <br>
>> > I strongly feel "undefined" should be reserved
for situations that<br>
>> > mean "your program is irrevocably broken and the implementer
does<br>
>> > not need to worry about what happens to it after encountering
them." <br>
>> <br>
>> I would say this as: <br>
>> <br>
>> I strongly feel "undefined" should be reserved for situations
that mean "The standard no longer guarantees your program is not irrevocably
broken. The implementer is not required by the standard to worry about
what happens to it after encountering them. An Implementation is free to
provide any "better" behavior that may be of value but users
cannot assume another implementation provides similar behavior so cannot
assume standards defined portability." <br>
>> <br>
>> I do not see how the use of the word "undefined" in
a standard can be interpreted as a prohibition of any behavior an implementation
might offer. <br>
>> <br>
>> <br>
>> <br>
>> <br>
>> <br>
>> <br>
>> _______________________________________________<br>
>> mpi3-ft mailing list<br>
>> <br>
>> mpi3-ft@lists.mpi-forum.org<br>
>> </font><a href="http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft"><font size=2 color=blue face="Courier New"><u>http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft</u></font></a><font size=2 face="Courier New"><br>
>> <br>
>> <br>
>> <br>
> <br>
> <br>
> -- <br>
> Terry D. Dontje | Principal Software Engineer<br>
> Developer Tools Engineering | +1.781.442.2631<br>
> Oracle - Performance Technologies<br>
> 95 Network Drive, Burlington, MA 01803<br>
> Email terry.dontje@oracle.com<br>
> <br>
> _______________________________________________<br>
> mpi3-ft mailing list<br>
> mpi3-ft@lists.mpi-forum.org<br>
> </font><a href="http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft"><font size=2 color=blue face="Courier New"><u>http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft</u></font></a><font size=2 face="Courier New"><br>
<br>
<br>
</font><font size=2 face="Courier New"><br>
</font><tt><font size=2>_______________________________________________<br>
mpi3-ft mailing list<br>
mpi3-ft@lists.mpi-forum.org<br>
</font></tt><a href="http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft"><tt><font size=2>http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft</font></tt></a><tt><font size=2><br>
</font></tt>
<br>
<br>