<br><font size=2 face="sans-serif">A few quick observations:</font>

<br>

<br><font size=2 face="sans-serif">0) </font>

<br><font size=2 face="sans-serif">The constant is MPI_ERRORS_ARE_FATAL,

not MPI_ERRORS_ABORT.</font>

<br>

<br><font size=2 face="sans-serif">1) </font>

<br><font size=2 face="sans-serif">The MPI standard mandates only one return

code, MPI_SUCCESS. All other return codes are implementation-specific and

non-portable.  For portability, MPI defines error classes

and a query function, MPI_ERROR_CLASS, that is passed an implementation-defined

return code and returns its class.</font>

<br>

<br><font size=2 face="sans-serif">Assume I allow tags between 0 and 2**15.

As an MPI implementor, I am free to use return code 215 for a negative

tag and 399 for one that is above 2**15.  The error message I print

for 215 and the error message I print for 399 can be different. If the

user calls MPI_ERROR_CLASS() with either 215 or 399, I give back the

class MPI_ERR_TAG.  A user who compares the return code of a call directly

against MPI_ERR_TAG has written non-portable code.</font>

<br>

<br><font size=2 face="sans-serif">If I decide return codes 215 and

399 must be in class MPI_CANNOT_CONTINUE, they can no longer be in class

MPI_ERR_TAG.</font>
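<br>

<br><font size=2 face="sans-serif">The return code versus error class distinction above can be sketched as a toy model in plain C. This is only an illustration, not code from any MPI implementation: every toy_ and TOY_ name is hypothetical, the class values are arbitrary, and only the codes 215 and 399 and the tag limit 2**15 come from the example. In real MPI the corresponding C calls are MPI_Error_class and MPI_Error_string.</font>

```c
#include <assert.h>

/* Toy stand-ins for MPI error classes (hypothetical values, not from mpi.h). */
enum toy_class { TOY_SUCCESS = 0, TOY_ERR_TAG = 4, TOY_ERR_OTHER = 16 };

/* Implementation-defined return codes taken from the example in the text. */
enum { TOY_RC_NEGATIVE_TAG = 215, TOY_RC_TAG_TOO_LARGE = 399 };

/* Largest tag this toy implementation accepts: 2**15. */
enum { TOY_TAG_MAX = 32768 };

/* Validate a tag the way the example implementation would. */
int toy_check_tag(int tag)
{
    if (tag < 0)
        return TOY_RC_NEGATIVE_TAG;
    if (tag > TOY_TAG_MAX)
        return TOY_RC_TAG_TOO_LARGE;
    return TOY_SUCCESS;
}

/* Stand-in for MPI_ERROR_CLASS: map a return code to its single class. */
int toy_error_class(int rc)
{
    switch (rc) {
    case TOY_SUCCESS:
        return TOY_SUCCESS;
    case TOY_RC_NEGATIVE_TAG:
    case TOY_RC_TAG_TOO_LARGE:
        return TOY_ERR_TAG;
    default:
        return TOY_ERR_OTHER;
    }
}

/* Stand-in for MPI_ERROR_STRING: each return code may carry its own message. */
const char *toy_error_string(int rc)
{
    switch (rc) {
    case TOY_SUCCESS:
        return "no error";
    case TOY_RC_NEGATIVE_TAG:
        return "tag is negative";
    case TOY_RC_TAG_TOO_LARGE:
        return "tag exceeds 2**15";
    default:
        return "unknown error";
    }
}

/* Contrast the non-portable check with the portable one. */
void toy_demo(void)
{
    int rc = toy_check_tag(-1);

    /* Non-portable: the raw return code is 215, not the class value. */
    assert(rc != TOY_ERR_TAG);

    /* Portable: ask for the class first, then compare. */
    assert(toy_error_class(rc) == TOY_ERR_TAG);
}
```

<br><font size=2 face="sans-serif">Because toy_error_class returns exactly one class per return code, moving 215 and 399 into a different class would necessarily take them out of TOY_ERR_TAG, which is the constraint described above.</font>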

<br>

<br>

<br><font size=2 face="sans-serif">2)</font>

<br><font size=2 face="sans-serif">The MPI standard avoids mandating specific

error checks.  It identifies a lot of errors and in many cases, says

what error class that error is in.  It does not say an implementation

MUST detect the error. I would not violate the standard by skipping the

check of whether MPI is initialized. My customers may demand it but the

standard does not.  You are introducing a mandate for one specific

sort of error.</font>

<br>

<br><font size=2 face="sans-serif">3)</font>

<br><font size=2 face="sans-serif">I am convinced that the intent of the

standard is to require MPI_ERROR_CLASS, MPI_ERROR_STRING and MPI_ABORT

to work after an ERRORS_RETURN. If this is insufficiently clear, it should

probably be addressed in a standalone ticket.  (It is certainly possible

for an error, detected or not, to trash internal state and for that to

make one of these three unusable, but that applies to every MPI call. The

standard does not say MPI_Send must work even if state was scrambled by

a wild store.)  I do not know if anybody assumed MPI_INITIALIZED and

MPI_FINALIZED must work after an error. I see no harm in requiring it.</font>

<br>

<br><font size=2 face="sans-serif">4) </font>

<br><font size=2 face="sans-serif">Finally - I do not see that the ticket

does anything useful.  In particular, it does not provide any portability

improvements I can see.</font>

<br>

<br><font size=2 face="sans-serif">The MPI implementation could offer a

TIMID vs ADVENTUROUS switch (environment variable)</font>

<br>

<br><font size=2 face="sans-serif">TIMID - MPI query functions like MPI_COMM_SIZE

and MPI_ALLOC_MEM do not trigger CANNOT_CONTINUE but every other error

does.</font>

<br>

<br><font size=2 face="sans-serif">ADVENTUROUS - no error triggers CANNOT_CONTINUE.

</font>

<br>

<br><font size=2 face="sans-serif">The default would probably need to be

TIMID because if the default were ADVENTUROUS, it would open the implementor

to an accusation of failing to protect the customer. There can be no such

accusation now because the standard does not imply the implementation should

protect the customer.  </font>
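<br>

<br><font size=2 face="sans-serif">As a minimal sketch of what such a switch might look like, assuming the mode is read from an environment variable: the variable name TOY_MPI_ERROR_MODE, the toy_ function names, and the call_is_exempt flag (standing for the calls listed under TIMID) are all hypothetical. Nothing here is defined by the ticket or the standard, which is the point made above about the lack of guidance.</font>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The two hypothetical policies described in the text. */
enum toy_mode { TOY_TIMID, TOY_ADVENTUROUS };

/* Parse a mode string; anything unrecognized (or NULL) falls back to TIMID,
 * matching the text's guess that TIMID would have to be the default. */
enum toy_mode toy_parse_mode(const char *value)
{
    if (value != NULL && strcmp(value, "ADVENTUROUS") == 0)
        return TOY_ADVENTUROUS;
    return TOY_TIMID;
}

/* Read the mode from a hypothetical environment variable. */
enum toy_mode toy_mode_from_env(void)
{
    return toy_parse_mode(getenv("TOY_MPI_ERROR_MODE"));
}

/* Return 1 if an error in a call should trigger CANNOT_CONTINUE, else 0.
 * call_is_exempt must be 0 or 1; it marks the calls that TIMID lets through. */
int toy_triggers_cannot_continue(enum toy_mode mode, int call_is_exempt)
{
    assert(call_is_exempt == 0 || call_is_exempt == 1);

    if (mode == TOY_ADVENTUROUS)
        return 0;               /* no error ever triggers it */
    return call_is_exempt ? 0 : 1;
}
```

<br><font size=2 face="sans-serif">Which calls get call_is_exempt set is exactly the choice each implementor would have to make alone, so two implementations could both offer this switch and still behave differently.</font>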

<br>

<br><font size=2 face="sans-serif">I have no clue from the ticket what

would be a reasonable or portable middle ground.  I see the

proposal as harmful because any attempt to use it will produce an illusion

of portability when implementors try to find a middle ground without guidance

from the standard.</font>

<br>

<br><font size=2 face="sans-serif">Dick Treumann  -  MPI Team

          <br>

IBM Systems &amp; Technology Group<br>

Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601<br>

Tele (845) 433-7846         Fax (845) 433-8363<br>

</font>

<br>

<br>

<br>

<table width=100%>

<tr valign=top>

<td><font size=1 color=#5f5f5f face="sans-serif">From:</font>

<td><font size=1 face="sans-serif">Joshua Hursey &lt;jjhursey@open-mpi.org&gt;</font>

<tr valign=top>

<td><font size=1 color=#5f5f5f face="sans-serif">To:</font>

<td><font size=1 face="sans-serif">"MPI 3.0 Fault Tolerance and Dynamic

Process Control working Group" &lt;mpi3-ft@lists.mpi-forum.org&gt;</font>

<tr valign=top>

<td><font size=1 color=#5f5f5f face="sans-serif">Date:</font>

<td><font size=1 face="sans-serif">09/23/2010 08:57 AM</font>

<tr valign=top>

<td><font size=1 color=#5f5f5f face="sans-serif">Subject:</font>

<td><font size=1 face="sans-serif">Re: [Mpi3-ft] Defining the state of

MPI after an error</font>

<tr valign=top>

<td><font size=1 color=#5f5f5f face="sans-serif">Sent by:</font>

<td><font size=1 face="sans-serif">mpi3-ft-bounces@lists.mpi-forum.org</font></table>

<br>

<hr noshade>

<br>

<br>

<br><tt><font size=2>(Bringing a lot of points together in a single response)<br>

<br>

The ticket that we are discussing is linked below (also part of the very

first email in this thread):<br>

  </font></tt><a href="https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/ft/err_cannot_continue"><tt><font size=2>https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/ft/err_cannot_continue</font></tt></a><tt><font size=2><br>

</font></tt>

<br><tt><font size=2>&lt; snip &gt;</font></tt>

<br>

<br><tt><font size=2>I deleted the discussion because only the ticket counts

now.<br>

</font></tt>

<br>

<br>