<br><font size=2 face="sans-serif">I am not talking about libmpi fixing
an application bug. I am talking about the fact that if an application
has a bug, the state of the application becomes unknown. Some step
of the algorithm the author was relying on to get an answer has not
happened as intended. How can the application state be trusted after
that? </font>
<br>
<br><font size=2 face="sans-serif">I see no problem with urging MPI implementations
to refrain from shooting down future MPI calls when the user has set MPI_ERRORS_RETURN,
but I have a hard time imagining going much beyond that for application
bugs.</font>
<br>
<br><font size=2 face="sans-serif">For example, a call to MPI_Bcast that
has a bad communicator at only one task will eventually hang, but one that has
a bad communicator at all tasks can continue (the application state is
probably corrupted, but libmpi should be OK).</font>
<br>
<br>
<br>
<br>
<br><font size=2 face="sans-serif">Dick Treumann - MPI Team
<br>
IBM Systems & Technology Group<br>
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601<br>
Tele (845) 433-7846 Fax (845) 433-8363<br>
</font>
<br>
<br>
<br>
<table width=100%>
<tr valign=top>
<td><font size=1 color=#5f5f5f face="sans-serif">From:</font>
<td><font size=1 face="sans-serif">Darius Buntinas <buntinas@mcs.anl.gov></font>
<tr valign=top>
<td><font size=1 color=#5f5f5f face="sans-serif">To:</font>
<td><font size=1 face="sans-serif">"MPI 3.0 Fault Tolerance and Dynamic
Process Control working Group" <mpi3-ft@lists.mpi-forum.org></font>
<tr valign=top>
<td><font size=1 color=#5f5f5f face="sans-serif">Date:</font>
<td><font size=1 face="sans-serif">09/20/2010 10:43 AM</font>
<tr valign=top>
<td><font size=1 color=#5f5f5f face="sans-serif">Subject:</font>
<td><font size=1 face="sans-serif">Re: [Mpi3-ft] Defining the state of
MPI after an error</font>
<tr valign=top>
<td><font size=1 color=#5f5f5f face="sans-serif">Sent by:</font>
<td><font size=1 face="sans-serif">mpi3-ft-bounces@lists.mpi-forum.org</font></table>
<br>
<hr noshade>
<br>
<br>
<br><tt><font size=2><br>
I don't think Josh meant that the MPI implementation would fix application
bugs, but rather that the return of an error class other than MPI_ERR_CANNOT_CONTINUE
means that the implementation is in an internally consistent state and
can continue performing MPI functions.<br>
<br>
-d<br>
<br>
On Sep 20, 2010, at 9:33 AM, Richard Treumann wrote:<br>
<br>
> <br>
> How does an application encounter errors in classes such as MPI_ERR_COUNT or
MPI_ERR_TAG other than through a bug in the application itself? <br>
> <br>
> How can it be easier to know how to continue past an arbitrary
application bug, with confidence that the application is still giving good
answers, than to just fix the app? <br>
> <br>
> <br>
> <br>
> <br>
> <br>
> From:
Joshua Hursey <jjhursey@open-mpi.org><br>
> To:
"MPI 3.0 Fault Tolerance and Dynamic Process Control working Group"
<mpi3-ft@lists.mpi-forum.org><br>
> Date:
09/20/2010 10:05 AM<br>
> Subject:
[Mpi3-ft] Defining the state of MPI after an error<br>
> Sent by:
mpi3-ft-bounces@lists.mpi-forum.org<br>
> <br>
> <br>
> <br>
> <br>
> During EuroMPI and the MPI Forum meeting last week, the issue of the
MPI state after an error was brought up a few times. The problem is that,
since the state is undefined, no portable program can be written that
uses error handlers and then continues to use MPI after the error. This
issue is particularly difficult for applications that wish to catch informational
or warning-type errors (e.g., MPI_ERR_COUNT, MPI_ERR_TAG, MPI_ERR_UNSUPPORTED_OPERATION).
These errors are often recoverable by the MPI implementation and/or
the application.<br>
> <br>
> To address this portability issue, I am separating the MPI_ERR_CANNOT_CONTINUE
error class out of the stabilization proposal. I presented the idea to the
MPI Forum during a plenary session last week and received a positive response
to building a formal proposal [Straw vote: 22 (yes), 0 (no), 3 (abstain)].<br>
> <br>
> I have created a first draft of the proposal for the working group
to review on the wiki at the link below:<br>
> </font></tt><a href="https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/ft/err_cannot_continue"><tt><font size=2>https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/ft/err_cannot_continue</font></tt></a><tt><font size=2><br>
> <br>
> I would like to have this proposal ready by the Oct. meeting so we
can have a formal plenary session on it. If all goes well, maybe we can
get a first reading by Dec.<br>
> <br>
> Let me know what you think about this proposal.<br>
> <br>
> -- Josh<br>
> <br>
> ------------------------------------<br>
> Joshua Hursey<br>
> Postdoctoral Research Associate<br>
> Oak Ridge National Laboratory<br>
> </font></tt><a href=http://www.cs.indiana.edu/~jjhursey><tt><font size=2>http://www.cs.indiana.edu/~jjhursey</font></tt></a><tt><font size=2><br>
> <br>
> <br>
> _______________________________________________<br>
> mpi3-ft mailing list<br>
> mpi3-ft@lists.mpi-forum.org<br>
> </font></tt><a href="http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft"><tt><font size=2>http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft</font></tt></a><tt><font size=2><br>
> <br>
> <br>
<br>
<br>
</font></tt>
<br>
<br>