[Mpi3-ft] Defining the state of MPI after an error
Richard Treumann
treumann at us.ibm.com
Mon Sep 20 09:33:22 CDT 2010
How does an application experience errors in classes such as MPI_ERR_COUNT
or MPI_ERR_TAG except through a bug in the application itself?
How can it be easier for someone to work out how to continue after an
arbitrary application bug, with confidence that the application is still
giving good answers, than to just fix the app?
Dick Treumann - MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363
From: Joshua Hursey <jjhursey at open-mpi.org>
To: "MPI 3.0 Fault Tolerance and Dynamic Process Control working Group" <mpi3-ft at lists.mpi-forum.org>
Date: 09/20/2010 10:05 AM
Subject: [Mpi3-ft] Defining the state of MPI after an error
Sent by: mpi3-ft-bounces at lists.mpi-forum.org
During EuroMPI and the MPI Forum meeting last week the issue of the MPI
state after an error was brought up a few times. The issue is that, because
the state is undefined, no portable program can be written that uses
error handlers and then relies on MPI functionality following the error.
This issue is particularly difficult for applications that wish to catch
informational or warning type errors (e.g., MPI_ERR_COUNT, MPI_ERR_TAG,
MPI_ERR_UNSUPPORTED_OPERATION). These errors are often recoverable by
the MPI implementation and/or the application.
To address this portability issue, I am bringing out the
MPI_ERR_CANNOT_CONTINUE error class from the stabilization proposal. I
presented the idea to the MPI Forum during a plenary session last week and
received a positive response on building a formal proposal [Straw vote: 22
(yes), 0 (no), 3 (abstain)].
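
To make the idea concrete, here is a minimal sketch of the decision an application might make once such a class exists. MPI_ERR_CANNOT_CONTINUE is only a proposed class here, so the sketch does not include mpi.h; the enum values and the name sketch_may_continue are hypothetical stand-ins. It also assumes that the proposed class would be the only signal that MPI can no longer be used, which the draft proposal may define differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in error classes. These are NOT values from any mpi.h; they only
 * mirror the class names mentioned in the message (hypothetical). */
enum sketch_err_class {
    SKETCH_SUCCESS = 0,
    SKETCH_ERR_COUNT,
    SKETCH_ERR_TAG,
    SKETCH_ERR_UNSUPPORTED_OPERATION,
    SKETCH_ERR_CANNOT_CONTINUE /* models the proposed class */
};

/* Returns true when, under the assumption stated above, the application
 * may keep using the library after seeing this error class. Only the
 * stand-in for the proposed CANNOT_CONTINUE class is treated as fatal. */
bool sketch_may_continue(enum sketch_err_class cls)
{
    /* Reject values outside the stand-in enum. */
    assert(cls >= SKETCH_SUCCESS && cls <= SKETCH_ERR_CANNOT_CONTINUE);
    return cls != SKETCH_ERR_CANNOT_CONTINUE;
}
```

In a real program the class would come from the error code returned by an MPI call (with error handlers set to return rather than abort), and the application would stop calling MPI once this check fails.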
I have created a first draft of the proposal for the working group to
review on the wiki at the link below:
https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/ft/err_cannot_continue
I would like to have this proposal ready by the Oct. meeting so we can
have a formal plenary session on it. If all goes well, maybe we can get a
first reading by Dec.
Let me know what you think about this proposal.
-- Josh
------------------------------------
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://www.cs.indiana.edu/~jjhursey
_______________________________________________
mpi3-ft mailing list
mpi3-ft at lists.mpi-forum.org
http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft