[Mpi3-ft] Defining the state of MPI after an error
treumann at us.ibm.com
Wed Sep 22 14:29:31 CDT 2010
You lost me there - in part, I am saying it is useless because there are
almost zero cases in which it would be appropriate. How does that make it
"a minor change"?
Can you provide me the precise text you would add to the standard? Exactly
how does the CANNOT_CONTINUE work? Under what conditions does an MPI
process see a CANNOT_CONTINUE and what does it mean?
Please look at the example again. The point was that there is nothing
there that would justify a CANNOT_CONTINUE and MPI is still working
correctly. Despite that, the behavior is a mess from the algorithm
viewpoint after the error.
Dick Treumann - MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363
From: Darius Buntinas <buntinas at mcs.anl.gov>
To: "MPI 3.0 Fault Tolerance and Dynamic Process Control working Group" <mpi3-ft at lists.mpi-forum.org>
Date: 09/22/2010 02:41 PM
Subject: Re: [Mpi3-ft] Defining the state of MPI after an error
Sent by: mpi3-ft-bounces at lists.mpi-forum.org
That's why I feel that this is a minor change.
As to your example, it's possible that failed collectives result in
CANNOT_CONTINUE, but a send to a failed process maybe doesn't have to. You
can still send to other processes.
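A minimal sketch of the distinction suggested above, written as a plain C model rather than real MPI code. Every name here (op_kind, outcome, classify, may_continue) is hypothetical and appears in neither the MPI standard nor any proposal text; the policy it encodes is only the assumption that a failed collective yields CANNOT_CONTINUE while a send to a failed process yields an ordinary error.

```c
/* Hypothetical model only: these names are illustrative assumptions,
 * not MPI identifiers and not proposed standard text. */
typedef enum { OP_SEND, OP_COLLECTIVE } op_kind;

typedef enum {
    OUTCOME_OK,              /* operation completed normally */
    OUTCOME_ERR_PEER_FAILED, /* ordinary error: MPI assumed still usable */
    OUTCOME_CANNOT_CONTINUE  /* assumed meaning: no further MPI use is safe */
} outcome;

/* Classify an operation given whether a process it involves has failed.
 * Assumed policy: a collective touching a failed process cannot be
 * recovered from, while a point-to-point send to one can. */
static outcome classify(op_kind op, int peer_failed)
{
    if (!peer_failed)
        return OUTCOME_OK;
    return (op == OP_COLLECTIVE) ? OUTCOME_CANNOT_CONTINUE
                                 : OUTCOME_ERR_PEER_FAILED;
}

/* Returns nonzero if the caller may keep communicating with other
 * processes after this outcome. */
static int may_continue(outcome o)
{
    return o != OUTCOME_CANNOT_CONTINUE;
}
```

Under this model, may_continue(classify(OP_SEND, 1)) is nonzero, so the caller can still send to other processes, whereas may_continue(classify(OP_COLLECTIVE, 1)) is zero. The model says nothing about whether the application's algorithm is still in a sensible state after the error, which is the objection raised in the quoted message.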
On Sep 22, 2010, at 1:16 PM, Richard Treumann wrote:
> The situation of MPI state being totally trashed by an error that
> returns a return code barely exists. The case where it is subtly
> discombobulated is the norm.