[Mpi3-ft] Defining the state of MPI after an error

Richard Treumann treumann at us.ibm.com
Wed Sep 22 14:29:31 CDT 2010


You lost me there - in part, I am saying it is useless because there are 
almost zero cases in which it would be appropriate.  How does that make it 
"a minor change"?

Can you provide me the precise text you would add to the standard? Exactly 
how does the CANNOT_CONTINUE work?  Under what conditions does an MPI 
process see a CANNOT_CONTINUE and what does it mean? 

Please look at the example again.  The point was that there is nothing 
there that would justify a CANNOT_CONTINUE and MPI is still working 
correctly. Despite that, the behavior is a mess from the algorithm 
viewpoint after the error.


Dick Treumann  -  MPI Team 
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846         Fax (845) 433-8363




From:
Darius Buntinas <buntinas at mcs.anl.gov>
To:
"MPI 3.0 Fault Tolerance and Dynamic Process Control working Group" 
<mpi3-ft at lists.mpi-forum.org>
Date:
09/22/2010 02:41 PM
Subject:
Re: [Mpi3-ft] Defining the state of MPI after an error
Sent by:
mpi3-ft-bounces at lists.mpi-forum.org




That's why I feel that this is a minor change.

As to your example, it's possible that failed collectives result in 
CANNOT_CONTINUE, but a send to a failed process may not have to. You 
can still send to other processes.
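
To make the distinction concrete, here is a rough toy model in plain C. It
makes no MPI calls, and every name in it (toy_err, toy_op, toy_classify,
toy_may_continue, TOY_CANNOT_CONTINUE) is hypothetical. The policy it encodes
is only an assumption about how a CANNOT_CONTINUE class might behave, not
proposed standard text.

```c
/* Hypothetical error classes. TOY_CANNOT_CONTINUE stands in for the
   proposed CANNOT_CONTINUE and is not defined by any MPI standard. */
enum toy_err { TOY_SUCCESS = 0, TOY_ERR_PEER_FAILED, TOY_CANNOT_CONTINUE };

/* The two kinds of operation discussed in the thread. */
enum toy_op { TOY_OP_SEND, TOY_OP_COLLECTIVE };

/* Assumed policy: a collective that involves a failed process leaves the
   library unusable, while a send to a failed process is only a local
   error for that one peer. */
enum toy_err toy_classify(enum toy_op op, int peer_failed)
{
    if (!peer_failed)
        return TOY_SUCCESS;
    return (op == TOY_OP_COLLECTIVE) ? TOY_CANNOT_CONTINUE
                                     : TOY_ERR_PEER_FAILED;
}

/* The caller may keep communicating after anything except
   TOY_CANNOT_CONTINUE. */
int toy_may_continue(enum toy_err e)
{
    return e != TOY_CANNOT_CONTINUE;
}
```

Under this assumed policy, a failed send still allows sends to other
processes, while a failed collective does not. It says nothing about whether
the algorithm itself is still in a sensible state after the error.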

-d

On Sep 22, 2010, at 1:16 PM, Richard Treumann wrote:

> The situation of MPI state being totally trashed by an error that 
> returns a return code barely exists.  The case where it is subtly 
> discombobulated is the norm. 



