[Mpi3-ft] Defining the state of MPI after an error

Richard Treumann treumann at us.ibm.com
Mon Sep 20 11:17:36 CDT 2010


If there is any question about whether the following calls are still valid 
after an error is reported through an error handler that returns 
(MPI_ERRORS_RETURN or a user handler):

MPI_Abort
MPI_Error_string
MPI_Error_class

then I assume it should be corrected as a trivial oversight in the original 
text.

I regard the real issue as the difficulty of guaranteeing the state of 
remote processes. 

It is very difficult to promise anything about how an interaction will 
behave between a process that has taken an error and one that has not. 

For example, if there were a loop of 100 MPI_Bcast calls and, on iteration 
5, rank 3 passed a bad communicator, what is the proper state?  Either a 
sequence number is mandated, so the other ranks hang quickly, or a sequence 
number is prohibited, so everybody keeps going until the "end", when the 
missing MPI_Bcast becomes critical.  Of course, with no sequence number, 
some tasks are stupidly using the iteration n-1 data for their iteration n 
computation.
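
To make the two outcomes concrete, here is a toy model in plain C. It is 
not MPI code and not how any implementation works; it is a sketch under 
assumptions of my own: a flat, eager broadcast in which the root queues 
one message per iteration for each rank, a payload equal to the iteration 
number, and a "sequence number" that is simply the receiver's iteration 
count compared against the oldest unread message. The names bcast_outcome 
and simulate_bcast_loop are hypothetical. It follows only the rank whose 
call failed; which ranks actually hang in a real library would depend on 
the broadcast algorithm.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcome of the toy run, as seen by the rank whose call failed. */
typedef struct {
    int stale_iterations; /* iterations computed with an earlier iteration's data */
    int unconsumed;       /* messages queued but unread when the rank stops */
    int stalled_at;       /* iteration where a mismatch stopped the rank, 0 if none */
} bcast_outcome;

/*
 * Hypothetical model, not MPI. Iterations are numbered from 1. The root is
 * assumed to have queued messages 1..iter by the time the rank reaches
 * iteration iter. The call at bad_iter fails locally and the error handler
 * returns, so nothing is consumed on that iteration.
 */
bcast_outcome simulate_bcast_loop(int iterations, int bad_iter, bool check_sequence)
{
    assert(iterations > 0 && bad_iter >= 1 && bad_iter <= iterations);

    bcast_outcome out = {0, 0, 0};
    int consumed = 0; /* number of messages this rank has read so far */

    for (int iter = 1; iter <= iterations; ++iter) {
        if (iter == bad_iter) {
            continue; /* failed call: the message stays in the queue */
        }
        int head = consumed + 1; /* payload of the oldest unread message */
        if (head != iter) {
            if (check_sequence) {
                /* Mismatch is detected: the rank stops here. */
                out.stalled_at = iter;
                break;
            }
            /* No check: the rank silently uses old data and carries on. */
            out.stale_iterations++;
        }
        consumed++;
    }

    /* Messages the root has queued by the time the rank stops. */
    int sent = out.stalled_at ? out.stalled_at : iterations;
    out.unconsumed = sent - consumed;
    return out;
}
```

Calling simulate_bcast_loop(100, 5, false) models the "no sequence number" 
case: the rank runs to the end, uses old data on every iteration after the 
failure, and leaves one message unread. Calling it with true models the 
"sequence number" case: the rank stops on the first iteration after the 
failure.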

 




Dick Treumann  -  MPI Team 
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846         Fax (845) 433-8363