[Mpi3-ft] Defining the state of MPI after an error
Richard Treumann
treumann at us.ibm.com
Mon Sep 20 11:17:36 CDT 2010
If there is any question about whether the following calls are still valid
after an error, when the error handler in effect is one that returns
(MPI_ERRORS_RETURN or a user handler):
   MPI_Abort
   MPI_Error_string
   MPI_Error_class
then I assume it should be corrected as a trivial oversight in the original
text.
I would regard the real issue as the difficulty of assuring the state of
remote processes.
It is very hard to make any promise about how an interaction between a
process that has taken an error and one that has not will behave.
For example, suppose there is a loop of 100 MPI_Bcast calls and, on
iteration 5, rank 3 uses a bad communicator. What is the proper state?
Either a sequence number is mandated, so the other ranks hang quickly, or a
sequence number is prohibited, so everybody keeps going until the "end",
when the missing MPI_Bcast becomes critical. Of course, with no sequence
number, some tasks are stupidly using the iteration n-1 data for their
iteration n computation.
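
The loop scenario above can be mimicked with a small toy model. This is only a sketch and not MPI code: it assumes a single root feeding a single leaf rank through a FIFO queue, and the function name simulate and its parameters are hypothetical. It also assumes that the failed call still advances the leaf's own call count, which is one possible reading of "sequence number". How a real implementation would react to a mismatch (a hang or an error) is not modeled; the sketch only reports the iteration at which the mismatch would surface.

```python
from collections import deque


def simulate(iterations=100, bad_iteration=5, use_sequence_numbers=False):
    """Toy model: one root, one leaf rank, one FIFO queue between them.

    Returns (stale_iterations, detected_at, pending_messages).
    """
    queue = deque()   # broadcasts in flight; each entry is the root's iteration number
    attempted = 0     # leaf's count of broadcast calls, including the failed one
    stale = []        # iterations in which the leaf consumed older data

    for it in range(1, iterations + 1):
        queue.append(it)   # the root broadcasts the data for iteration `it`
        attempted += 1
        if it == bad_iteration:
            # Bad communicator: the call returns an error and consumes nothing.
            continue
        seq = queue[0]     # oldest unmatched broadcast
        if use_sequence_numbers and seq != attempted:
            # Mismatch is visible right away; report where it surfaced.
            return stale, it, len(queue)
        queue.popleft()
        if seq != it:
            stale.append(it)   # the leaf computes with data from an earlier iteration

    return stale, None, len(queue)
```

In this model, running without sequence numbers lets the leaf finish the whole loop while using iteration n-1 data from iteration 6 onward, with one broadcast left unmatched at the end. With sequence numbers the mismatch surfaces on the first call after the error.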
Dick Treumann - MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363