Index: chap-pt2pt/pt2pt.tex =================================================================== --- chap-pt2pt/pt2pt.tex (revision 853) +++ chap-pt2pt/pt2pt.tex (working copy) @@ -447,7 +447,7 @@ the storage containing \mpiarg{count} consecutive elements of the type specified by \mpiarg{datatype}, starting at address \mpiarg{buf}. The length of the received message must be less than or equal to the length of -the receive buffer. An overflow error occurs if all incoming data does +the receive buffer. An overflow error \MPIreplace{3.0}{999}{occurs}{is raised} if all incoming data does not fit, without truncation, into the receive buffer. If a @@ -1027,7 +1027,7 @@ that can be represented in one system cannot be represented in the other system. An exception occurring during representation conversion results in a failure of the -communication. An error occurs either in the send operation, or the receive +communication. An error \MPIreplace{3.0}{999}{occurs}{is raised} either in the send operation, or the receive operation, or both. If a value sent in a message is untyped (i.e., of type \type{MPI\_BYTE}), @@ -1181,7 +1181,7 @@ completion does not depend on the occurrence of a matching receive. Thus, if a send is executed and no matching receive is posted, then \MPI/ must buffer the outgoing message, so as to allow the send call to complete. An error will -occur if there is insufficient buffer space. The amount of available buffer +\MPIreplace{3.0}{999}{occur}{be raised} if there is insufficient buffer space. The amount of available buffer space is controlled by the user --- see Section~\ref{sec:pt2pt-buffer}. Buffer allocation by the user may be required for the buffered mode to be effective. @@ -1476,7 +1476,7 @@ Any pending communication operation consumes system resources that are limited. -Errors may occur when lack of resources prevent the execution of an \MPI/ call. 
+\MPIreplace{3.0}{999}{Errors}{Faults} may occur when \MPIreplace{3.0}{999}{lack of resources prevent}{a lack of resources prevents} the execution of an \MPI/ call. A quality implementation will use a (small) fixed amount of resources for each pending send in the ready or synchronous mode and for each pending receive. However, buffer @@ -1497,7 +1497,7 @@ in Section~\ref{sec:pt2pt-buffer}. A buffered send operation that cannot complete because of a lack of buffer space -is erroneous. When such a situation is detected, an error is signalled that may +is erroneous. When such a situation is detected, an error is \MPIreplace{3.0}{999}{signalled}{raised} that may cause the program to terminate abnormally. On the other hand, a standard send operation that cannot complete because @@ -1741,10 +1741,10 @@ We outline below a model implementation that defines this policy. \MPI/ may provide more buffering, and may use a better buffer allocation algorithm -than described below. On the other hand, \MPI/ may signal an error whenever the +than described below. On the other hand, \MPI/ may \MPIreplace{3.0}{999}{signal}{raise} an error whenever the simple buffering allocator described below would run out of space. In particular, if no buffer is explicitly associated with the process, then any -buffered send may cause an error. +buffered send may \MPIreplace{3.0}{999}{cause}{raise} an error. \MPI/ does not provide mechanisms for querying or controlling buffering done by @@ -2308,9 +2308,9 @@ \begin{users} Once a request is freed by a call to \mpifunc{MPI\_REQUEST\_FREE}, it is not possible to check for the successful completion of the associated communication -with calls to \mpifunc{MPI\_WAIT} or \mpifunc{MPI\_TEST}.
Also, if \MPIreplace{3.0}{999}{an error}{a fault} occurs subsequently during the communication, an error code cannot be returned to the user --- -such an error must be treated as fatal +such \MPIreplace{3.0}{999}{an error}{a fault} must be treated as fatal \MPIupdate{3.0}{276}{to the local process in accordance with the associated error handler on the communicator. If the error handler associated with the communicator cannot be determined then this has the same effect as calling \mpifunc{MPI\_ABORT} on \const{MPI\_COMM\_SELF}.} An active receive request should never be freed as the receiver will have no way to verify that the receive @@ -2639,8 +2639,8 @@ \begin{rationale} This design streamlines error handling in the application. The application code need only test the (single) function result to -determine if an error has occurred. It needs to check each individual -status only when an error occurred. +determine if an error has \MPIreplace{3.0}{999}{occurred}{been raised}. It needs to check each individual +status only when an error \MPIreplace{3.0}{999}{occurred}{has been raised}. \end{rationale} \begin{funcdef}{MPI\_TESTALL(count, array\_of\_requests, flag, @@ -2679,7 +2679,7 @@ and the values of the status entries are undefined. This is a local operation. -Errors that occurred during the execution of \mpifunc{MPI\_TESTALL} +Errors that \MPIreplace{3.0}{999}{occurred}{were raised} during the execution of \mpifunc{MPI\_TESTALL} are handled as errors in \mpifunc{MPI\_WAITALL}. \begin{funcdef}{MPI\_WAITSOME(incount, array\_of\_requests, outcount, @@ -2729,8 +2729,8 @@ succeeded or failed. The call will return the error code \const{MPI\_ERR\_IN\_STATUS} and the error field of each status returned will be set to indicate success or to indicate the specific error -that occurred. The call will return \const{MPI\_SUCCESS} if no request -resulted in an error, +that \MPIreplace{3.0}{999}{occurred}{was raised}. 
The call will return \const{MPI\_SUCCESS} if no request +\MPIreplace{3.0}{999}{resulted in}{raised} an error, and will return another error code if it failed for other reasons (such as invalid arguments). In such cases, it will not update the error fields of the statuses. @@ -2773,7 +2773,7 @@ will eventually succeed, unless the send is satisfied by another receive; and similarly for send requests. -Errors that occur during the execution of \mpifunc{MPI\_TESTSOME} are +Errors that \MPIreplace{3.0}{999}{occur}{were raised} during the execution of \mpifunc{MPI\_TESTSOME} are handled as for \linebreak \mpifunc{MPI\_WAITSOME}. \begin{users} Index: chap-dynamic/dynamic-2.tex =================================================================== --- chap-dynamic/dynamic-2.tex (revision 853) +++ chap-dynamic/dynamic-2.tex (working copy) @@ -1806,14 +1806,14 @@ \subsection{Releasing Connections} \label{subsec:disconnect} Before a client and server connect, they are independent -\MPI/ applications. An error in one does not affect the other. +\MPI/ applications. \MPIreplace{3.0}{999}{An error}{A fault} in one does not affect the other. After establishing a connection with \mpifunc{MPI\_COMM\_CONNECT} -and \mpifunc{MPI\_COMM\_ACCEPT}, an error in one may +and \mpifunc{MPI\_COMM\_ACCEPT}, \MPIreplace{3.0}{999}{an error}{a fault} in one may affect the other. It is desirable for a client and server to be -able to disconnect, so that an error in one will +able to disconnect, so that \MPIreplace{3.0}{999}{an error}{a fault} in one will not affect the other. Similarly, it might be desirable for -a parent and child to disconnect, so that errors +a parent and child to disconnect, so that \MPIreplace{3.0}{999}{errors}{faults} in the child do not affect the parent, or vice-versa. 
\begin{itemize} Index: chap-ei/ei-2.tex =================================================================== --- chap-ei/ei-2.tex (revision 853) +++ chap-ei/ei-2.tex (working copy) @@ -617,8 +617,8 @@ Users are advised not to reuse the status fields for values other than those for which they were intended. Doing so may lead to unexpected results when using the status object. For example, calling -\mpifunc{MPI\_GET\_ELEMENTS} may cause an error if the value is -out of range or it may be impossible to detect such an error. The +\mpifunc{MPI\_GET\_ELEMENTS} may cause an error\MPIupdate{3.0}{999}{ to be raised} if the value is +out of range or it may be impossible to detect such \MPIreplace{3.0}{999}{an error}{a fault}. The \mpiarg{extra\_state} argument provided with a generalized request can be used to return information that does not logically belong in status. Index: chap-one-side/one-side-2.tex =================================================================== --- chap-one-side/one-side-2.tex (revision 853) +++ chap-one-side/one-side-2.tex (working copy) @@ -1779,11 +1779,11 @@ \label{sec:1sided-errhandlers} \subsection{Error Handlers} -Errors occurring +Errors \MPIreplace{3.0}{999}{occurring}{raised} during calls to \mpifunc{MPI\_WIN\_CREATE(...,comm,...)} cause the error handler currently associated with \mpiarg{comm} to be invoked. All other \RMA/ calls have an input \mpiarg{win} argument. When an -error occurs during such a call, the error handler currently +error \MPIreplace{3.0}{999}{occurs}{is raised} during such a call, the error handler currently associated with \mpiarg{win} is invoked.
The default error handler associated with \mpiarg{win} is Index: chap-io/io-2.tex =================================================================== --- chap-io/io-2.tex (revision 853) +++ chap-io/io-2.tex (working copy) @@ -258,7 +258,7 @@ The user is responsible for ensuring that a single file is referenced by the \mpiarg{filename} argument, as it may be impossible for an implementation to detect -this type of namespace error. +this type of namespace \MPIreplace{3.0}{999}{error}{fault}. \end{users} Initially, all processes view the file as a linear byte stream, @@ -3550,7 +3550,7 @@ All blocking routines must complete in finite time unless an exceptional condition (such as resource exhaustion) -causes an error. +causes an error\MPIupdate{3.0}{999}{ to be raised}. Nonblocking data access routines inherit the following progress rule from nonblocking point to point communication: @@ -4039,16 +4039,16 @@ \label{sec:io-errhandlers} % The default error handling rule for communication is inappropriate for I/O. -By default, communication errors are fatal---\consti{MPI\_ERRORS\_ARE\_FATAL} +By default, communication \MPIreplace{3.0}{999}{errors}{faults} are fatal---\consti{MPI\_ERRORS\_ARE\_FATAL} is the default error handler associated with \consti{MPI\_COMM\_WORLD}. -I/O errors are usually less catastrophic (e.g., ``file not found'') -than communication errors, -and common practice is to catch these errors and continue executing. +I/O \MPIreplace{3.0}{999}{errors}{faults} are usually less catastrophic (e.g., ``file not found'') +than communication \MPIreplace{3.0}{999}{errors}{faults}, +and common practice is to catch \MPIreplace{3.0}{999}{these errors}{the errors raised by these faults} and continue executing. For this reason, \MPI/ provides additional error facilities for I/O. \begin{users} -\MPI/ does not specify the state of a computation after an erroneous -\MPI/ call has occurred. 
+\MPI/ does not specify the state of a computation after \MPIreplace{3.0}{999}{an erroneous +\MPI/ call}{a fault} has occurred. A high-quality implementation will support the I/O error handling facilities, allowing users to write programs using common practice for I/O. Index: mpi-sys-macs.tex =================================================================== --- mpi-sys-macs.tex (revision 853) +++ mpi-sys-macs.tex (working copy) @@ -25,7 +25,14 @@ % For publisher's additions in the printed book \newif\ifbookprinting \bookprintingfalse +% Provide a way to indicate whether the text is within a float (such as a +% table); this is necessary to avoid problems with lost floats. Use this +% for macros that use the \ticket command (such as the MPIupdate, MPIreplace, +% or MPIdelete macros) within table for figure environments. +\newif\ifinfloat +\infloatfalse + % % There are a number of features that are controlled by LaTeX if commands. % This step allows you to control these through a configuration file @@ -158,7 +165,7 @@ \def\ticket#1{\relax} \else \ifshowtickets -\def\ticket#1{\ifinner[ticket#1.]\else\protect\marginpar[\mbox{\hbox to \marginparwidth{\hss ticket#1.\hspace{30pt}}}]{\hbox to \marginparwidth{\hspace{30pt}ticket#1.\hss}}\fi} +\def\ticket#1{\ifinfloat[ticket#1.]\else\ifinner[ticket#1.]\else\protect\marginpar[\mbox{\hbox to \marginparwidth{\hss ticket#1.\hspace{30pt}}}]{\hbox to \marginparwidth{\hspace{30pt}ticket#1.\hss}}\fi\fi} \else \def\ticket#1{\relax} \fi % showtickets Index: chap-context/context.tex =================================================================== --- chap-context/context.tex (revision 853) +++ chap-context/context.tex (working copy) @@ -3255,7 +3255,7 @@ \const{MPI\_MAX\_OBJECT\_NAME}. 
If the user has not associated a name with a communicator, or an error -occurs, \mpifunc{MPI\_COMM\_GET\_NAME} will return an empty string (all +\MPIreplace{3.0}{999}{occurs}{is raised}, \mpifunc{MPI\_COMM\_GET\_NAME} will return an empty string (all spaces in Fortran, {\tt ""} in C and C++). The three predefined communicators will have predefined names associated with them. Thus, the names of \consti{MPI\_COMM\_WORLD}, \consti{MPI\_COMM\_SELF}, and Index: chap-terms/terms-2.tex =================================================================== --- chap-terms/terms-2.tex (revision 853) +++ chap-terms/terms-2.tex (working copy) @@ -1095,14 +1095,14 @@ \MPI/ provides the user with reliable message transmission. A message sent is always received -correctly, and the user does not need to check for transmission errors, +correctly, and the user does not need to check for transmission \MPIreplace{3.0}{999}{errors}{faults}, time-outs, or other error conditions. In other words, \MPI/ does not provide mechanisms for dealing with failures in the communication system. If the \MPI/ implementation is built on an unreliable underlying mechanism, then it is the job of the implementor of the \MPI/ subsystem to insulate the user from this unreliability, or to reflect unrecoverable -errors as failures. +\MPIreplace{3.0}{999}{errors}{faults} as failures. Whenever possible, such failures will be reflected as errors in the relevant communication call. \MPIdelete{3.0}{276}{ @@ -1116,20 +1116,24 @@ The \MPI/ implementation documentation will provide information on the possible effect of each supported class of errors. \begin{users} -It is possible that some error may cause the state of \MPI/ to become corrupted in an undetectable manner. +It is possible that some \MPIreplace{3.0}{999}{error}{fault} may cause the state of \MPI/ to become corrupted in an undetectable manner. 
The behavior of \MPI/ in this case is undefined, and it is possible that the implementation returns incorrect error codes (including \const{MPI\_SUCCESS}). So while a high-quality implementation will strive to always return correct return codes from \MPI/ operations, it may not be possible in all cases. \end{users} \MPIupdateEnd{3.0} -Of course, \MPI/ programs may still be erroneous. A {\bf program error} can +\MPIupdate{3.0}{999}{In this document we distinguish between {\em +faults} and {\em errors}. An {\em error} is raised when a physical +defect, called a {\em fault}, is detected~\cite{Abd-El-barr:ft:2006}.} + +Of course, \MPI/ programs may still be erroneous. A {\bf program \MPIreplace{3.0}{999}{error}{fault}} can occur when an \MPI/ call is made with an incorrect argument (non-existing destination in a send operation, buffer too small in a receive operation, etc.). -This type of error would occur in any implementation. -In addition, a {\bf resource error} may occur when a program exceeds the amount +This type of \MPIreplace{3.0}{999}{error}{fault} would occur in any implementation. +In addition, a {\bf resource \MPIreplace{3.0}{999}{error}{fault}} may occur when a program exceeds the amount of available system resources (number of pending messages, system buffers, -etc.). The occurrence of this type of error depends on the amount of +etc.). The occurrence of this type of \MPIreplace{3.0}{999}{error}{fault} depends on the amount of available resources in the system and the resource allocation mechanism used; this may differ from system to system. A high-quality implementation will provide generous limits on the important @@ -1138,8 +1142,8 @@ In C and Fortran, almost all \MPI/ calls return a code that indicates successful completion of the operation. Whenever possible, \MPI/ -calls return an error code if an error occurred during the call. 
By -default, an error detected during the execution of the \MPI/ library +calls return an error code if \MPIreplace{3.0}{999}{an error}{a fault} occurred during the call. By +default, \MPIreplace{3.0}{999}{an error}{a fault} detected during the execution of the \MPI/ library causes the parallel computation to abort, except for file operations. However, \MPI/ provides @@ -1164,9 +1168,9 @@ See also Section~\ref{sec:bindings-c++exceptions} on page~\pageref{sec:bindings-c++exceptions}. Several factors limit the ability of \MPI/ calls to return with meaningful error -codes when an error occurs. -\MPI/ may not be able to detect some errors; other errors may be too -expensive to detect in normal execution mode; finally some errors may be +codes when \MPIreplace{3.0}{999}{an error}{a fault} occurs. +\MPI/ may not be able to detect some \MPIreplace{3.0}{999}{errors}{faults}; other \MPIreplace{3.0}{999}{errors}{faults} may be too +expensive to detect in normal execution mode; finally some \MPIreplace{3.0}{999}{errors}{faults} may be ``catastrophic'' and may prevent \MPI/ from returning control to the caller in a consistent state. @@ -1179,16 +1183,17 @@ operation (e.g., a call that verifies that an asynchronous operation has completed) then the error argument associated with this call will be used to indicate the nature -of the error. In a few cases, the error may occur after all calls that +of the \MPIreplace{3.0}{999}{error}{fault}. In a few cases, the \MPIreplace{3.0}{999}{error}{fault} may occur after all calls that relate to the operation have completed, so that no error value can be used -to indicate the nature of the error (e.g., an error on the receiver in -a send with the ready mode). Such an error must be treated as fatal, since +to indicate the nature of the \MPIreplace{3.0}{999}{error}{fault} +(e.g., \MPIreplace{3.0}{999}{an error}{a fault} on the receiver in +a send with the ready mode). 
Such \MPIreplace{3.0}{999}{an error}{a fault} must be treated as fatal, since information cannot be returned for the user to recover from it. This document does not specify the state of a computation after an erroneous \MPI/ call has occurred. The desired behavior is that a relevant error code be returned, and the effect -of the error be localized to the greatest possible extent. E.g., it is highly +of the \MPIreplace{3.0}{999}{error}{fault} be localized to the greatest possible extent. E.g., it is highly desirable that an erroneous receive call will not cause any part of the receiver's memory to be overwritten, beyond the area specified for receiving the message. @@ -1269,7 +1274,7 @@ \end{verbatim} In addition, calls that fail because of resource exhaustion or other -error are not considered a violation of the requirements here (however, +\MPIreplace{3.0}{999}{error}{fault} are not considered a violation of the requirements here (however, they are required to complete, just not to complete successfully). %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Index: chap-topol/topol.tex =================================================================== --- chap-topol/topol.tex (revision 853) +++ chap-topol/topol.tex (working copy) @@ -271,7 +271,7 @@ only those entries where {\tt dims[i] = 0} are modified by the call. Negative input values of {\tt dims[i]} are erroneous. -An error will occur if {\tt nnodes} is not a multiple of +An error will \MPIreplace{3.0}{999}{occur}{be raised} if {\tt nnodes} is not a multiple of $\displaystyle \prod_{i, dims[i]\neq 0} dims[i]$. 
For {\tt dims[i]} set by the call, {\tt dims[i]} will be ordered in Index: chap-inquiry/inquiry.tex =================================================================== --- chap-inquiry/inquiry.tex (revision 853) +++ chap-inquiry/inquiry.tex (working copy) @@ -373,15 +373,15 @@ \section{Error Handling} \label{sec:errorhandler} -An \MPI/ implementation cannot or may choose not to handle some errors -that occur during \MPI/ calls. These can include errors that generate -exceptions or traps, such as floating point errors or access +An \MPI/ implementation cannot or may choose not to handle some \MPIreplace{3.0}{999}{errors}{faults} +that occur during \MPI/ calls. These can include \MPIdelete{3.0}{999}{errors that generate} +exceptions or traps, such as floating point \MPIreplace{3.0}{999}{errors}{exceptions} or access violations. -The set of errors that are handled by \MPI/ is implementation-dependent. -Each such error generates an {\bf \MPI/ exception}. +The set of \MPIreplace{3.0}{999}{errors}{faults} that are handled by \MPI/ is implementation-dependent. +Each such \MPIreplace{3.0}{999}{error}{fault} generates an {\bf \MPI/ exception}. -The above text takes precedence over any text on error handling within this -document. Specifically, text that states that errors {\em will} be handled +The above text takes precedence over any text on \MPIreplace{3.0}{999}{error}{fault} handling within this +document. Specifically, text that states that \MPIreplace{3.0}{999}{errors}{faults} {\em will} be handled should be read as {\em may} be handled. %A user can associate an error handler with a communicator. The @@ -432,22 +432,24 @@ convenient and more efficient not to test for errors after each \MPI/ call, and have such error handled by a non trivial \MPI/ error handler. -After an error is detected, the state of \MPI/ is undefined. 
That is, using +After \MPIreplace{3.0}{999}{an error}{a fault (other than those + explicitly described in Chapter~\ref{chap:ft})} is detected, the state of \MPI/ is undefined. That is, using a user-defined error handler, or \const{MPI\_ERRORS\_RETURN}, does {\em not\/} necessarily -allow the user to continue to use \MPI/ after an error is detected. The purpose +allow the user to continue to use \MPI/ after +\MPIreplace{3.0}{999}{an error}{a fault} is detected. The purpose of these error handlers is to allow a user to issue user-defined error messages and to take actions unrelated to \MPI/ (such as flushing I/O buffers) before a program exits. An \MPI/ implementation is free to allow \MPI/ to continue after -an error but is not required to do so. +\MPIreplace{3.0}{999}{an error}{a fault} but is not required to do so. \begin{implementors} A good quality implementation will, to the greatest possible extent, -circumscribe the impact of an error, so that normal processing can +circumscribe the impact of \MPIreplace{3.0}{999}{an error}{a fault}, so that normal processing can continue after an error handler was invoked. The implementation documentation will -provide information on the possible effect of each class of errors. +provide information on the possible effect of each class of \MPIreplace{3.0}{999}{errors}{faults}. \end{implementors} An \MPI/ error handler is an opaque object, which is accessed by a handle. 
@@ -946,6 +948,7 @@ \end{table} \begin{table}[htb] +\infloattrue \begin{center} \begin{tabular}{l p{2.8in}} @@ -985,7 +988,7 @@ representation identifier that was already defined was passed to \func{MPI\_REGISTER\_DATAREP}\\ \error{MPI\_ERR\_CONVERSION} & - An error occurred in a user supplied data conversion + An error \MPIreplace{3.0}{999}{occurred}{was raised} in a user supplied data conversion function.\\ \error{MPI\_ERR\_IO} & Other I/O error\\ @@ -997,6 +1000,7 @@ Error classes (Part 2) } \label{table:inquiry:errclasses:part:ii} +\infloatfalse \end{table} The error classes are a subset of the error codes: an \MPI/ function Index: chap-ft/ft.tex =================================================================== --- chap-ft/ft.tex (revision 853) +++ chap-ft/ft.tex (working copy) @@ -25,7 +25,7 @@ The default error handler is \const{MPI\_ERRORS\_ARE\_FATAL}, as defined in Section~\ref{sec:errorhandler}. \begin{implementors} -If the default error handler is not replaced by the application then many of the functions and semantics in this chapter may be avoided as they focus upon the continued use of the \MPI/ interface after an error. +If the default error handler is not replaced by the application then many of the functions and semantics in this chapter may be avoided as they focus upon the continued use of the \MPI/ interface after \MPIreplace{3.0}{999}{an error}{a fault}. Such a situation is not possible given the default error handler of \const{MPI\_ERRORS\_ARE\_FATAL} on \const{MPI\_COMM\_WORLD}. If an implementation cannot provide the necessary functionality described in this chapter then it should return \error{MPI\_ERR\_UNSUPPORTED\_OPERATION} for those operations defined in this chapter, and never return the error class \error{MPI\_ERR\_RANK\_FAIL\_STOP} from any \MPI/ operation. 
\end{implementors} @@ -40,9 +40,9 @@ \begin{description} % -\item[{\bf error}] In a single system, an error is generated when a physical defect, called a fault, is detected~\cite{Abd-El-barr:ft:2006}. +\item[{\bf error}] In a single system, an error is \MPIreplace{3.0}{999}{generated}{raised} when a physical defect, called a fault, is detected~\cite{Abd-El-barr:ft:2006}. % -\item[{\bf failure}] A system failure is when the system cannot deliver its intended function because of one or more errors~\cite{Abd-El-barr:ft:2006}. +\item[{\bf failure}] A system failure is when the system cannot deliver its intended function because of one or more \MPIreplace{3.0}{999}{errors}{faults}~\cite{Abd-El-barr:ft:2006}. % A fault-tolerant system will continue to operate normally in the presence of errors. % \item[{\bf fail-stop process failure}] A process failure in which the \MPI/ process is permanently stopped, often due to a component crash. @@ -880,7 +880,7 @@ \label{sec:ft-env:init-finalize} %\begin{implementors} -If a process failure or other error occurs before or during \mpifunc{MPI\_INIT} then \mpifunc{MPI\_INIT} should try to return an error code, and not abort by default. +If a process failure or other \MPIreplace{3.0}{999}{error}{fault} occurs before or during \mpifunc{MPI\_INIT} then \mpifunc{MPI\_INIT} should try to return an error code, and not abort by default. If the next \MPI/ operation is not \mpifunc{MPI\_COMM\_SET\_ERRHANDLER} (or \mpifunc{MPI\_COMM\_CREATE\_ERRHANLDER} followed by \mpifunc{MPI\_COMM\_SET\_ERRHANDLER}) then the \MPI/ implementation will behave as if \const{MPI\_ERRORS\_ARE\_FATAL} was set on \const{MPI\_COMM\_WORLD}. %\end{implementors} @@ -891,7 +891,7 @@ \begin{implementors} A good quality implementation will, to the greatest possible extent, return an appropriate error code and not abort if \mpifunc{MPI\_INIT} is not able to complete successfully. 
-A critical error may cause even a good quality \MPI/ implementation to abort before or during \mpifunc{MPI\_INIT}. +A critical \MPIreplace{3.0}{999}{error}{fault} may cause even a good quality \MPI/ implementation to abort before or during \mpifunc{MPI\_INIT}. \end{implementors} \par @@ -935,7 +935,7 @@ \begin{rationale} An application may determine that a peer process is faulty in a way that cannot be detected by the \MPI/ implementation. -For example, the faulty peer is generating invalid results indicating the effect of a soft error. +For example, the faulty peer is generating invalid results indicating the effect of a soft \MPIreplace{3.0}{999}{error}{fault}. An alive process may be able to detect this situation and needs a way to forcibly terminate the faulty process without terminating itself. Simply ignoring the faulty process (generating new communication contexts for all correct processes) is not an acceptable solution. In this situation the faulty process could block sending or receiving with a correct process, and never call \mpifunc{MPI\_FINALIZE}.