[mpiwg-ft] Proof reading the draft

Thomas Naughton naughtont at ornl.gov
Tue Feb 11 17:40:21 CST 2014


Hello,

I read through the current proposal and fixed a few minor typos and wording
items.  I don't have write access to the repo, so I have attached a diff of
the changes from my local copy.

Also, in Example 17.3:
   - I wasn't exactly sure what side effect the comment
     before the MPI_Comm_failure_ack() was describing.
     It might be good to improve that comment for clarity.
   - Should the "T" variable passed to MPI_Comm_agree() be re-initialized
     inside the do-while loop?

Otherwise, it looks good to me.

Thanks,
--tjn

  _________________________________________________________________________
   Thomas Naughton                                      naughtont at ornl.gov
   Research Associate                                   (865) 576-4184


On Fri, 7 Feb 2014, Aurélien Bouteiller wrote:

> Dear WG members,
>
> I would like to ask you to set aside a little proofreading time on Monday/Tuesday.
>
> We reached semantic freeze immediately after the December meeting, and we are getting very close to freezing the wording as well. The current plan is to have all WG edits done by Tuesday evening, so that we accept only orthographic edits and comments from external reviewers until the final closure date in a week.
>
> Aurelien
>
> PS: If you do not yet have access to the working repository, send me an email with your Bitbucket ID. There is no plan to update the publicly visible draft before Tuesday evening.
>
>
> _______________________________________________
> mpiwg-ft mailing list
> mpiwg-ft at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-ft
>
-------------- next part --------------
comparing with https://bitbucket.org/bosilca/mpi3ft
http authorization required
realm: Bitbucket.org HTTP
user: naughtont3
searching for changes
changeset:   170:4c1b718339c5
tag:         tip
user:        Thomas Naughton <naughtont at ornl.gov>
date:        Tue Feb 11 18:26:13 2014 -0500
summary:     fix a few typos and wording issues in ft chapter

diff -r 1577ad6aa6bb -r 4c1b718339c5 chap-ft/ft.tex
--- a/chap-ft/ft.tex	Thu Feb 06 18:50:03 2014 -0500
+++ b/chap-ft/ft.tex	Tue Feb 11 18:26:13 2014 -0500
@@ -95,7 +95,7 @@
 \par An operation involving a failed process must always complete in
 a finite amount of time (possibly by raising a process failure
 exception). If an operation does not involve a failed process (such
-as a point-to-point message between two nonfailed processes), it
+as a point-to-point message between two non-failed processes), it
 must not raise a process failure exception.
 
 \begin{implementors}
@@ -177,7 +177,9 @@
 
 When a collective operation cannot be completed because of the failure of
 an involved process, the collective operation raises an exception of class
-\error{MPI\_ERR\_PROC\_FAILED}.
+\error{MPI\_ERR\_PROC\_FAILED} 
+(or \error{MPI\_ERR\_REVOKED} as defined in
+Section~\ref{sec:ft-functions:commfunctions}).
 
 \begin{users}
 
@@ -208,7 +210,7 @@
 between processes that succeeded in creating the new communicator, the user is
 responsible for ensuring a consistent view of the communicator creation, if needed.
 %
-A conservative solution is check the global outcome of the
+A conservative solution is to check the global outcome of the
 communicator creation function with \mpifunc{MPI\_COMM\_AGREE}
 (defined in Section~\ref{sec:ft-functions:commfunctions}), as
 illustrated in Example~\ref{ft-ex-commsplit}.
@@ -443,7 +445,7 @@
 
 \par
 %
-This function never raise an exception of class \error{MPI\_ERR\_PROC\_FAILED} or
+This function will never raise an exception of class \error{MPI\_ERR\_PROC\_FAILED} or
 \error{MPI\_ERR\_REVOKED}. All processes agree to exclude the rank
 of failed processes from the group of \mpiarg{newcomm}. At least every process
 whose failure raised an \MPI/ exception of class
@@ -481,10 +483,11 @@
 \begin{users} Calling \mpifunc{MPI\_COMM\_FAILURE\_ACK} on a
 communicator with failed processes has no effect on collective
 operations (except for \mpifunc{MPI\_COMM\_AGREE}). If a collective
-operation would raise an exception due to the communicator spanning
-on a failed process (as defined in
+operation would raise an exception due to the communicator 
+%spanning on
+containing a failed process (as defined in
 Section~\ref{sec:ft-notification:p2p-coll-comm}), it can continue to
-do so even after the failure has been acknowledged. In order to resume
+raise an exception even after the failure has been acknowledged. In order to resume
 using collective operations when a communicator contains failed
 processes, users should create a new communicator by using
 \mpifunc{MPI\_COMM\_SHRINK}. \end{users}
@@ -521,7 +524,7 @@
 %
 This function performs a collective operation on the group of living
 processes in \mpiarg{comm}. On completion, all living processes agree
-to set the output value of \mpiarg{flag} to the result of a logical
+to set the output value of a boolean \mpiarg{flag} to the result of a logical
 \textit{'AND'} operation over the contributed input values of
 \mpiarg{flag}. If \mpiarg{comm} is an intercommunicator, the value of
 \mpiarg{flag} is a logical \textit{'AND'} operation over the values
@@ -766,21 +769,22 @@
 %%ENDHEADER
 \begin{Verbatim}
 Comm_failure_allget(MPI_Comm c, MPI_Group * g) {
-	MPI_Comm s; MPI_Group c_grp, s_grp;
+    MPI_Comm s; MPI_Group c_grp, s_grp;
 
     /* Using shrink to create a new communicator, the underlying 
      * group is necessarily consistent across all ranks, and excludes 
      * all processes detected to have failed before the call */
     MPI_Comm_shrink(c, &s);
     /* Extracting the groups from the communicators */
-	MPI_Comm_group(c, &c_grp);
+    MPI_Comm_group(c, &c_grp);
     MPI_Comm_group(s, &s_grp);
     /* s_grp is the group of still alive processes, we want to 
      * return the group of failed processes. */
     MPI_Group_diff(c_grp, s_grp, g);
-	
-    MPI_Group_free(&c_grp); MPI_Group_free(&s_grp);
-	MPI_Comm_free(&s);
+
+    MPI_Group_free(&c_grp); 
+    MPI_Group_free(&s_grp);
+    MPI_Comm_free(&s);
 }
 \end{Verbatim}
 }
@@ -802,12 +806,12 @@
 %%ENDHEADER
 \begin{Verbatim}
 Comm_failure_allget2(MPI_Comm c, MPI_Group * g) {
-	int rc; int T=1;
+    int rc; int T=1;
 
     do { 
         /* beware, this has side effects on the state of comm */
         MPI_Comm_failure_ack(comm); 
-        rc=MPI_Comm_agree(comm, &T);
+        rc = MPI_Comm_agree(comm, &T);
     } while( rc != MPI_SUCCESS );
     /* after this loop, all processes see the same failure set: 
      * MPI_Comm_agree has returned MPI_SUCCESS at all ranks, so 
@@ -823,7 +827,7 @@
 
 \subsection{Fault-Tolerant Master/Worker}
 
-The example below presents a master code that handles workers failure
+The example below presents a master code that handles worker failures
 by discarding failed worker processes and resubmitting the work to
 the remaining workers. It demonstrates the different failure cases
 that may occur when posting receptions from \const{MPI\_ANY\_SOURCE}
@@ -861,7 +865,7 @@
     while( (active_workers > 0) && work_available ) {
         rc = MPI_Wait( &req, &status );
 
-        if( (MPI_ERR_PROC_FAILED == rc) || (MPI_ERR_FAILURE_PENDING == rc) ) {
+        if( (MPI_ERR_PROC_FAILED == rc) || (MPI_ERR_FAILED_PENDING == rc) ) {
             MPI_Comm_failure_ack(comm);
             MPI_Comm_failure_get_acked(comm, &g);
             MPI_Group_size(g, &gsize);
@@ -945,7 +949,8 @@
         if( rc == MPI_ERR_PROC_FAILED || !allsucceeded ) {
             MPI_Comm_shrink(comm, &comm2);
             MPI_Comm_free(comm); /* Release the revoked communicator */
-            comm = comm2; gnorm = epsilon + 1.0; /* Force one more iteration */
+            comm = comm2; 
+            gnorm = epsilon + 1.0; /* Force one more iteration */
         }
     }
 }


