<font size=2 face="sans-serif">Thanks for the feedback from everyone.
It is interesting to read over the various approaches. I
was attempting to see if I could distill any general rules from the examples.
I like the guideline of "Don't use MPI_Comm_agreement in the
non-error case", though even George's example will call MPI_Comm_agreement
when there are no errors. It is clearly possible to use MPI_Comm_agreement
in non-failure cases, but it still seems like a good guideline.
An obvious rule is "Don't write code where one rank can possibly
call MPI_Comm_shrink while another calls MPI_Comm_agreement".
I'm still wrestling with whether MPI_Comm_agreement is necessary
to write nontrivial FT code or if it is simply a convenience that makes
writing FT code simpler. It seems to me that you can write simple
infinite loops that check the error codes of MPI calls without using MPI_Comm_agreement,
but before a rank makes any significant transition within the code (for
example, calling MPI_Finalize or leaving a library), a call to MPI_Comm_agreement
seems necessary. In all of these cases I am assuming that the application
calls collectives. Anyhow, thanks again for the code and comments.
As to Josh's question of whether you can ever call shrink or agreement in
a callback, I believe that would be very difficult to do safely, especially
if the application is known to make direct calls to either of those routines.</font>
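<br>
<br><font size=2 face="sans-serif">To make the "agree before a significant transition" point concrete, here is a
minimal sketch in plain C. It is a toy model, not MPI: all "ranks" are entries of one array in a
single process, and the names toy_rank, toy_agree and toy_leavers_without_agreement are hypothetical,
invented for this illustration. It assumes that the agreement hands every surviving rank the logical AND
of the survivors' flags and ignores dead ranks. It only shows why ranks acting on their own local error
status can diverge, while ranks acting on the agreed value cannot.</font>
<br>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: one struct per rank, all held in one process. No MPI. */
typedef struct {
    bool alive;    /* false once the rank has failed */
    bool local_ok; /* this rank saw no error in its last collective */
} toy_rank;

/* Models the agreement under the stated assumption: every surviving rank
 * receives the same value, the logical AND of the surviving ranks' flags.
 * Dead ranks contribute nothing. */
bool toy_agree(const toy_rank *ranks, size_t n)
{
    assert(ranks != NULL && n > 0);
    bool all_ok = true;
    for (size_t i = 0; i < n; i++) {
        if (ranks[i].alive && !ranks[i].local_ok)
            all_ok = false;
    }
    return all_ok;
}

/* Counts the surviving ranks that would make the transition (for example,
 * finalize) if each one trusted only its own local error status. */
size_t toy_leavers_without_agreement(const toy_rank *ranks, size_t n)
{
    assert(ranks != NULL && n > 0);
    size_t leavers = 0;
    for (size_t i = 0; i < n; i++) {
        if (ranks[i].alive && ranks[i].local_ok)
            leavers++;
    }
    return leavers;
}
```

<br><font size=2 face="sans-serif">For example, with four ranks where rank 3 has died and only rank 2 noticed,
toy_leavers_without_agreement reports that ranks 0 and 1 would leave while rank 2 stays behind to
repair, whereas toy_agree returns false for everyone, so all survivors take the repair path together.</font>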
<br>
<br><font size=2 face="sans-serif">Thanks,<br>
Dave</font>
<br>
<br>
<br>
<br><font size=1 color=#5f5f5f face="sans-serif">From:
</font><font size=1 face="sans-serif">George Bosilca <bosilca@eecs.utk.edu></font>
<br><font size=1 color=#5f5f5f face="sans-serif">To:
</font><font size=1 face="sans-serif">"MPI 3.0 Fault
Tolerance and Dynamic Process Control working Group" <mpi3-ft@lists.mpi-forum.org></font>
<br><font size=1 color=#5f5f5f face="sans-serif">Date:
</font><font size=1 face="sans-serif">04/09/2012 11:41 AM</font>
<br><font size=1 color=#5f5f5f face="sans-serif">Subject:
</font><font size=1 face="sans-serif">Re: [Mpi3-ft]
Using MPI_Comm_shrink and MPI_Comm_agreement in the
same application</font>
<br><font size=1 color=#5f5f5f face="sans-serif">Sent by:
</font><font size=1 face="sans-serif">mpi3-ft-bounces@lists.mpi-forum.org</font>
<br>
<hr noshade>
<br>
<br>
<br><tt><font size=2>Dave,<br>
<br>
MPI_Comm_agree is meant to be used when a failure has occurred. Its cost is
significant, large enough that it should not be forced on users in __any__ case.<br>
<br>
Below you will find the FT version of your example. We started from the
non-fault-tolerant version and added what was required to make it fault
tolerant.<br>
<br>
george.<br>
<br>
<br>
<br>
success = false;<br>
do {<br>
  MPI_Comm_size(comm, &size);<br>
  MPI_Comm_rank(comm, &rank);<br>
  root = (0 == rank);<br>
  do {<br>
    if (root) read_some_data_from_a_file(buffer);<br>
<br>
    rc = MPI_Bcast(buffer, .... ,root, comm);<br>
    if( MPI_SUCCESS != rc ) { /* check only for FT type of errors */<br>
      MPI_Comm_revoke(comm);<br>
      break;<br>
    }<br>
<br>
    done = do_computation(buffer, size);<br>
<br>
    rc = MPI_Allreduce( &done, &success, ... MPI_LAND, comm );<br>
    if( MPI_SUCCESS != rc ) { /* check only for FT type of errors */<br>
      success = false; /* not defined if MPI_Allreduce failed */<br>
      MPI_Comm_revoke(comm);<br>
      break;<br>
    }<br>
  } while(false == success);<br>
  MPI_Comm_agree( comm, &success ); /* all survivors get the same value */<br>
  if( false == success ) {<br>
    MPI_Comm_revoke(comm);<br>
    MPI_Comm_shrink(comm, &newcomm);<br>
    MPI_Comm_free(&comm); /* MPI_Comm_free takes a pointer to the handle */<br>
    comm = newcomm;<br>
  }<br>
} while (false == success);<br>
<br>
<br>
<br>
<br>
</font></tt>
<br>