<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.2900.3354" name=GENERATOR><!-- converted from rtf -->
<STYLE>.EmailQuote {
PADDING-LEFT: 4pt; MARGIN-LEFT: 1pt; BORDER-LEFT: #800000 2px solid
}
</STYLE>
</HEAD>
<BODY>
<DIV dir=ltr align=left><FONT face=Arial size=2><SPAN
class=317352210-18022009>Dear Erez,</SPAN></FONT></DIV>
<DIV dir=ltr align=left><FONT face=Arial size=2><SPAN
class=317352210-18022009></SPAN></FONT> </DIV>
<DIV dir=ltr align=left><FONT face=Arial size=2><SPAN
class=317352210-18022009>Thank you. A couple of questions:</SPAN></FONT></DIV>
<DIV dir=ltr align=left><FONT face=Arial size=2><SPAN
class=317352210-18022009></SPAN></FONT> </DIV>
<DIV dir=ltr align=left><FONT face=Arial size=2><SPAN
class=317352210-18022009>1. You seem to restrict communication to pt2pt only.
Why? A Bcast upfront could be useful, for one.</SPAN></FONT></DIV>
<DIV dir=ltr align=left><FONT face=Arial size=2><SPAN
class=317352210-18022009>2. I can imagine more complicated communicator
combinations than only MPI_COMM_WORLD. Why do we require one
communicator?</SPAN></FONT></DIV>
<DIV dir=ltr align=left><FONT face=Arial size=2><SPAN
class=317352210-18022009>3. It appears that failed slaves cannot be simply
respawned. Is this what a repair would do anyway?</SPAN></FONT></DIV>
<DIV dir=ltr align=left><FONT face=Arial size=2><SPAN
class=317352210-18022009></SPAN></FONT> </DIV>
<DIV dir=ltr align=left><FONT face=Arial size=2><SPAN
class=317352210-18022009>Best regards.</SPAN></FONT></DIV>
<DIV dir=ltr align=left><FONT face=Arial size=2><SPAN
class=317352210-18022009></SPAN></FONT> </DIV>
<DIV dir=ltr align=left><FONT face=Arial size=2><SPAN
class=317352210-18022009>Alexander</SPAN></FONT></DIV><BR>
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> mpi3-ft-bounces@lists.mpi-forum.org
[mailto:mpi3-ft-bounces@lists.mpi-forum.org] <B>On Behalf Of </B>Erez
Haba<BR><B>Sent:</B> Wednesday, February 18, 2009 3:53 AM<BR><B>To:</B> MPI 3.0
Fault Tolerance and Dynamic Process Control working Group<BR><B>Subject:</B>
[Mpi3-ft] MPI Fault Tolerance scenarios<BR></FONT><BR></DIV>
<DIV></DIV><FONT face="Calibri, sans-serif" size=2>
<DIV>Hello all,</DIV>
<DIV> </DIV>
<DIV>In our last meeting we decided to build a set of FT scenarios/programs to
help us understand the details of the interface needed to support those scenarios.
We also decided to start with very simple scenarios and add more complex ones as
we understand the former better. I hope that starting with simple
scenarios will help us build a solid foundation on which we can build the more
complex solutions.</DIV>
<DIV> </DIV>
<DIV>When we build an FT solution we will focus on the scenario as described,
without complicating the solution just because it would be needed later for a
more complex one. The time will come later to modify the solution as we acquire
more knowledge and build the foundations. Hence, any proposal or change that we
make needs to fit <U><I>exactly</I></U> the scenario (and all those that we
previously looked at) but no more.</DIV>
<DIV>For example, in the first scenario that we’ll look at there is no need for
saving communicator state or an error callback, but they might be required
later.</DIV>
<DIV> </DIV>
<DIV>Note that these scenarios focus on process FT rather than
checkpoint/restart or network degradation. I assume we’ll do the latter
later.</DIV>
<DIV style="MARGIN-TOP: 24pt"><FONT face="Cambria, serif" color=#365f91
size=4><B>Scenario #1: </B><B>Very </B><B>Simple
Master</B><B>-</B><B>Workers</B></FONT></DIV>
<DIV style="MARGIN-TOP: 10pt"><FONT face="Cambria, serif" color=#4f81bd
size=3><B>Description</B></FONT></DIV>
<DIV>This is a very simple master-workers scenario. However simple, we were
asked many times by customers to support FT in this scenario.</DIV>
<DIV>In this case the MPI application runs with n processes, where rank 0 is
used as the master and the other n-1 ranks are used as workers. The master generates
work (either by getting it directly from user input, or by reading a file) and
sends it for processing to a free worker rank. The master sends requests and
receives replies using MPI point-to-point communication. Each worker waits
for an incoming message; upon arrival, the worker computes the result and sends
it back to the master. The master stores the result in a log file.</DIV>
<DIV> </DIV>
<DIV><B>Hardening</B>: The goal is to harden the workers. The master itself is
not FT; if it fails, the entire application fails. The workers are FT: a failed
worker is replaced to preserve the computation power available to the
application. (A twist: if a worker cannot be recovered, the master can work with
a smaller set of workers, down to a low watermark.)</DIV>
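<DIV> </DIV>
<DIV>The low-watermark twist can be sketched as a small piece of master-side
bookkeeping. This is only an illustration: <I>WorkerPool</I> and
<I>pool_lose_worker</I> are hypothetical names that appear neither in MPI nor in
the scenario, and the policy (continue while the pool is at or above the
watermark) is an assumption about what "low watermark" would mean here.</DIV>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping type; not part of MPI or of the scenario text. */
typedef struct {
    int active_workers;  /* workers currently able to take work */
    int low_watermark;   /* smallest pool the job is allowed to run with */
} WorkerPool;

/* Record that one worker could not be recovered.
   Returns true if the job may continue with the smaller pool. */
bool pool_lose_worker(WorkerPool *pool)
{
    assert(pool != NULL);
    if (pool->active_workers > 0)
        pool->active_workers--;
    return pool->active_workers >= pool->low_watermark;
}
```

<DIV>With 4 workers and a watermark of 3, the first unrecoverable failure lets
the job continue and the second one does not; at that point the master would
presumably abort the job.</DIV>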
<DIV style="MARGIN-TOP: 10pt"><FONT face="Cambria, serif" color=#4f81bd
size=3><B>Worker</B></FONT></DIV>
<DIV>The worker waits on a blocking receive; when a message arrives, it processes
it. If a <I>done</I> message arrives, the worker finalizes MPI and exits
normally.</DIV>
<DIV> </DIV>
<DIV><B>Hardening</B>: There is no special requirement for hardening here. If
the worker encounters a communication problem with the master, it means that the
master is down and it’s okay to abort the entire job. Thus, it will use the
default error handler (which aborts on errors). Note that making the
application FT requires no change to the worker code at all; only the master
changes.</DIV>
<DIV> </DIV>
<DIV>Pseudo code for the hardened worker:</DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
color=#548dd4 size=2><SPAN style="BACKGROUND-COLOR: #ffffff">int</SPAN><FONT
color=#000000><SPAN style="BACKGROUND-COLOR: #ffffff">
main()</SPAN></FONT></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff">{</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff"> </SPAN><FONT
color=#984806><SPAN
style="BACKGROUND-COLOR: #ffffff">MPI_Init</SPAN></FONT><SPAN
style="BACKGROUND-COLOR: #ffffff">();</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT
face=Arial></FONT> </DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff"> </SPAN><FONT
color=#548dd4><SPAN style="BACKGROUND-COLOR: #ffffff">for</SPAN></FONT><SPAN
style="BACKGROUND-COLOR: #ffffff">(;;)</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff">
{</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN
style="BACKGROUND-COLOR: #ffffff">
</SPAN><FONT color=#984806><SPAN
style="BACKGROUND-COLOR: #ffffff">MPI_Recv</SPAN></FONT><SPAN
style="BACKGROUND-COLOR: #ffffff">(src=0, </SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">&query</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">, MPI_COMM_WORLD</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">)</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">;</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN
style="BACKGROUND-COLOR: #ffffff">
</SPAN><FONT color=#548dd4><SPAN
style="BACKGROUND-COLOR: #ffffff">if</SPAN></FONT><SPAN
style="BACKGROUND-COLOR: #ffffff">(</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">is_done_msg(</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">query</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">)</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">)</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN
style="BACKGROUND-COLOR: #ffffff">
</SPAN><SPAN style="BACKGROUND-COLOR: #ffffff"><B>break</B></SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">;</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "> </DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN
style="BACKGROUND-COLOR: #ffffff">
</SPAN><SPAN style="BACKGROUND-COLOR: #ffffff">process_query</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">(</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">&query</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">, </SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">&answer</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">)</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">;</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "> </DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN
style="BACKGROUND-COLOR: #ffffff">
</SPAN><FONT color=#984806><SPAN
style="BACKGROUND-COLOR: #ffffff">MPI_Send</SPAN></FONT><SPAN
style="BACKGROUND-COLOR: #ffffff">(dst=0, </SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">&answer</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">, MPI_COMM_WORLD);</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff">
}</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2></FONT> </DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff"> </SPAN><FONT
color=#984806><SPAN
style="BACKGROUND-COLOR: #ffffff">MPI_Finalize</SPAN></FONT><SPAN
style="BACKGROUND-COLOR: #ffffff">();</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff">}</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "> </DIV>
<DIV> </DIV>
<DIV>Notice that for this FT code there is no requirement for the worker to
rejoin the communicator, since the only communicator used is MPI_COMM_WORLD.</DIV>
<DIV> </DIV>
<DIV style="MARGIN-TOP: 10pt"><FONT face="Cambria, serif" color=#4f81bd
size=3><B>Master</B></FONT></DIV>
<DIV>The master code reads queries from a stream and passes them on to the
workers to process. The master goes through several phases. In the
initialization phase it sends the first request to each one of the ranks; in the
second one it shuts down any unnecessary ranks (if the job is too small); in the
third phase it enters its progress engine where it handles replies (answers),
process recovery and termination (on input end).</DIV>
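<DIV> </DIV>
<DIV>The split between the first two phases comes down to a little arithmetic,
sketched here under the assumption that ranks are numbered 0..n-1 with rank 0
as the master. The helper names <I>initial_active_workers</I> and
<I>first_idle_rank</I> are hypothetical and used only for illustration.</DIV>

```c
#include <assert.h>

/* Number of workers that receive an initial query in phase 1:
   one query per worker rank until either runs out. */
int initial_active_workers(int n, int queries_available)
{
    assert(n >= 1 && queries_available >= 0);
    int workers = n - 1;                 /* rank 0 is the master */
    return queries_available < workers ? queries_available : workers;
}

/* First rank that phase 2 shuts down; equals n when no rank is idle. */
int first_idle_rank(int active_workers)
{
    return active_workers + 1;
}
```

<DIV>For a job with 5 ranks and only 2 queries, ranks 1 and 2 get work and
ranks 3 and 4 receive the <I>done</I> message right away.</DIV>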
<DIV> </DIV>
<DIV><B>Hardening</B>: It is the responsibility of the master to restart any
failing workers and make sure that the request (query) did not get lost if a
worker fails. Hence, every time an error is detected the master will move the
worker into repairing state and move its workload to other workers.</DIV>
<DIV>The master runs with errors returned (MPI_ERRORS_RETURN) rather than with
the default handler, which aborts the job.</DIV>
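<DIV> </DIV>
<DIV>One way to picture "move the worker into repairing state and move its
workload to other workers" is the following sketch. It is not the
<I>start_repair</I> routine called in the pseudo code; <I>RetryQueue</I>,
<I>mark_for_repair</I>, <I>take_retry</I> and the fixed capacity are all
assumptions made for illustration, and <I>QueryMessage</I> is reduced to a
stand-in with a single field.</DIV>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 64   /* assumed capacity, for illustration only */

typedef struct { int id; } QueryMessage;   /* stand-in for the real query type */

/* Queries whose worker failed and that still need an answer. */
typedef struct {
    QueryMessage items[MAX_PENDING];
    size_t count;
} RetryQueue;

/* Mark worker `rank` as repairing and park its in-flight query in `retry`
   so another worker can pick it up. Returns false if the queue is full. */
bool mark_for_repair(int rank, bool repairing[], const QueryMessage q[],
                     RetryQueue *retry)
{
    assert(rank > 0);            /* rank 0 is the master and is never repaired */
    assert(retry != NULL);
    if (repairing[rank])
        return true;             /* already repairing; query was parked earlier */
    if (retry->count == MAX_PENDING)
        return false;
    repairing[rank] = true;
    retry->items[retry->count++] = q[rank];
    return true;
}

/* Hand out the most recently parked query, if any, before reading new input. */
bool take_retry(RetryQueue *retry, QueryMessage *out)
{
    assert(retry != NULL && out != NULL);
    if (retry->count == 0)
        return false;
    *out = retry->items[--retry->count];
    return true;
}
```

<DIV>In this sketch the master would call <I>take_retry</I> before asking the
input stream for a new query, so a parked query is not lost when its worker
fails.</DIV>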
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><I>One thing to note about the following code: it is not optimized. I did
not try to overlap computation with communication (which is possible); I tried
to keep it as simple as possible for the purpose of discussion.</I></DIV>
<DIV> </DIV>
<DIV>Pseudo code for the hardened master; the code needed for repairing the
failed ranks is highlighted in yellow.</DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
color=#548dd4 size=2><SPAN style="BACKGROUND-COLOR: #ffffff">int</SPAN><FONT
color=#000000><SPAN style="BACKGROUND-COLOR: #ffffff">
main()</SPAN></FONT></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff">{</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff"> </SPAN><FONT
color=#984806><SPAN
style="BACKGROUND-COLOR: #ffffff">MPI_Init</SPAN></FONT><SPAN
style="BACKGROUND-COLOR: #ffffff">();</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffff00">>>></SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> </SPAN><FONT color=#984806><SPAN
style="BACKGROUND-COLOR: #ffff00">MPI_Comm_set_errhandler</SPAN></FONT><SPAN
style="BACKGROUND-COLOR: #ffff00">(</SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00">MPI_COMM_WORLD</SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00">, </SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00">MPI_ERRORS_RETURN</SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00">);</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff"> </SPAN><FONT
color=#984806><SPAN
style="BACKGROUND-COLOR: #ffffff">MPI_Comm_size</SPAN></FONT><SPAN
style="BACKGROUND-COLOR: #ffffff">(MPI_COMM_WORLD, &n);</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "> </DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff">
MPI_Request</SPAN><SPAN style="BACKGROUND-COLOR: #ffffff"> r[n]</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff"> = MPI_REQUEST_NULL</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">;</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff"> </SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">Query</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">Message </SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">q</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">[n];</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff"> AnswerMessage
a[n];</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff"> </SPAN><FONT
color=#548dd4><SPAN style="BACKGROUND-COLOR: #ffffff">i</SPAN></FONT><FONT
color=#548dd4><SPAN style="BACKGROUND-COLOR: #ffffff">nt</SPAN></FONT><SPAN
style="BACKGROUND-COLOR: #ffffff"> </SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">active_workers</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff"> = </SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">0</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">;</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "> </DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffff00">>>></SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> </SPAN><FONT color=#548dd4><SPAN
style="BACKGROUND-COLOR: #ffff00">bool</SPAN></FONT><SPAN
style="BACKGROUND-COLOR: #ffff00"> repairing[n] = false;</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "> </DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff">
//</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff"> // Phase 1:
send </SPAN><SPAN style="BACKGROUND-COLOR: #ffffff">initial</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff"> requests</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff">
//</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff"> </SPAN><FONT
color=#548dd4><SPAN style="BACKGROUND-COLOR: #ffffff">for</SPAN></FONT><SPAN
style="BACKGROUND-COLOR: #ffffff">(</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">int </SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">i</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff"> = 1</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">;</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff"> </SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">i</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff"> < n</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">;</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff"> i++</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">)</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff">
{</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN
style="BACKGROUND-COLOR: #ffffff">
</SPAN><FONT color=#548dd4><SPAN
style="BACKGROUND-COLOR: #ffffff">if</SPAN></FONT><SPAN
style="BACKGROUND-COLOR: #ffffff">(</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">get_next_</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">query</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">(</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">stream, </SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">&q[i]</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">)</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff"> == eof)</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN
style="BACKGROUND-COLOR: #ffffff">
</SPAN><FONT color=#548dd4><SPAN
style="BACKGROUND-COLOR: #ffffff">break</SPAN></FONT><SPAN
style="BACKGROUND-COLOR: #ffffff">;</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "> </DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN
style="BACKGROUND-COLOR: #ffffff">
</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">active_workers++;</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN
style="BACKGROUND-COLOR: #ffffff">
</SPAN><FONT color=#984806><SPAN
style="BACKGROUND-COLOR: #ffffff">MPI_</SPAN></FONT><FONT color=#984806><SPAN
style="BACKGROUND-COLOR: #ffffff">S</SPAN></FONT><FONT color=#984806><SPAN
style="BACKGROUND-COLOR: #ffffff">end</SPAN></FONT><SPAN
style="BACKGROUND-COLOR: #ffffff">(</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">dest</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">=</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">i</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">, </SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">&</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">q[i]</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">, MPI_COMM_WORLD</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">)</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">;</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN
style="BACKGROUND-COLOR: #ffffff">
</SPAN><SPAN style="BACKGROUND-COLOR: #ffffff">rc = </SPAN><FONT
color=#984806><SPAN
style="BACKGROUND-COLOR: #ffffff">MPI_Irecv</SPAN></FONT><SPAN
style="BACKGROUND-COLOR: #ffffff">(src</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">=</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">i</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">, </SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">buffer=</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">&a[i]</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">, request=&r[i], </SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">MPI_COMM_WORLD</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">);</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffff00">>>></SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> </SPAN><FONT
color=#548dd4><SPAN style="BACKGROUND-COLOR: #ffff00">if</SPAN></FONT><SPAN
style="BACKGROUND-COLOR: #ffff00">(rc != MPI_SUCCESS)</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffff00">>>></SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> {</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffff00">>>></SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00">
</SPAN><SPAN style="BACKGROUND-COLOR: #ffff00">start_repair(i, repairing, q, a,
r, stream</SPAN><SPAN style="BACKGROUND-COLOR: #ffff00">); </SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffff00">>>></SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> }</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff">
}</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT
face=Arial></FONT> </DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff">
//</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff"> // Phase
</SPAN><SPAN style="BACKGROUND-COLOR: #ffffff">2</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">: </SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">finalize any </SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">unnecessary</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff"> ranks</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff">
//</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff"> </SPAN><FONT
color=#548dd4><SPAN style="BACKGROUND-COLOR: #ffffff">for</SPAN></FONT><SPAN
style="BACKGROUND-COLOR: #ffffff">(</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">int </SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">i</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff"> = </SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">active_workers + 1</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">;</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff"> </SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">i</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff"> < n</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">;</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff"> i++</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">)</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff">
{</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN
style="BACKGROUND-COLOR: #ffffff">
</SPAN><FONT color=#984806><SPAN
style="BACKGROUND-COLOR: #ffffff">MPI_</SPAN></FONT><FONT color=#984806><SPAN
style="BACKGROUND-COLOR: #ffffff">S</SPAN></FONT><FONT color=#984806><SPAN
style="BACKGROUND-COLOR: #ffffff">end</SPAN></FONT><SPAN
style="BACKGROUND-COLOR: #ffffff">(</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">dest</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">=</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">i</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">, </SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">&done_msg</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">, MPI_COMM_WORLD</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">)</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">;</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff">
}</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "> </DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "> </DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff">
//</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff"> //
</SPAN><SPAN style="BACKGROUND-COLOR: #ffffff">The progress engine. Get answers;
send new requests and handle</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff"> // process
repairs</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff">
//</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff"> </SPAN><FONT
color=#548dd4><SPAN style="BACKGROUND-COLOR: #ffffff">while</SPAN></FONT><SPAN
style="BACKGROUND-COLOR: #ffffff">(</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">active_workers != 0</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">)</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff">
{</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN
style="BACKGROUND-COLOR: #ffffff">
</SPAN><SPAN style="BACKGROUND-COLOR: #ffffff">rc = </SPAN><FONT
color=#984806><SPAN
style="BACKGROUND-COLOR: #ffffff">MPI_Waitany</SPAN></FONT><SPAN
style="BACKGROUND-COLOR: #ffffff">(</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">n, </SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">r</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">, &</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">i</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">, </SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">MPI_STATUS_IGNORE);</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "> </DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffff00">>>></SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> </SPAN><FONT
color=#548dd4><SPAN style="BACKGROUND-COLOR: #ffff00">if</SPAN></FONT><SPAN
style="BACKGROUND-COLOR: #ffff00">(</SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00">!</SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00">repairing[</SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00">i</SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00">])</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffff00">>>></SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> {</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffff00">>>></SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> </SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00">
</SPAN><FONT color=#548dd4><SPAN
style="BACKGROUND-COLOR: #ffff00">if</SPAN></FONT><SPAN
style="BACKGROUND-COLOR: #ffff00">(rc != MPI_SUCCESS)</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffff00">>>></SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> </SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> </SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> {</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffff00">>>></SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> </SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> </SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> </SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00">start_repair(i, repairing, q, a, r,
stream);</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffff00">>>></SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00">
continue;</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffff00">>>></SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> </SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00">
}</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "> </DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN
style="BACKGROUND-COLOR: #ffffff">
</SPAN><SPAN style="BACKGROUND-COLOR: #ffffff"> </SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">p</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">rocess_answer(&a[</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">i</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">]);</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffff00">>>></SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> }</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffff00">>>></SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> </SPAN><FONT
color=#548dd4><SPAN style="BACKGROUND-COLOR: #ffff00">else</SPAN></FONT><SPAN
style="BACKGROUND-COLOR: #ffff00"> </SPAN><FONT color=#548dd4><SPAN
style="BACKGROUND-COLOR: #ffff00">if</SPAN></FONT><SPAN
style="BACKGROUND-COLOR: #ffff00">(rc != MPI_SUCCESS)</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffff00">>>></SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> </SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> </SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00">{</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffff00">>>></SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> </SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00">
active_workers--;</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffff00">>>></SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> }</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "> </DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN
style="BACKGROUND-COLOR: #ffffff">
</SPAN><FONT color=#548dd4><SPAN
style="BACKGROUND-COLOR: #ffffff">if</SPAN></FONT><SPAN
style="BACKGROUND-COLOR: #ffffff">(</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">get_next_query(</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">stream, </SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">&q[</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">i</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">])</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff"> == eof)</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN
style="BACKGROUND-COLOR: #ffffff">
{</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN
style="BACKGROUND-COLOR: #ffffff">
active_workers--;</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff"> </SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">
</SPAN><FONT color=#984806><SPAN
style="BACKGROUND-COLOR: #ffffff">MPI_Send</SPAN></FONT><SPAN
style="BACKGROUND-COLOR: #ffffff">(</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">dest=</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">i</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">, &</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">done_msg</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">, MPI_COMM_WORLD);</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN
style="BACKGROUND-COLOR: #ffffff">
}</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN
style="BACKGROUND-COLOR: #ffffff">
</SPAN><FONT color=#548dd4><SPAN
style="BACKGROUND-COLOR: #ffffff">else</SPAN></FONT></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN
style="BACKGROUND-COLOR: #ffffff">
{</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff"> </SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">
</SPAN><FONT color=#984806><SPAN
style="BACKGROUND-COLOR: #ffffff">MPI_Send</SPAN></FONT><SPAN
style="BACKGROUND-COLOR: #ffffff">(</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">dest=</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">i</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">, &q[</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">i</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">])</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff"> </SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">
</SPAN><SPAN style="BACKGROUND-COLOR: #ffffff">rc = </SPAN><FONT
color=#984806><SPAN
style="BACKGROUND-COLOR: #ffffff">MPI_Irecv</SPAN></FONT><SPAN
style="BACKGROUND-COLOR: #ffffff">(src</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">=</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">i</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">, </SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">buffer=</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">&a[</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">i</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">]</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">, request=&r[</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">i</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">], </SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">MPI_COMM_WORLD</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">)</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffff00">>>></SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> </SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00">
</SPAN><FONT color=#548dd4><SPAN
style="BACKGROUND-COLOR: #ffff00">if</SPAN></FONT><SPAN
style="BACKGROUND-COLOR: #ffff00">(rc != MPI_SUCCESS)</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffff00">>>></SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> </SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> </SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> {</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffff00">>>></SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> </SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> </SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> </SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> start_repair(i, repairing, q, a, r,
stream)</SPAN><SPAN style="BACKGROUND-COLOR: #ffff00">;</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffff00">>>></SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> </SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00">
}</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN
style="BACKGROUND-COLOR: #ffffff">
}</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff">
}</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "> </DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff"> </SPAN><FONT
color=#984806><SPAN
style="BACKGROUND-COLOR: #ffffff">MPI_Finalize</SPAN></FONT><SPAN
style="BACKGROUND-COLOR: #ffffff">()</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffffff">}</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "> </DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "> </DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffff00">>>></SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> </SPAN><FONT color=#548dd4><SPAN
style="BACKGROUND-COLOR: #ffff00">void</SPAN></FONT><SPAN
style="BACKGROUND-COLOR: #ffff00"> </SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00">start_</SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00">repair(int i, int repairing[], </SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00">Query q[], Answer a[], MPI_Request
r[]</SPAN><SPAN style="BACKGROUND-COLOR: #ffff00">, Stream stream</SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00">)</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffff00">>>></SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> </SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00">{</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffff00">>>></SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> </SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> repairing[i] =
true;</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffff00">>>></SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> </SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> push_</SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00">query_</SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00">back(strea</SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00">m, &q[i]);</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffff00">>>></SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> </SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00">
MPI_Comm_Irepair(MPI_COMM_WORLD, </SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00">i, &r[i]</SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00">);</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "><FONT face="Courier New, monospace"
size=2><SPAN style="BACKGROUND-COLOR: #ffff00">>>></SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00"> </SPAN><SPAN
style="BACKGROUND-COLOR: #ffff00">}</SPAN></FONT></DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "> </DIV>
<DIV style="BACKGROUND-COLOR: #d9d9d9; pt: "> </DIV>
<DIV> </DIV>
<DIV style="MARGIN-TOP: 10pt"><FONT face="Cambria, serif" color=#4f81bd><B>Logic
description (without FT)</B></FONT></DIV>
<DIV>The master code keeps track of the number of active workers through the
active_workers variable, which is used solely for shutdown. When the
master is out of input, it shuts down the workers by sending each of them a
<I>‘done’</I> message. It decrements the number of active workers with each
such message and finalizes when this number reaches zero.</DIV>
<DIV> </DIV>
<DIV>The master’s progress engine waits on a vector of requests (note that entry
0 is not used, to simplify the code); once it gets an answer, it processes it
and sends the next query to that worker, until it is out of input.</DIV>
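<DIV>&nbsp;</DIV>
<DIV>The shutdown accounting described above can be sketched in plain C without
any MPI calls. This is only an illustration under stated assumptions: Stream is
reduced to a counter, and on_answer and run_master are hypothetical names that
stand in for one iteration of the master loop and for the loop itself. The
sketch assumes every worker starts with one outstanding query.</DIV>

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the input stream: a countdown of queries. */
typedef struct { int remaining; } Stream;

/* What the master sends to a worker whose answer has just arrived. */
typedef enum { SEND_QUERY, SEND_DONE } Reply;

/* Take the next query if there is one; returns false at end of input. */
bool next_query(Stream *s) {
    if (s->remaining == 0) return false;
    s->remaining--;
    return true;
}

/* One step of the master loop for the worker whose request completed. */
Reply on_answer(Stream *s, int *active_workers) {
    assert(*active_workers > 0);
    if (!next_query(s)) {
        (*active_workers)--;   /* out of input: this worker gets 'done' */
        return SEND_DONE;
    }
    return SEND_QUERY;         /* more input: the worker stays busy */
}

/* Run the loop until every worker has been shut down.
   Returns the number of answers the master processed. */
int run_master(int workers, int pending) {
    Stream s = { pending };
    int active_workers = workers;
    int answers = 0;
    while (active_workers != 0) {
        answers++;             /* stands in for waiting on the requests */
        (void)on_answer(&s, &active_workers);
    }
    return answers;
}
```

<DIV>With 3 workers and 5 queries still in the stream, the master processes
3 + 5 answers and sends 3 ‘done’ messages before it finalizes.</DIV>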
<DIV> </DIV>
<DIV style="MARGIN-TOP: 10pt"><FONT face="Cambria, serif" color=#4f81bd><B>Logic
description (with FT)</B></FONT></DIV>
<DIV>The master detects a faulty client either synchronously, when it tries to
initiate an async receive (there is no need to check the send; the assumption
is that if the send failed, so will the receive call), or asynchronously, when
the async receive completes with an error. Once an error is detected (and
identified as a faulty client, more about this later), the master starts an
async repair of that client. If the repair succeeds, new work is sent to that
client. If it does not, the number of active workers is decreased and the
master has to live with less processing power.</DIV>
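<DIV>&nbsp;</DIV>
<DIV>The repair logic can be read as a small per-worker state machine. The
following is a minimal sketch in plain C, not an implementation of the proposed
API: Master, WorkerState, MAX_WORKERS, master_init, on_worker_failure and
on_repair_done are hypothetical names, and the requeued counter merely stands in
for pushing the failed worker's query back to the stream.</DIV>

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_WORKERS 8   /* hypothetical upper bound, for the sketch only */

typedef enum { WORKER_BUSY, WORKER_REPAIRING, WORKER_RETIRED } WorkerState;

typedef struct {
    WorkerState state[MAX_WORKERS + 1];  /* entry 0 is not used */
    int active_workers;
    int requeued;                        /* queries pushed back to the stream */
} Master;

/* Set up n workers (ranks 1..n), each busy with a query. */
void master_init(Master *m, int n) {
    assert(n >= 1 && n <= MAX_WORKERS);
    for (int i = 0; i <= MAX_WORKERS; i++) m->state[i] = WORKER_RETIRED;
    for (int i = 1; i <= n; i++) m->state[i] = WORKER_BUSY;
    m->active_workers = n;
    m->requeued = 0;
}

/* Worker i was identified as failed: requeue its query and mark it as
   under repair (the repair itself would run asynchronously). */
void on_worker_failure(Master *m, int i) {
    assert(m->state[i] == WORKER_BUSY);
    m->state[i] = WORKER_REPAIRING;
    m->requeued++;
}

/* The repair of worker i completed. Returns true if the worker is back
   and should be sent new work, false if it has been given up on. */
bool on_repair_done(Master *m, int i, bool repaired) {
    assert(m->state[i] == WORKER_REPAIRING);
    if (repaired) {
        m->state[i] = WORKER_BUSY;
        return true;
    }
    m->state[i] = WORKER_RETIRED;
    m->active_workers--;     /* live with less processing power */
    return false;
}
```

<DIV>Note that active_workers changes only when a repair fails, so the shutdown
condition of the master loop stays the same as in the version without FT.</DIV>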
<DIV> </DIV>
<DIV>The code above assumes that if the returned code is an error, it should
repair the worker; however, as we discussed, there could very well be many
different reasons for an error here, not all of which are related to process
failure. For that we might use something along the lines of</DIV>
<DIV> </DIV>
<DIV style="TEXT-INDENT: 36pt; BACKGROUND-COLOR: #d9d9d9; pt: "><FONT
face="Courier New, monospace" color=#0070c0 size=2><SPAN
style="BACKGROUND-COLOR: #ffffff">i</SPAN><SPAN
style="BACKGROUND-COLOR: #ffffff">f</SPAN><FONT color=#000000><SPAN
style="BACKGROUND-COLOR: #ffffff">(</SPAN></FONT><FONT color=#984806><SPAN
style="BACKGROUND-COLOR: #ffffff">MPI_Error_event</SPAN></FONT><FONT
color=#000000><SPAN style="BACKGROUND-COLOR: #ffffff">(rc) ==
MPI_EVENT_PROCESS_DOWN)</SPAN></FONT><FONT color=#000000><SPAN
style="BACKGROUND-COLOR: #ffffff">...</SPAN></FONT></FONT></DIV>
<DIV> </DIV>
<DIV>It would be the responsibility of the MPI implementation to encode or store
the event related to the returned error code.</DIV>
<DIV><I>(Note: in MPICH2 there is a mechanism that enables
encoding extended error information in the error code,
which can then be retrieved using
MPI_Error_string.)</I></DIV>
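<DIV>&nbsp;</DIV>
<DIV>One way an implementation might carry the event inside the returned code
is to pack it into the bits above the error class. The sketch below is purely
illustrative and makes no claim about how MPICH2 or any other implementation
lays out its error codes: the bit layout and the names make_error, error_event,
error_class, should_repair and the EVENT_* constants are all assumptions made
for this example.</DIV>

```c
#include <assert.h>

enum { ERR_SUCCESS = 0 };

/* Hypothetical event kinds attached to an error code. */
typedef enum {
    EVENT_NONE = 0,
    EVENT_PROCESS_DOWN = 1,
    EVENT_OTHER = 2
} ErrorEvent;

#define EVENT_SHIFT 16      /* assumed layout: event above a 16-bit class */
#define CLASS_MASK  0xFFFF
#define EVENT_MASK  0xFF

/* Build an error code that carries both an error class and an event. */
int make_error(int err_class, ErrorEvent ev) {
    assert(err_class >= 0 && err_class <= CLASS_MASK);
    return ((int)ev << EVENT_SHIFT) | err_class;
}

/* Recover the event from a returned code. */
ErrorEvent error_event(int rc) {
    return (ErrorEvent)((rc >> EVENT_SHIFT) & EVENT_MASK);
}

/* Recover the plain error class from a returned code. */
int error_class(int rc) {
    return rc & CLASS_MASK;
}

/* The master repairs a worker only when the error is a process failure. */
int should_repair(int rc) {
    return rc != ERR_SUCCESS && error_event(rc) == EVENT_PROCESS_DOWN;
}
```

<DIV>With such a check in place, the master would call start_repair only when
should_repair(rc) holds, and would treat every other error through its normal
error path.</DIV>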
<DIV> </DIV>
<DIV style="MARGIN-TOP: 10pt"><FONT face="Cambria, serif" color=#4f81bd
size=3><B>Conclusions</B></FONT></DIV>
<DIV>I believe that the solution above describes what we discussed in the
last meeting. The APIs required to support this form of FT are really minimal,
but they already cover a good set of users.</DIV>
<DIV> </DIV>
<DIV>Please send your comments.</DIV>
<DIV>Thoughts?</DIV>
<DIV> </DIV>
<DIV>Thanks,</DIV>
<DIV>.Erez</DIV>
<DIV> </DIV>
<DIV>P.S. I will post this on the FT wiki pages (with the feedback).</DIV>
<DIV>P.P.S. There is one more scenario that we discussed, an extension of the
master-workers model. I will try to get it written up as soon as possible.</DIV>
<DIV> </DIV>
<DIV> </DIV>
<DIV>&nbsp;</DIV></FONT></BODY></HTML>