<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML xmlns="http://www.w3.org/TR/REC-html40" xmlns:v =
"urn:schemas-microsoft-com:vml" xmlns:o =
"urn:schemas-microsoft-com:office:office" xmlns:w =
"urn:schemas-microsoft-com:office:word" xmlns:x =
"urn:schemas-microsoft-com:office:excel" xmlns:p =
"urn:schemas-microsoft-com:office:powerpoint" xmlns:a =
"urn:schemas-microsoft-com:office:access" xmlns:dt =
"uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:s =
"uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882" xmlns:rs =
"urn:schemas-microsoft-com:rowset" xmlns:z = "#RowsetSchema" xmlns:b =
"urn:schemas-microsoft-com:office:publisher" xmlns:ss =
"urn:schemas-microsoft-com:office:spreadsheet" xmlns:c =
"urn:schemas-microsoft-com:office:component:spreadsheet" xmlns:odc =
"urn:schemas-microsoft-com:office:odc" xmlns:oa =
"urn:schemas-microsoft-com:office:activation" xmlns:html =
"http://www.w3.org/TR/REC-html40" xmlns:q =
"http://schemas.xmlsoap.org/soap/envelope/" XMLNS:D = "DAV:" xmlns:x2 =
"http://schemas.microsoft.com/office/excel/2003/xml" xmlns:ois =
"http://schemas.microsoft.com/sharepoint/soap/ois/" xmlns:dir =
"http://schemas.microsoft.com/sharepoint/soap/directory/" xmlns:ds =
"http://www.w3.org/2000/09/xmldsig#" xmlns:dsp =
"http://schemas.microsoft.com/sharepoint/dsp" xmlns:udc =
"http://schemas.microsoft.com/data/udc" xmlns:xsd =
"http://www.w3.org/2001/XMLSchema" xmlns:sub =
"http://schemas.microsoft.com/sharepoint/soap/2002/1/alerts/" xmlns:ec =
"http://www.w3.org/2001/04/xmlenc#" xmlns:sp =
"http://schemas.microsoft.com/sharepoint/" xmlns:sps =
"http://schemas.microsoft.com/sharepoint/soap/" xmlns:xsi =
"http://www.w3.org/2001/XMLSchema-instance" xmlns:udcxf =
"http://schemas.microsoft.com/data/udc/xmlfile" xmlns:wf =
"http://schemas.microsoft.com/sharepoint/soap/workflow/" xmlns:mver =
"http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:m =
"http://schemas.microsoft.com/office/2004/12/omml" xmlns:mrels =
"http://schemas.openxmlformats.org/package/2006/relationships" xmlns:ex12t =
"http://schemas.microsoft.com/exchange/services/2006/types" xmlns:ex12m =
"http://schemas.microsoft.com/exchange/services/2006/messages" XMLNS:Z =
"urn:schemas-microsoft-com:" xmlns:st = ""><HEAD><TITLE>Summary of today's meeting</TITLE>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.2900.3395" name=GENERATOR>
<STYLE>@font-face {
font-family: Wingdings;
}
@font-face {
font-family: Cambria Math;
}
@font-face {
font-family: Calibri;
}
@font-face {
font-family: Tahoma;
}
@page Section1 {size: 8.5in 11.0in; margin: 1.0in 1.0in 1.0in 1.0in; }
P.MsoNormal {
FONT-SIZE: 12pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Times New Roman","serif"
}
LI.MsoNormal {
FONT-SIZE: 12pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Times New Roman","serif"
}
DIV.MsoNormal {
FONT-SIZE: 12pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Times New Roman","serif"
}
A:link {
COLOR: blue; TEXT-DECORATION: underline; mso-style-priority: 99
}
SPAN.MsoHyperlink {
COLOR: blue; TEXT-DECORATION: underline; mso-style-priority: 99
}
A:visited {
COLOR: purple; TEXT-DECORATION: underline; mso-style-priority: 99
}
SPAN.MsoHyperlinkFollowed {
COLOR: purple; TEXT-DECORATION: underline; mso-style-priority: 99
}
SPAN.EmailStyle17 {
COLOR: #1f497d; FONT-FAMILY: "Calibri","sans-serif"; mso-style-type: personal-reply
}
.MsoChpDefault {
FONT-SIZE: 10pt; mso-style-type: export-only
}
DIV.Section1 {
page: Section1
}
OL {
MARGIN-BOTTOM: 0in
}
UL {
MARGIN-BOTTOM: 0in
}
</STYLE>
</HEAD>
<BODY lang=EN-US vLink=purple link=blue>
<DIV dir=ltr align=left><SPAN class=361383319-23102008><FONT face=Arial><FONT
size=2><FONT color=#0000ff>Some more notes from our discussion on the topic of
MPI standard support for<FONT face=Calibri> "</FONT></FONT><SPAN
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Calibri','sans-serif'"><FONT
color=#0000ff>checkpoint/restart<SPAN class=361383319-23102008>":
</SPAN></FONT></SPAN></FONT></FONT></SPAN></DIV>
<UL dir=ltr>
<LI>
<DIV align=left><SPAN class=361383319-23102008><SPAN
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Calibri','sans-serif'"><FONT face=Arial
color=#0000ff size=2><SPAN class=361383319-23102008>We grouped C/R under two
categories: application-directed and system-level.
</SPAN></FONT></SPAN></SPAN></DIV></LI>
<LI>
<DIV align=left><SPAN class=361383319-23102008><SPAN
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Calibri','sans-serif'"><FONT face=Arial
color=#0000ff size=2><SPAN class=361383319-23102008>System-level C/R can be
accomplished via many techniques: intercepting every level of the system stack,
using virtualization techniques, etc.</SPAN></FONT></SPAN></SPAN></DIV></LI>
<LI>
<DIV align=left><SPAN class=361383319-23102008><SPAN
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Calibri','sans-serif'"><FONT face=Arial
color=#0000ff size=2><SPAN class=361383319-23102008>Application-directed C/R
will still require some quiescence hooks from the MPI layer (e.g., asynchronous
progression by the MPI layer). There was some discussion on
this.</SPAN></FONT></SPAN></SPAN></DIV></LI>
<LI>
<DIV align=left><SPAN class=361383319-23102008><SPAN
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Calibri','sans-serif'"><FONT face=Arial
color=#0000ff size=2><SPAN class=361383319-23102008>The MPI requirements for
system-level checkpointing cannot be formulated until we get more data to
define a "quiet state".</SPAN></FONT></SPAN></SPAN></DIV></LI></UL>
<DIV dir=ltr align=left><SPAN class=361383319-23102008><SPAN
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Calibri','sans-serif'"><FONT face=Arial
color=#0000ff size=2><SPAN
class=361383319-23102008></SPAN></FONT></SPAN></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=361383319-23102008><SPAN
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Calibri','sans-serif'"><FONT face=Arial
color=#0000ff size=2><SPAN class=361383319-23102008>I queried Mike Hefner on the
semantics of freeze/unfreeze in their (Evergrid/Librato) transparent C/R
approach, and here is his response:</SPAN></FONT></SPAN></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=361383319-23102008><SPAN
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Calibri','sans-serif'"><FONT face=Arial
color=#0000ff size=2><SPAN
class=361383319-23102008></SPAN></FONT></SPAN></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=361383319-23102008><SPAN
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Calibri','sans-serif'"><FONT
color=#0000ff><SPAN class=361383319-23102008><FONT face=Arial size=2>Question 1:
What is your definition of a quiet state (after the freeze call)? Do you expect the
MPI layer to unpin memory, free resources, or just quiet the message traffic? We need
to explicitly state the semantics here
...</FONT></SPAN></FONT></SPAN></SPAN><SPAN class=361383319-23102008><SPAN
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Calibri','sans-serif'"><SPAN
class=361383319-23102008></DIV>
<DIV dir=ltr align=left>
<P><FONT face=Arial color=#800080 size=2>We defined it as a state that will
provide a consistent state of the application across all processes. From the MPI
standpoint, this would mean a state whereby all processes in the "freeze" state
would be able to continue communication if a restart were invoked.</FONT></P>
<P><FONT face=Arial color=#800080 size=2>In terms of particular resources, our
CP/R software manages storing all application and, optionally, all MPI memory.
This includes memory that has been allocated by either a malloc(3) call or a
mmap/mremap call. If that memory has been pinned by the IB driver, we will store
it to disk as well. We also store the primary process resources in use: IPCs,
shared memory, file handles and file rollback state, etc.</FONT></P>
<P><FONT face=Arial color=#0000ff size=2><FONT color=#800080>These memory
regions and other resources are recorded after each process returns from the
freeze API</FONT>.</FONT></P>
<P><FONT face=Arial><FONT size=2><FONT color=#0000ff><SPAN
class=361383319-23102008>Question </SPAN>2<SPAN class=361383319-23102008>:
</SPAN>The same goes for restore. What is expected to be there, and what
</FONT><FONT color=#0000ff>is expected to be supplied as the
context....</FONT></FONT></FONT></P>
<P><FONT face=Arial color=#800080 size=2>On a restore all of the memory (and
other resources such as IPCs, open files, etc.) will be recreated and reloaded
with the state that was recorded at checkpoint time *before* the restart API is
called. On the restart, it is expected that the MPI stack reinitialize the
interconnect card, recreate necessary handles for fabric communication, and
re-pin all previously pinned memory regions in use by the fabric's
card.</FONT></P>
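<P><FONT face=Arial size=2>The ordering described in the two answers above can
be sketched as a toy model. This is only an illustration under stated
assumptions: the class <code>MpiLayerModel</code> and the functions
<code>checkpoint</code> and <code>restore</code> are hypothetical names, not
part of MPI or of the Evergrid/Librato software. It shows that resources are
recorded only after the freeze call returns, and that memory is reloaded before
the restart hook rebuilds the fabric state.</FONT></P>
<PRE>
```python
# Hypothetical model of the freeze / record / reload / restart ordering.
# None of these names come from MPI or from any real checkpointing product.


class MpiLayerModel:
    """Toy stand-in for an MPI library that exposes quiescence hooks."""

    def __init__(self):
        self.in_flight = ["msg-a", "msg-b"]   # traffic not yet delivered
        self.pinned = {"buf0", "buf1"}        # regions registered with the NIC
        self.nic_ready = True
        self.log = []

    def freeze(self):
        # Drain message traffic so that all processes reach a consistent state.
        self.in_flight.clear()
        self.log.append("freeze")

    def restart(self, pinned_before):
        # After a restore the fabric state is gone and must be rebuilt.
        self.nic_ready = True                 # reinitialize the interconnect
        self.pinned = set(pinned_before)      # re-pin previously pinned memory
        self.log.append("restart")


def checkpoint(mpi):
    """Checkpointer side: record resources only after freeze() returns."""
    mpi.freeze()
    image = {"pinned": set(mpi.pinned), "in_flight": list(mpi.in_flight)}
    mpi.log.append("record")
    return image


def restore(image):
    """Checkpointer side: reload memory first, then call the restart hook."""
    mpi = MpiLayerModel()
    mpi.in_flight = list(image["in_flight"])
    mpi.pinned = set()          # pinning does not survive the restore
    mpi.nic_ready = False
    mpi.log.append("reload")
    mpi.restart(image["pinned"])
    return mpi
```
</PRE>
<P><FONT face=Arial size=2>In this model the open question from the notes, what
exactly the MPI layer must do inside the freeze hook, is reduced to draining
in-flight messages; a real definition of the quiet state may require
more.</FONT></P>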
<P><SPAN class=361383319-23102008><FONT face=Arial color=#0000ff
size=2>-Kannan-</FONT></SPAN></P></SPAN></SPAN></SPAN></DIV>
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> mpi3-ft-bounces@lists.mpi-forum.org
[mailto:mpi3-ft-bounces@lists.mpi-forum.org] <B>On Behalf Of </B>Erez
Haba<BR><B>Sent:</B> Wednesday, October 22, 2008 8:53 PM<BR><B>To:</B> MPI 3.0
Fault Tolerance and Dynamic Process Control working Group<BR><B>Subject:</B> Re:
[Mpi3-ft] Summary of today's meeting<BR></FONT><BR></DIV>
<DIV></DIV>
<DIV class=Section1>
<P class=MsoNormal><SPAN
style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'">Thanks
for capturing this.<o:p></o:p></SPAN></P>
<P class=MsoNormal><SPAN
style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'"><o:p> </o:p></SPAN></P>
<P class=MsoNormal><SPAN
style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'">My
comments inline…<o:p></o:p></SPAN></P>
<P class=MsoNormal><SPAN
style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'"><o:p> </o:p></SPAN></P>
<DIV>
<DIV
style="BORDER-RIGHT: medium none; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; PADDING-LEFT: 0in; PADDING-BOTTOM: 0in; BORDER-LEFT: medium none; PADDING-TOP: 3pt; BORDER-BOTTOM: medium none">
<P class=MsoNormal><B><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: 'Tahoma','sans-serif'">From:</SPAN></B><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: 'Tahoma','sans-serif'">
mpi3-ft-bounces@lists.mpi-forum.org [mailto:mpi3-ft-bounces@lists.mpi-forum.org]
<B>On Behalf Of </B>Richard Graham<BR><B>Sent:</B> Tuesday, October 21, 2008
9:03 PM<BR><B>To:</B> MPI 3.0 Fault Tolerance and Dynamic Process Control
working Group<BR><B>Subject:</B> [Mpi3-ft] Summary of today's
meeting<o:p></o:p></SPAN></P></DIV></DIV>
<P class=MsoNormal><o:p> </o:p></P>
<P class=MsoNormal style="MARGIN-BOTTOM: 12pt"><SPAN
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Calibri','sans-serif'">Here is a summary
of what I think that we agreed to today. Please correct any errors, and
add what I am missing.</SPAN><o:p></o:p></P>
<UL type=disc>
<LI class=MsoNormal
style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l0 level1 lfo1"><SPAN
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Calibri','sans-serif'">We need to be
able to restore MPI_COMM_WORLD (and its derivatives) to a usable state when a
process fails. </SPAN><o:p></o:p></LI></UL>
<P class=MsoNormal
style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto"><B><I><SPAN
style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'">[erezh]
I think that we discussed this with reference to the comment that MPI is not
usable once it has returned an error. We need to address that in the current
standard. (I think that this should be the first item on the
list)<o:p></o:p></SPAN></I></B></P>
<P class=MsoNormal
style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto"><B><I><SPAN
style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'">[erezh]
As I recall, the second item on the list is returning errors per call site (per
the Error Reporting Rules proposal)<o:p></o:p></SPAN></I></B></P>
<P class=MsoNormal
style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto"><B><I><SPAN
style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'">[erezh]
As for this specific item, I think that the wording should be “repair” rather
than “restore” (where repair is either making a “hole” in the communicator or
“filling” the hole with a new process).</SPAN></I></B><SPAN
style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'"><o:p></o:p></SPAN></P>
<UL type=disc>
<LI class=MsoNormal
style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l0 level1 lfo1"><SPAN
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Calibri','sans-serif'">Restoration may
involve having MPI_PROC_NULL replace the lost process, or may replace the
lost process with a new process (we have not specified how this would happen).
</SPAN><o:p></o:p></LI></UL>
<P class=MsoNormal
style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto"><B><I><SPAN
style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'">[erezh]
Again, I would replace “restoration” with “repair”.<o:p></o:p></SPAN></I></B></P>
<P class=MsoNormal
style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto"><B><I><SPAN
style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'">[erezh]
We said that we can use MPI_PROC_NULL for making a “hole”, i.e., the
communicator will no longer be in the error state (thus you can receive from
MPI_ANY_SOURCE or use a collective); however, any direct communication with the
“hole” rank behaves like communication with MPI_PROC_NULL.<o:p></o:p></SPAN></I></B></P>
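<P class=MsoNormal
style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto"><SPAN
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Calibri','sans-serif'">A minimal sketch
of the “hole” semantics described above, written as a toy model rather than
real MPI code. The class <code>ToyComm</code> and its methods are hypothetical;
<code>PROC_NULL</code> and <code>ANY_SOURCE</code> merely stand in for
MPI_PROC_NULL and MPI_ANY_SOURCE. The model assumes that a send to a hole
completes without delivering anything and that a receive from a hole returns
immediately with no data, which is how MPI_PROC_NULL behaves in point-to-point
calls.</SPAN></P>
<PRE>
```python
# Toy model of a communicator with a "hole"; all names are hypothetical.
PROC_NULL = "PROC_NULL"     # stands in for MPI_PROC_NULL
ANY_SOURCE = "ANY_SOURCE"   # stands in for MPI_ANY_SOURCE


class ToyComm:
    def __init__(self, size):
        self.size = size
        self.holes = set()                        # ranks repaired as holes
        self.inbox = {r: [] for r in range(size)}

    def make_hole(self, rank):
        """Repair by leaving a hole: the rank number stays, nobody is there."""
        self.holes.add(rank)
        self.inbox[rank].clear()

    def send(self, src, dst, payload):
        if dst in self.holes:
            return PROC_NULL                      # completes, delivers nothing
        self.inbox[dst].append((src, payload))
        return dst

    def recv(self, me, source=ANY_SOURCE):
        if source in self.holes:
            return (PROC_NULL, None)              # returns at once, no data
        for i, (src, payload) in enumerate(self.inbox[me]):
            if source == ANY_SOURCE or source == src:
                return self.inbox[me].pop(i)
        return None                               # nothing matching yet

    def live_ranks(self):
        """Ranks a collective would actually involve after the repair."""
        return [r for r in range(self.size) if r not in self.holes]
```
</PRE>
<P class=MsoNormal
style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto"><SPAN
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Calibri','sans-serif'">For example, after
<code>make_hole(2)</code> on a four-rank communicator, a wildcard receive still
matches messages from the surviving ranks, while traffic addressed to rank 2 is
silently dropped.</SPAN></P>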
<P class=MsoNormal
style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto"><B><I><SPAN
style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'">[erezh]
We also said that replacing the lost process with a new one only applies to
MPI_COMM_WORLD.<o:p></o:p></SPAN></I></B></P>
<UL type=disc>
<LI class=MsoNormal
style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l0 level1 lfo1"><SPAN
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Calibri','sans-serif'">Processes
communicating directly with the failed processes will be notified via a
returned error code about the failure. </SPAN><o:p></o:p>
<LI class=MsoNormal
style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l0 level1 lfo1"><SPAN
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Calibri','sans-serif'">When a process is
notified of the failure, comm_repair() must be called. comm_repair() is
not a collective call, and is what will initiate the communicator repair
associated with the failed process. </SPAN><o:p></o:p></LI></UL>
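<P class=MsoNormal
style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto"><SPAN
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Calibri','sans-serif'">The two items
above (an error code returned at the call site, followed by a non-collective
repair call) might look roughly like this in a toy model. Everything here is an
assumption made for illustration: <code>FaultyComm</code>,
<code>ERR_PROC_FAILED</code> and <code>send_with_repair</code> are hypothetical
names, and the notes do not specify a signature for
<code>comm_repair()</code>.</SPAN></P>
<PRE>
```python
# Hypothetical sketch: per-call-site error codes plus a local repair call.
SUCCESS = 0
ERR_PROC_FAILED = 1     # made-up code, not a constant from the MPI standard


class FaultyComm:
    def __init__(self, size):
        self.alive = [True] * size
        self.needs_repair = set()   # ranks that saw the failure, not yet repaired
        self.repaired_by = set()    # ranks that have called comm_repair()

    def kill(self, rank):
        self.alive[rank] = False

    def send(self, src, dst, payload):
        """Only a caller that talks to the failed rank is told about it."""
        if not self.alive[dst] and src not in self.repaired_by:
            self.needs_repair.add(src)
            return ERR_PROC_FAILED
        return SUCCESS

    def comm_repair(self, caller):
        """Non-collective: runs for the caller alone, no other rank takes part."""
        self.needs_repair.discard(caller)
        self.repaired_by.add(caller)
        return SUCCESS


def send_with_repair(comm, src, dst, payload):
    """Call-site pattern: check the code, repair, then carry on."""
    rc = comm.send(src, dst, payload)
    if rc == ERR_PROC_FAILED:
        comm.comm_repair(src)
        rc = comm.send(src, dst, payload)   # dst now behaves like a hole
    return rc
```
</PRE>
<P class=MsoNormal
style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto"><SPAN
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Calibri','sans-serif'">In this sketch a
rank that never communicates with the failed process never sees the error,
which is why a separate registration mechanism is listed as a
requirement.</SPAN></P>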
<P class=MsoNormal
style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto"><B><I><SPAN
style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'">[erezh]
We also discussed a “generation” or “revision” of a process rank to identify whether a
process was recycled. I think that we ended up saying that it’s not really
required and that it’s the application’s responsibility to identify a restored process
where there might be a dependency on previous communication (with other
ranks).</SPAN></I></B><SPAN
style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'"><o:p></o:p></SPAN></P>
<UL type=disc>
<LI class=MsoNormal
style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l0 level1 lfo1"><SPAN
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Calibri','sans-serif'">If a process
wants to be notified of process failure even if it is not communicating
directly with this process, it must register for this notification.
</SPAN><o:p></o:p>
<LI class=MsoNormal
style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l0 level1 lfo1"><SPAN
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Calibri','sans-serif'">We don’t have
enough information to know how to continue with support for
checkpoint/restart. </SPAN><o:p></o:p></LI></UL>
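<P class=MsoNormal
style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto"><SPAN
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Calibri','sans-serif'">The registration
requirement in the list above could be modeled as a simple opt-in callback
table. This is a sketch only: <code>FailureNotifier</code> and its methods are
hypothetical, and the notes do not say what form the registration call would
take.</SPAN></P>
<PRE>
```python
# Hypothetical sketch of opt-in failure notification; not an MPI API.
class FailureNotifier:
    def __init__(self):
        self.subscribers = {}           # rank mapped to its callback

    def register(self, rank, callback):
        """A rank asks to hear about failures it would not otherwise observe."""
        self.subscribers[rank] = callback

    def report_failure(self, failed_rank):
        """Deliver the event to every registered rank except the failed one."""
        notified = []
        for rank in sorted(self.subscribers):
            if rank != failed_rank:
                self.subscribers[rank](failed_rank)
                notified.append(rank)
        return notified
```
</PRE>
<P class=MsoNormal
style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto"><SPAN
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Calibri','sans-serif'">Ranks that do not
register are never called back, matching the default in which only direct
communication partners learn about a failure.</SPAN></P>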
<P class=MsoNormal
style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto"><B><I><SPAN
style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'">[erezh]
We discussed system-level checkpoint/restart versus application-aware
checkpoint/restart.</SPAN></I></B><SPAN
style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'"><o:p></o:p></SPAN></P>
<UL type=disc>
<LI class=MsoNormal
style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l0 level1 lfo1"><SPAN
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Calibri','sans-serif'">We need to
discuss what needs to be done with respect to failure of collective
communications.</SPAN><o:p></o:p> </LI></UL>
<P class=MsoNormal
style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto"><B><I><SPAN
style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'">[erezh]
We raised the issue of identifying an asymmetric view of the communicator after a
“hole” repair and its impact on collectives (e.g., the link between ranks 2 and
3 is broken, but both can still communicate with rank 1). Furthermore, we explored a
possible solution: adding information to the collective message(s) to verify that the
communicator view is consistent. (We said that this requires further
exploration.)</SPAN></I></B><SPAN
style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'"><o:p></o:p></SPAN></P>
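<P class=MsoNormal
style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto"><SPAN
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Calibri','sans-serif'">One way to read
the idea of adding information to the collective messages is sketched below.
It assumes, purely for illustration, that each rank attaches the set of ranks
it believes have failed, and that the collective only proceeds when all
attached views agree. The function names are hypothetical, and the group did
not settle on this or any other encoding.</SPAN></P>
<PRE>
```python
# Hypothetical sketch: tag collective contributions with the sender's view.
def view_tag(known_failed):
    """Order-independent summary of the ranks a process believes have failed."""
    return tuple(sorted(known_failed))


def contribute(rank, value, known_failed):
    """What a rank would attach to its part of a collective."""
    return {"rank": rank, "value": value, "view": view_tag(known_failed)}


def reduce_if_consistent(contributions):
    """Sum the values only when every participant reports the same view."""
    views = {c["view"] for c in contributions}
    if len(views) != 1:
        return ("INCONSISTENT", sorted(views))
    return ("OK", sum(c["value"] for c in contributions))
```
</PRE>
<P class=MsoNormal
style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto"><SPAN
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Calibri','sans-serif'">In the example
from the comment above, rank 2 would report rank 3 as failed, rank 3 would
report rank 2, and rank 1 would report neither, so the three views differ and
the mismatch is detected instead of producing a silently wrong
result.</SPAN></P>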
<P class=MsoNormal><SPAN
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Calibri','sans-serif'"><BR>There are
several issues that came up with respect to these, which will be detailed later
on.<BR><BR>Rich</SPAN><o:p></o:p></P></DIV></BODY></HTML>