<html><head><meta http-equiv="Content-Type" content="text/html charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div><br></div><div>Hi all!</div><div><br></div><div>Wesley, thanks for the update.</div><div><br></div><div>Martin and Todd, there has been some work on checkpoint/restart in which the MPI runtime stays alive. I would like to point out Wesley’s own work (<a href="http://www.netlib.org/utk/people/JackDongarra/PAPERS/CoF-europar2012.pdf">http://www.netlib.org/utk/people/JackDongarra/PAPERS/CoF-europar2012.pdf</a>), as well as Frank Mueller’s work (<a href="http://www.christian-engelmann.info/publications/wang07job.pdf">http://www.christian-engelmann.info/publications/wang07job.pdf</a>). Also, Mattan Erez is looking at a similar approach for his containment domain work.</div><div><br></div><div>The original idea of the MPI Fault Tolerance Working Group was to develop a proposal that would allow for a multitude of solutions. I see this new “proposal” as an extension of the existing proposal that uses a subset of its features, requiring that additional system-level checkpoint/restart features (e.g., 
long jump and MPI state roll-back) be part of the MPI standard.</div><div><br></div><div>Thanks,</div><div>Christian</div><div><br></div><div apple-content-edited="true"><div style="color: rgb(0, 0, 0); font-family: Helvetica; font-size: medium; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">--</div><div style="color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><br></div><div style="color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Christian Engelmann, Ph.D.<br><br>System Software Team Task Lead / R&D Staff Scientist<br>Computer Science Research Group<br>Computer Science and Mathematics Division<br>Oak Ridge National Laboratory<br><br>Mail: P.O. 
Box 2008, Oak Ridge, TN 37831-6173, USA<br>Phone: +1 (865) 574-3132 / Fax: +1 (865) 576-5491</div><div style="color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">e-Mail: <a href="mailto:engelmannc@ornl.gov">engelmannc@ornl.gov</a> / Home: <a href="http://www.christian-engelmann.info">www.christian-engelmann.info</a><br></div>
</div>
<br></body></html>