<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body dir="auto"><div><font color="#000000"><span style="background-color: rgba(255, 255, 255, 0);"><a href="http://pubs.acs.org/doi/abs/10.1021/ct100439u" x-apple-data-detectors="true" x-apple-data-detectors-type="link" x-apple-data-detectors-result="0">http://pubs.acs.org/doi/abs/10.1021/ct100439u</a> is the paper I was implicitly referencing. They do RAID inside of GA. I can do this sanely with MPI RMA </span></font><span style="background-color: rgba(255, 255, 255, 0);">(i.e., without resorting to nproc times as many windows as necessary) if and only if I can continue to use data after a process failure when I know it could not have been corrupted.</span></div><div><span style="background-color: rgba(255, 255, 255, 0);"><br></span></div><div><span style="background-color: rgba(255, 255, 255, 0);">It is possible that the paper doesn't adequately explain things for this context, in which case I will provide more details later.</span></div><div><font color="#000000"><span style="background-color: rgba(255, 255, 255, 0);"><br></span></font></div><div><font color="#000000"><span style="background-color: rgba(255, 255, 255, 0);">Other stuff that may or may not matter:</span></font></div><div><font color="#000000"><span style="background-color: rgba(255, 255, 255, 0);"><br><a href="http://hpc.pnl.gov/people/vishnu/public/vishnu_overdecomposition.pdf" x-apple-data-detectors="true" x-apple-data-detectors-type="link" x-apple-data-detectors-result="1">http://hpc.pnl.gov/people/vishnu/public/vishnu_overdecomposition.pdf</a><br><a href="http://hpc.pnl.gov/people/vishnu/public/vishnu_hipc10.pdf" x-apple-data-detectors="true" x-apple-data-detectors-type="link" x-apple-data-detectors-result="2">http://hpc.pnl.gov/people/vishnu/public/vishnu_hipc10.pdf</a><br><a href="http://dx.doi.org/10.1109/PDP.2011.72" x-apple-data-detectors="true" x-apple-data-detectors-type="link" 
x-apple-data-detectors-result="3">http://dx.doi.org/10.1109/PDP.2011.72</a><br><a href="http://link.springer.com/chapter/10.1007/978-3-642-23397-5_34" x-apple-data-detectors="true" x-apple-data-detectors-type="link" x-apple-data-detectors-result="4">http://link.springer.com/chapter/10.1007/978-3-642-23397-5_34</a><br><br>I assume someone from Argonne has presented GVR to the WG?</span></font></div><div><font color="#000000"><span style="background-color: rgba(255, 255, 255, 0);"><br></span></font></div><div><font color="#000000"><span style="background-color: rgba(255, 255, 255, 0);">Jeff</span></font></div><div><br>On Dec 10, 2014, at 10:12 PM, George Bosilca &lt;<a href="mailto:bosilca@icl.utk.edu">bosilca@icl.utk.edu</a>&gt; wrote:<br><br></div><blockquote type="cite"><div><div dir="ltr">Jeff,<div><br></div><div>I was trying to find some references to the GA FT work you mentioned during the plenary discussion today.</div><div><br></div><div>The only reference I could find on the FT capabilities of GA is [1], but it is getting dusty. A more recent reference [2] addresses NWCHEM in particular, but it describes an application-specific, user-level checkpoint/restart strategy that requires minimal support from the communication library and has little in common with the ongoing discussion in the WG.<br></div><div><br></div><div>I would really appreciate it if you could provide a reference.</div><div><br></div><div>Thanks,</div><div>  George.</div><div><br><span style="font-size:8pt;font-family:NimbusRomNo9L">[1] V. Tipparaju, M. Krishnan, B. Palmer, F. Petrini, and J. Nieplocha,

“Towards fault resilient Global Arrays,” in </span><span style="font-size:8pt;font-family:NimbusRomNo9L;font-style:italic">International Conference on Parallel Computing</span><span style="font-size:8pt;font-family:NimbusRomNo9L">, vol. 15, 2007, pp. 339–345. </span></div><div><span style="font-size:8pt;font-family:NimbusRomNo9L">[2] </span><font face="NimbusRomNo9L"><span style="font-size:10.6666669845581px">N. Ali, S. Krishnamoorthy, N. Govind, and B. Palmer, “</span></font><span style="font-size:10.6666669845581px;font-family:NimbusRomNo9L">A Redundant Communication Approach to Scalable Fault Tolerance in PGAS Programming Models,” in PDP'11.</span></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Dec 10, 2014 at 5:14 PM, Wesley Bland <span dir="ltr">&lt;<a href="mailto:wbland@anl.gov" target="_blank">wbland@anl.gov</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I've posted notes from today's plenary session on the wiki page:<div><br></div><div><a href="https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/ftwg2014-12-10" target="_blank">https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/ftwg2014-12-10</a><br></div><div><br></div><div>I'm also attaching the slides to this email and I believe they'll be posted on the forum website by Martin at some point.</div><div><br></div><div>Thanks,</div><div>Wesley</div></div>

</blockquote></div><br></div>

</div></blockquote></body></html>