[mpiwg-ft] Notes from FTWG Plenary Session

Jeff Hammond jeff.science at gmail.com
Thu Dec 11 00:26:10 CST 2014


http://pubs.acs.org/doi/abs/10.1021/ct100439u is the paper I was
implicitly referencing. They do RAID inside of GA. I can only do this sanely with MPI RMA (i.e., without resorting to nproc times as many windows as necessary) if and only if I can continue to use data after a process failure, provided I know it could not have been corrupted.

It is possible that the paper doesn't adequately explain things for this context, in which case I will provide further details later.

Other stuff that may or may not matter:

http://hpc.pnl.gov/people/vishnu/public/vishnu_overdecomposition.pdf
http://hpc.pnl.gov/people/vishnu/public/vishnu_hipc10.pdf
http://dx.doi.org/10.1109/PDP.2011.72
http://link.springer.com/chapter/10.1007/978-3-642-23397-5_34

I assume someone from Argonne has presented GVR to the WG?

Jeff


> On Dec 10, 2014, at 10:12 PM, George Bosilca <bosilca at icl.utk.edu> wrote:
> 
> Jeff,
> 
> I was trying to find some references to the GA FT work you mentioned during the plenary discussion today.
> 
> The only reference I could find about the FT capabilities of GA is [1], but it is getting dusty. A more recent reference [2] addresses NWChem in particular, but it describes an application-specific, user-level checkpoint/restart strategy that requires minimal support from the communication library and has little in common with the ongoing discussion in the WG.
> 
> I would really appreciate it if you could provide a reference.
> 
> Thanks,
>   George.
> 
> [1] V. Tipparaju, M. Krishnan, B. Palmer, F. Petrini, and J. Nieplocha, “Towards Fault Resilient Global Arrays,” in International Conference on Parallel Computing, vol. 15, 2007, pp. 339–345.
> [2] N. Ali, S. Krishnamoorthy, N. Govind, and B. Palmer, “A Redundant Communication Approach to Scalable Fault Tolerance in PGAS Programming Models,” in PDP'11.
> 
>> On Wed, Dec 10, 2014 at 5:14 PM, Wesley Bland <wbland at anl.gov> wrote:
>> I've posted notes from today's plenary session on the wiki page:
>> 
>> https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/ftwg2014-12-10
>> 
>> I'm also attaching the slides to this email and I believe they'll be posted on the forum website by Martin at some point.
>> 
>> Thanks,
>> Wesley
>> 
>> _______________________________________________
>> mpiwg-ft mailing list
>> mpiwg-ft at lists.mpi-forum.org
>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-ft
> 

