[mpiwg-ft] FTWG Call Today

Van Der Wijngaart, Rob F rob.f.van.der.wijngaart at intel.com
Fri Apr 28 01:31:04 CDT 2017


Hello All,

I wanted to make sure I/we are not missing important alternate MPI fault tolerance approaches. Keita had earlier pointed me to the Fault Aware MPI effort by Tony Skjellum et al. (I am not sure if his team members are on this mail alias), and I started reading their 2015 paper "Practical Resilient Cases for FA-MPI, A Transactional Fault-Tolerant MPI" again. It's an interesting approach, although I am, of course, very concerned about regular collective calls to corral faults, and also about the need to constrain applications to using non-blocking communications only. Is there any information/insight into how seriously this approach is being pursued for inclusion in the MPI standard? Thanks.

Rob

-----Original Message-----
From: mpiwg-ft-bounces at lists.mpi-forum.org [mailto:mpiwg-ft-bounces at lists.mpi-forum.org] On Behalf Of Aurelien Bouteiller
Sent: Wednesday, April 26, 2017 12:26 PM
To: MPI WG Fault Tolerance and Dynamic Process Control working Group <mpiwg-ft at lists.mpi-forum.org>
Cc: ilaguna at llnl.gov
Subject: Re: [mpiwg-ft] FTWG Call Today

Ignacio, could you circulate the slides? I can’t join the Skype call.

Aurelien

> On Apr 26, 2017, at 14:16, Bland, Wesley <wesley.bland at intel.com> wrote:
> 
> That's OK, we are not only fault tolerant, but also latency tolerant.
> 
> -----Original Message-----
> From: mpiwg-ft-bounces at lists.mpi-forum.org [mailto:mpiwg-ft-bounces at lists.mpi-forum.org] On Behalf Of Ignacio Laguna
> Sent: Wednesday, April 26, 2017 11:15 AM
> To: MPI WG Fault Tolerance and Dynamic Process Control working Group <mpiwg-ft at lists.mpi-forum.org>; Bland, Wesley <wesley.bland at intel.com>
> Subject: Re: [mpiwg-ft] FTWG Call Today
> 
> Yes, I should be able to do that. I may be around 5 minutes late though.
> 
> --
> Ignacio Laguna
> Center for Applied Scientific Computing (CASC)
> Lawrence Livermore National Laboratory
> Phone: 925-422-7308, Fax: 925-422-6287
> 
> On 4/26/17 9:27 AM, Bland, Wesley wrote:
>> Hi FTWG,
>> 
>> If I'm not mistaken, we have an FTWG con call today. On the agenda is continuing the discussion of the concerns around ULFM. Ignacio, are you ready to present the Reinit slides you had a month or two ago while I was out?
>> 
>> Here's the call-in information:
>> 
>> .........................................................................................................................................
>> Join online meeting
>> https://meet.intel.com/wesley.bland/WBF3C1SD
>> 
>> Join by Phone
>> +1 (916) 356-2663 (or your local bridge access #) Choose bridge 5.
>> 
>> Conference ID: 494710807
>> .........................................................................................................................................
>> 
>> Thanks,
>> Wesley
>> _______________________________________________
>> mpiwg-ft mailing list
>> mpiwg-ft at lists.mpi-forum.org
>> https://lists.mpi-forum.org/mailman/listinfo/mpiwg-ft
> 
