[mpiwg-ft] FTWG Call Today

Van Der Wijngaart, Rob F rob.f.van.der.wijngaart at intel.com
Thu May 4 18:42:13 CDT 2017


Hi Tony,

Thanks for the response. It will take me a bit of time to absorb all the points you made in this magnum opus and respond appropriately, but I really appreciate your thoughtful formulation. I think you’re saying that premature standardization should be avoided, and I agree. On the other hand, once a proposal is expressive enough to fully serve its purpose AND does not create an inevitable obstacle to efficiency, there is no danger in adopting it, even if alternative formulations exist, especially if they are functionally equivalent. I do hope we can make progress soon.
Stay tuned for more.

Rob

From: mpiwg-ft-bounces at lists.mpi-forum.org [mailto:mpiwg-ft-bounces at lists.mpi-forum.org] On Behalf Of Anthony Skjellum
Sent: Thursday, May 04, 2017 3:26 PM
To: MPI WG Fault Tolerance and Dynamic Process Control working Group <mpiwg-ft at lists.mpi-forum.org>
Cc: Purushotham V. Bangalore <puri at uab.edu>; ilaguna at llnl.gov; Dan Holmes <d.holmes at epcc.ed.ac.uk>; Amin Hassani <ahassani4 at gmail.com>
Subject: Re: [mpiwg-ft] FTWG Call Today

Rob, sorry for the slow reply.  It is our intention to propose to the Forum the set of concepts needed to realize FA-MPI: as a complete motivating proposal, but also in terms of "verbs" that enable us and, in this latter mode, also enable others such as ULFM, without insisting on FA-MPI itself as the outcome.  Now, the main idea, the groupwise, non-blocking transactional block API, is probably something we can't live without, but how to standardize it is TBD. Exploring how to get FA-MPI *functionality* versus its *exact API* could be a new, orthogonal, complementary, cooperative activity of the working group, in addition to the drive for ULFM as a monolithic set of APIs.
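To make the shape concrete, here is a minimal sketch: nonblocking operations are issued inside the block, and completion plus group-wise fault detection happen at the block boundary.  The MPIX_TryBlock_* names below are placeholders for this email, not the FA-MPI API as published.

    #include <mpi.h>

    /* Sketch only: MPIX_TryBlock_start/_finish are illustrative placeholders. */
    int exchange_step(MPI_Comm comm, double *sendbuf, double *recvbuf,
                      int count, int peer)
    {
        MPI_Request reqs[2];
        int block_status;

        MPIX_TryBlock_start(comm);                  /* open the transactional block */
        MPI_Irecv(recvbuf, count, MPI_DOUBLE, peer, 0, comm, &reqs[0]);
        MPI_Isend(sendbuf, count, MPI_DOUBLE, peer, 0, comm, &reqs[1]);
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        MPIX_TryBlock_finish(comm, &block_status);  /* group-wise completion + fault check */

        return block_status;  /* nonzero => some process in the group observed a fault */
    }

The real API carries more at the block boundary (timeouts, richer error reporting), but that is the shape of the idea.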

We started in 2011-2012 with FA-MPI because I didn't think the approach our colleague Rich Graham was presenting to the Forum in Chicago at one meeting, on the then-current version of ULFM (or its predecessors), was the only way to go. You've seen the 2015 work, and we have active development ongoing.  One of our students (Hassani, cc'd above) finished in 2016 at UAB; his PhD produced the first implementation, plus compact apps (LULESH and MiniFE).  It works well, doesn't have major issues, and will be evolved further.  It still requires meaningful fault models and the ability to recover from truly inconsistent network states, so the work continues.  We aren't satisfied with API consistency alone, and there are hard, open problems when it comes to inconsistent network state.

The "issue" for us, per se (perhaps that is a loaded word; please take it as politely intended), is that ULFM is offered as an omnibus package; our approach is also a complete solution.  They are quite different.  Both could be adopted. The ULFM colleagues run this working group, so ULFM gets the appropriate discussion :-)  They have put in the effort and sweat equity, and they have been working on this for many years; we have only nice things to say about them :-)

My students, if not I, are on all the FTWG calls.  But our strategy is to build up confidence, experience, working use cases, and small codes, and to identify more and more best practices before bringing the ideas up again at the Forum for more formal consideration.  Puri Bangalore and I would be happy to brief the group at the September meeting on where we are, to get us back in the fold :-)  Puri is cc'd above.

Please remember that our goal is to support next-generation exascale apps, and we did not insist on 100% API support.  Our expectation (this work is being done by another student) is to transform legacy MPI apps for fault tolerance via source-to-source translation with the ROSE compiler framework, which is robust enough for large-scale applications.  That transformation approach will continue in parallel with our FA-MPI efforts.  We don't think that 100% backward compatibility is a good early constraint on research; the legacy impact of some MPI features, such as blocking operations, complicates the investigation... with early efforts to standardize you more or less have to accept that constraint; that's the ULFM and predecessor path, IMHO; again, with respect for all the hard work of George, Aurelien, Wesley, and others on their team.
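To give a flavor of the kind of rewrite we expect the translator to emit (a hand-written illustration, not actual ROSE output; local, global, and comm are assumed in scope), a blocking collective is replaced by its nonblocking equivalent, whose completion a transactional block can then fold in with other outstanding operations:

    /* before: legacy blocking collective */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, comm);

    /* after: nonblocking equivalent; the wait can be absorbed into the
     * completion/fault check at the end of a transactional block */
    MPI_Request req;
    MPI_Iallreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, comm, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);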

This week, George and I have just started a new side discussion about how we might try to define common runtime infrastructure.  We plan to advance that discussion this summer.  But it is different from merging proposals at the medium or higher levels.  I think this discussion of common underlying low-level infrastructure could lead to further standardization choices too...

Puri and I are also working to show where atomic suboperations could be offered instead of ULFM and FA-MPI as such; these could be common standardized APIs useful for both: non-blocking, with features like timeouts... but this is by no means uncontroversial.  We plan to submit a paper or poster on this to EuroMPI.  Some years ago we had a paper examining FA-MPI, ULFM (as then proposed), and G-ULFM, our idea for non-blocking, lower-level building blocks that "look like ULFM but are a bit generalized."  That has not gained traction (yet).

It is apparent to me that we should standardize the minimal, low-level primitives upon which ULFM, FA-MPI, Reinit, and others can be layered without much, if any, loss of performance, fidelity, or functionality. For instance, if you need a consensus on a group, OK, you get it, but you start from a nonblocking operation.  You are also unwilling to wait forever, so you have primitive features like timeouts.  And if you can avoid extra consensuses (is that a word?), that justifies offering the operation as an atomic.
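To make that concrete, here is a sketch that combines the ULFM-proposed nonblocking agreement, MPIX_Comm_iagree, with a caller-imposed deadline; the deadline handling is only my illustration of the "don't wait forever" primitive, not a standardized interface:

    #include <mpi.h>

    /* Nonblocking group agreement bounded by a deadline.  On timeout the
     * request is left pending in *req; what the caller does next (keep
     * testing, start recovery, abandon the epoch) is exactly the policy
     * question we need to settle. */
    int agree_with_deadline(MPI_Comm comm, int *flag, double timeout_sec,
                            MPI_Request *req, int *timed_out)
    {
        int done = 0;
        double deadline = MPI_Wtime() + timeout_sec;

        int rc = MPIX_Comm_iagree(comm, flag, req);
        if (rc != MPI_SUCCESS) return rc;

        while (!done && MPI_Wtime() < deadline)
            MPI_Test(req, &done, MPI_STATUS_IGNORE);

        *timed_out = !done;
        return MPI_SUCCESS;
    }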

The "reduced API" vs. the "complex API" approach to fault-aware primitives is what I am getting at.  But we want semantic features, like timeouts, that others have so far argued are not needed.
We want to promote the view that the interaction is an application concern (which might be woven in as an aspect, coded differently on different HPC architectures or even at different scales on the same architecture) that communicates errors to and from MPI, and that the MPI transaction blocks support progress and error propagation.  We have a prototype for all of this.  But we don't know how to handle blocking operations like MPI_Send() and MPI_Recv() well; MPI_Isend() and MPI_Irecv() are OK. We also differ on fault-free overhead: our view is that some fault-free overhead is inherent in such middleware and that we need to trade off the potential for lost work against the cost of fault awareness; others think the overhead may be negligible.  Further study is warranted.
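On the error-propagation side, much of the plumbing we build on is already in standard MPI: a non-fatal error handler plus per-request error codes at completion.  A minimal sketch (the transaction-block machinery FA-MPI wraps around this is omitted):

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Complete a set of nonblocking operations and surface per-request
     * errors to the application instead of aborting. */
    void check_completions(MPI_Comm comm, int n, MPI_Request reqs[])
    {
        MPI_Status *stats = malloc(n * sizeof(MPI_Status));

        /* normally set once at startup, before the requests are posted */
        MPI_Comm_set_errhandler(comm, MPI_ERRORS_RETURN);

        int rc = MPI_Waitall(n, reqs, stats);
        if (rc == MPI_ERR_IN_STATUS) {
            for (int i = 0; i < n; i++)
                if (stats[i].MPI_ERROR != MPI_SUCCESS)
                    fprintf(stderr, "request %d completed with error %d\n",
                            i, stats[i].MPI_ERROR);
            /* hand off to the application's recovery/rollback logic here */
        }
        free(stats);
    }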

This is in no way an invitation to a flame war on this list.  Our goal is to find, with the whole group, the set of primitives we can all agree on that lets each model operate without significant penalty,
and to ask the MPI Forum to approve those.  We will then work to layer ULFM, FA-MPI, and others onto those "verbs" or "reduced API" through the exploration, design, and informal negotiation I hope we could do in 2017-2018.  When each group says "this set of verbs allows me to build my model acceptably well," then we can simply support them and offer our added-value libraries for specific application preferences.  Ours would work with a subset of MPI (the nonblocking and persistent nonblocking operations most likely to be emphasized at exascale), ULFM with the entire MPI, and so on.  But the fundamental verbs project has to be done.  The leverage of such an approach is its openness: it allows others to do things differently than either high-level monolithic API set would per se, and maybe even allows more than one kind of fault tolerance in a big app, or in an app that uses third-party libraries.  [They might also do so far into the future.  Perhaps such APIs belong in the MPI_T section of the standard, because they are not for "end users" but rather for "fault-tolerant programmers."]

Our separate exploration of Sessions and QoS for communicators is complementary to the work thus far on FA-MPI and is another path to the resource visibility needed to recover after a fault. We have high hopes for Session-based, fault-tolerant, low-level standardization that enables both FA-MPI and ULFM, but we are not far enough along there yet.  That is another WG.  Dan Holmes, Puri, and I are exploring Sessions for FT and soft/firm RT/QoS; it is early days.

I'd be happy to discuss ongoing FA-MPI R&D with you offline; if you have use cases, application requirements, agile-type stories, or ideas to help us mature our work, great.

If we are invited to speak at the September WG meeting in Chicago, we will certainly update everyone then.  I can promise more engagement if there is space for our effort to work in concert with our colleagues pursuing ULFM.  Unfortunately, neither Puri nor I can make it to the June meeting ;-(.

Again, sorry for the slow response.

Respectfully,
Tony Skjellum




On Fri, Apr 28, 2017 at 1:31 AM, Van Der Wijngaart, Rob F <rob.f.van.der.wijngaart at intel.com> wrote:
Hello All,

I wanted to make sure I/we are not missing important alternative MPI fault tolerance approaches. Keita had earlier pointed me to the Fault-Aware MPI effort by Tony Skjellum et al. (I am not sure whether his team members are on this mail alias), and I started reading their 2015 paper "Practical Resilient Cases for FA-MPI, A Transactional Fault-Tolerant MPI" again. It's an interesting approach, although I am, of course, very concerned about the regular collective calls used to corral faults, and also about the need to constrain applications to non-blocking communications. Is there any information/insight into how seriously this approach is being pursued for inclusion in the MPI standard? Thanks.

Rob

-----Original Message-----
From: mpiwg-ft-bounces at lists.mpi-forum.org [mailto:mpiwg-ft-bounces at lists.mpi-forum.org] On Behalf Of Aurelien Bouteiller
Sent: Wednesday, April 26, 2017 12:26 PM
To: MPI WG Fault Tolerance and Dynamic Process Control working Group <mpiwg-ft at lists.mpi-forum.org>
Cc: ilaguna at llnl.gov
Subject: Re: [mpiwg-ft] FTWG Call Today

Ignacio, could you circulate the slides? I can’t join the Skype call.

Aurelien

> On Apr 26, 2017, at 14:16, Bland, Wesley <wesley.bland at intel.com> wrote:
>
> That's OK, we are not only fault tolerant, but also latency tolerant.
>
> -----Original Message-----
> From: mpiwg-ft-bounces at lists.mpi-forum.org [mailto:mpiwg-ft-bounces at lists.mpi-forum.org] On Behalf Of Ignacio Laguna
> Sent: Wednesday, April 26, 2017 11:15 AM
> To: MPI WG Fault Tolerance and Dynamic Process Control working Group <mpiwg-ft at lists.mpi-forum.org>; Bland, Wesley <wesley.bland at intel.com>
> Subject: Re: [mpiwg-ft] FTWG Call Today
>
> Yes, I should be able to do that. I may be around 5 minutes late though.
>
> --
> Ignacio Laguna
> Center for Applied Scientific Computing (CASC) Lawrence Livermore National Laboratory
> Phone: 925-422-7308, Fax: 925-422-6287
>
> On 4/26/17 9:27 AM, Bland, Wesley wrote:
>> Hi FTWG,
>>
>> If I'm not mistaken, we have an FTWG con call today. On the agenda is continuing the discussion of the concerns around ULFM. Ignacio, are you ready to present the Reinit slides you had a month or two ago while I was out?
>>
>> Here's the call-in information:
>>
>> .........................................................................................................................................
>> Join online meeting
>> https://meet.intel.com/wesley.bland/WBF3C1SD
>>
>> Join by Phone
>> +1 (916) 356-2663 (or your local bridge access #). Choose bridge 5.
>>
>> Conference ID: 494710807
>> .........................................................................................................................................
>>
>> Thanks,
>> Wesley



--
Anthony Skjellum, PhD
skjellum at gmail.com
Cell: +1-205-807-4968

