[mpiwg-ft] FTWG Call Today

Anthony Skjellum skjellum at gmail.com
Thu May 4 17:25:43 CDT 2017


Rob, sorry for the slow reply.  It is our intention to propose to the Forum
the set of concepts needed to realize FA-MPI: as a complete motivating
proposal, but also in terms of verbs that enable us and, in this latter
mode, also enable others like ULFM, without insisting on FA-MPI itself as
the outcome.  The main idea, the groupwise, non-blocking transactional
block API, is probably something we can't live without, but how to
standardize it is TBD.  Exploring how to get FA-MPI *functionality* rather
than its *exact API* could be a new, orthogonal, complementary, cooperative
activity of the working group, in addition to the drive for ULFM as a
monolithic set of APIs.
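
For concreteness, here is a rough sketch of what such a transactional block
could look like from the application side.  The MPIX_TryBlock_* names, types,
and signatures are illustrative placeholders only, not our actual API:

    /* Hypothetical sketch of a group-wise, non-blocking transactional block.
       MPIX_TryBlock_* is an illustrative placeholder, not the FA-MPI API.   */
    MPIX_TryBlock tb;
    MPIX_TryBlock_start(comm, &tb);          /* open a transaction on the group */

    MPI_Request reqs[2];
    MPI_Isend(sbuf, n, MPI_DOUBLE, right, 0, comm, &reqs[0]);
    MPI_Irecv(rbuf, n, MPI_DOUBLE, left,  0, comm, &reqs[1]);
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    int outcome;
    MPIX_TryBlock_commit(&tb, timeout_ms, &outcome);  /* group agrees on success */
    if (outcome != MPIX_TB_SUCCESS) {
        /* roll back to the last consistent state, then retry or shrink */
    }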

We started FA-MPI in 2011-2012 because I didn't think the approach our
colleague Rich Graham was presenting to the Forum at a Chicago meeting, the
then-current version of ULFM (or its predecessors), was the only way to go.
You've seen the 2015 work, and we have active development ongoing.  One of
our students (Hassani, cc'd above) finished in 2016 at UAB, and his PhD was
the first implementation, plus compact apps (LULESH and MiniFE).  It works
well, has no major issues, and will be evolved further.  It still requires
meaningful fault models and the ability to recover from truly inconsistent
network states, so the work continues.  We aren't satisfied with API
consistency alone, and there are hard, open problems when it comes to
inconsistent network state.

The "issue" for us per se (perhaps that is a loaded word, OK, take it
politely as intended) is that ULFM is offered as an omnibus package; our
approach is also a complete solution.  They are quite different.  Both
could be adopted. The ULFM colleagues run this working group, so ULFM gets
the appropriate discussion :-)  They have the effort and sweat equity.  We
only have nice things to say about them :-) they've been working for many
years on this.

My students, if not I, are on all the FTWG calls.  But our strategy is to
build up confidence, experience, working use cases, and small codes, and to
accumulate more best practice before bringing the ideas back to the Forum
for more formal consideration.  Puri Bangalore and I would be happy to
brief the group at the September meeting on where we are, to get us back in
the fold :-)  Puri is cc'd above.

Please remember that our goal is to support next-generation exascale apps,
and we did not insist on 100% API support.  Our expectation (this work is
being done by another student) is to transform legacy MPI apps for fault
tolerance using source-to-source translation with the ROSE compiler
framework, which is robust enough for large-scale applications.  That
transformation approach will continue in parallel with our FA-MPI efforts.
We don't think 100% backward compatibility is a good early constraint on
research; legacy features of MPI, like blocking operations, complicate the
investigation.  With early efforts to standardize you more or less have to
accept that constraint; that's the ULFM and predecessor path, IMHO; again,
with respect for all the hard work of George, Aurelien, Wesley, and others
on their team.
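
As a rough illustration (not the actual ROSE transformation rules), the kind
of rewrite such a translator could perform is replacing a legacy blocking
call with its non-blocking equivalent, so that a fault-aware runtime can
interpose timeouts or abandon the operation:

    /* Before: legacy blocking call */
    MPI_Send(buf, n, MPI_INT, dest, tag, comm);

    /* After (illustrative only): separable initiation and completion */
    MPI_Request req;
    MPI_Isend(buf, n, MPI_INT, dest, tag, comm, &req);
    /* fault checks or other work can be interleaved here */
    MPI_Wait(&req, MPI_STATUS_IGNORE);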

This week, George and I have just started a new side discussion about how
we might try to define common runtime infrastructure.  We plan to expand
that discussion this summer.  But it is different from merging proposals at
the medium or higher levels.  I think this discussion of common underlying
low-level infrastructure could lead to further standardization choices too...

Puri and I are also working to show where atomic suboperations could be
offered instead of ULFM and FA-MPI; these could be common standardized APIs
useful for both: non-blocking, with features like timeouts... but this is
by no means uncontroversial.  We plan to submit a paper or poster on this
to EuroMPI.  Some years ago we had an earlier paper comparing FA-MPI, ULFM
(as then proposed), and G-ULFM, our idea for non-blocking, lower-level
building blocks that "look like ULFM but are a bit generalized."  That did
not get traction (yet).

It is apparent to me that we should standardize the minimal, low-level
primitives upon which ULFM, FA-MPI, Reinit, and others can be layered
without much, if any, loss of performance, fidelity, or functionality.  For
instance, if you need a consensus on a group, OK, you get it, but you start
from a non-blocking operation.  You are also unwilling to wait forever, so
you have primitive features like timeout.  And if you can avoid extra
consensus operations, you fold them into what is, in effect, a single
atomic step.
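
A minimal sketch of such a primitive follows; MPIX_Iagree is an illustrative
name for a non-blocking group agreement, and the timeout handling shown is
just one way an application might bound the wait:

    /* Hypothetical "verb": non-blocking group consensus bounded by a timeout.
       MPIX_Iagree is an illustrative placeholder name, not a standard call.  */
    int flag = local_ok;                 /* this rank's vote                    */
    MPI_Request req;
    MPIX_Iagree(&flag, comm, &req);      /* start the agreement, return at once */

    int done = 0;
    double deadline = MPI_Wtime() + timeout_sec;
    while (!done && MPI_Wtime() < deadline) {
        MPI_Test(&req, &done, MPI_STATUS_IGNORE);  /* poll while making progress */
        /* application work can continue here */
    }
    if (!done) {
        /* timed out: treat as a fault indication rather than waiting forever */
    }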

The "reduced API" vs. the "complex API" approach to fault-aware primitives
is what I am getting at.   But, we want semantic features like timeout that
others argue so far are not needed.
We want to promote that the interaction is an application concern (which
might be woven in as an aspect that is differently coded on different HPC
architectures or even at different scales on the same architecture) that
communicates to/from MPI about errors, and that the MPI transaction blocks
support progress and error propagation.  All of which we have a prototype
for.  But, we don't know how to do blocking things well like MPI_Send() and
MPI_Recv(); MPI_Isend() and MPI_Irecv() are OK. We also have a difference
about fault-free overhead. Our view is that there is fault-free overhead
inherent in such middleware and that we need to trade off the potential for
lost work vs. the cost of fault awareness; others think this may be
negligible.  Further study is warranted.
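
To make that trade-off concrete, here is a back-of-envelope sketch; every
number in it is an assumption for illustration, not a measurement:

    /* Illustrative trade-off between fault-free overhead and expected lost work.
       All values below are assumed for the example, not measured data.          */
    double mtbf      = 4.0 * 3600.0;  /* assumed mean time between failures (s)      */
    double ckpt_ival = 1800.0;        /* assumed checkpoint interval without FA (s)  */
    double tx_len    = 10.0;          /* assumed transactional block length (s)      */
    double fa_ovhd   = 0.02;          /* assumed fractional fault-awareness overhead */

    /* Expected lost work per second of execution: roughly half the recovery
       interval, paid at the failure rate 1/mtbf.                               */
    double loss_no_fa   = (ckpt_ival / 2.0) / mtbf;  /* lose back to last checkpoint */
    double loss_with_fa = (tx_len    / 2.0) / mtbf;  /* lose at most ~one block      */

    /* Fault awareness pays when its steady overhead is below the loss it avoids. */
    int pays_off = fa_ovhd < (loss_no_fa - loss_with_fa);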

This is in no way inviting a flame war on this list.  Our goal is to find,
with the whole group, the set of primitives we can all agree on, that let
each model operate without significant penalty, and to ask the MPI Forum to
approve those.  We would then work to layer ULFM, FA-MPI, and others onto
those "verbs" or "reduced API" through explorations, design, and informal
negotiations that I hope we could do in 2017-2018.  When each group says
"this set of verbs allows me to build my model acceptably well," then we
can just support them, and offer our added-value libraries to support
specific application preferences.  Ours would work with a subset of MPI
(the non-blocking and persistent non-blocking operations most likely to be
emphasized at exascale), ULFM with the entire MPI, etc.  But the
fundamental verbs project has to be done.  The leverage of such an approach
is the openness to allow others to do things differently than either
high-level monolithic API set would do per se, and maybe even to allow more
than one kind of fault tolerance in a big app, or an app that uses
3rd-party libraries.  [They might also do so far into the future.  Perhaps
such APIs belong in the MPI_T section of the standard, because they are not
for "end users" but rather for "fault-tolerant programmers."]

Our separate exploration of Sessions and QoS for communicators is
complementary to the work thus far on FA-MPI, and is another path to
getting the resource visibility needed to recover after a fault.  We have
high hopes for Sessions-based, fault-tolerant, low-level standardization
that enables both FA-MPI and ULFM, but we are not far enough along there
yet.  That is another WG.  Dan Holmes, Puri, and I are exploring Sessions
for FT and soft/firm RT/QoS; it is early days.

I'd be happy to discuss ongoing FA-MPI R&D with you offline; if you have
use cases, application requirements, agile-type stories, or ideas to help
us mature our work, great.

If we are invited to talk at the September WG meeting in Chicago, we will
certainly update everyone then.  I can promise more engagement if there is
space for our effort to work in concert with our colleagues pursuing ULFM.
Unfortunately, neither Puri nor I can attend the June meeting ;-(.

Again, sorry for the slow response.

Respectfully,
Tony Skjellum




On Fri, Apr 28, 2017 at 1:31 AM, Van Der Wijngaart, Rob F <
rob.f.van.der.wijngaart at intel.com> wrote:

> Hello All,
>
> I wanted to make sure I/we are not missing important alternate MPI fault
> tolerance approaches. Keita had earlier pointed me to the Fault Aware MPI
> effort by Tony Skjellum et al. (I am not sure if his team members are on
> this mail alias), and I started reading their 2015 paper "Practical
> Resilient Cases for FA-MPI, A Transactional Fault-Tolerant MPI" again. It's
> an interesting approach, although I am, of course, very concerned about
> regular collective calls to corral faults, and also about the need to
> constrain applications to using non-blocking communications only. Is there
> any information/insight into how seriously this approach is being pursued
> for inclusion in the MPI standard? Thanks.
>
> Rob
>
> -----Original Message-----
> From: mpiwg-ft-bounces at lists.mpi-forum.org [mailto:mpiwg-ft-bounces at lists
> .mpi-forum.org] On Behalf Of Aurelien Bouteiller
> Sent: Wednesday, April 26, 2017 12:26 PM
> To: MPI WG Fault Tolerance and Dynamic Process Control working Group <
> mpiwg-ft at lists.mpi-forum.org>
> Cc: ilaguna at llnl.gov
> Subject: Re: [mpiwg-ft] FTWG Call Today
>
> Ignacio, could you circulate the slides I can’t join the Skype.
>
> Aurelien
>
> > On Apr 26, 2017, at 14:16, Bland, Wesley <wesley.bland at intel.com> wrote:
> >
> > That's OK, we are not only fault tolerant, but also latency tolerant.
> >
> > -----Original Message-----
> > From: mpiwg-ft-bounces at lists.mpi-forum.org [mailto:
> mpiwg-ft-bounces at lists.mpi-forum.org] On Behalf Of Ignacio Laguna
> > Sent: Wednesday, April 26, 2017 11:15 AM
> > To: MPI WG Fault Tolerance and Dynamic Process Control working Group <
> mpiwg-ft at lists.mpi-forum.org>; Bland, Wesley <wesley.bland at intel.com>
> > Subject: Re: [mpiwg-ft] FTWG Call Today
> >
> > Yes, I should be able to do that. I may be around 5 minutes late though.
> >
> > --
> > Ignacio Laguna
> > Center for Applied Scientific Computing (CASC) Lawrence Livermore
> National Laboratory
> > Phone: 925-422-7308, Fax: 925-422-6287
> >
> > On 4/26/17 9:27 AM, Bland, Wesley wrote:
> >> Hi FTWG,
> >>
> >> If I'm not mistaken, we have an FTWG con call today. On the agenda is
> continuing the discussion of the concerns around ULFM. Ignacio, are you
> ready to present the Reinit slides you had a month or two ago while I was
> out?
> >>
> >> Here's the call-in information:
> >>
> >> ........................................................................
> >> Join online meeting
> >> https://meet.intel.com/wesley.bland/WBF3C1SD
> >>
> >> Join by Phone
> >> +1 (916) 356-2663 (or your local bridge access #). Choose bridge 5.
> >>
> >> Conference ID: 494710807
> >> ........................................................................
> >>
> >> Thanks,
> >> Wesley
>



-- 
Anthony Skjellum, PhD
skjellum at gmail.com
Cell: +1-205-807-4968