[mpiwg-ft] [EXTERNAL] Re: FTWG Con Call Today
Van Der Wijngaart, Rob F
rob.f.van.der.wijngaart at intel.com
Wed Dec 21 10:49:58 CST 2016
That is a great idea, Keita, and I will be happy to participate in exploring the relative areas of strength of these respective models. I think the first steps should be:
· Define a (small) collection of workloads that represent major MPI usage models, such as: master-worker, SPMD, a small number of communicators with mostly point-to-point communication, many different communicators with mostly collective operations, strictly iterative (identical iterations) versus semi-iterative (same iterations, but data set sizes change, due to Adaptive Mesh Refinement, for example) versus non-iterative (some graph problems, such as betweenness centrality), multi-physics models, hybrid MPI/threading models, one-sided communication, and applications that use external libraries. Of course, these models are not mutually exclusive. We shouldn’t go crazy here, but we should make sure we have a sufficient basis for evaluation, and then parameterize these workloads.
· Define various fault modes to be covered (a minimal harness sketch touching on both steps follows below).
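As a concrete starting point, here is a minimal sketch of what one parameterized workload plus one fault mode could look like: a strictly iterative SPMD kernel dominated by collectives, with a crude process-failure injection hook. The workload_params struct and inject_fault function are illustrative names made up for this sketch, not a proposed interface.

/* Hypothetical harness sketch (names are illustrative only): a strictly
 * iterative SPMD workload, parameterized by iteration count and data size,
 * with one crude fault mode (abort of a chosen rank at a chosen iteration). */
#include <mpi.h>
#include <stdlib.h>

typedef struct {
    int    iterations;  /* number of identical iterations */
    size_t local_n;     /* per-rank data size (fixed, i.e. strictly iterative) */
    int    fault_iter;  /* iteration at which to inject the fault, or -1 for none */
    int    fault_rank;  /* rank that fails */
} workload_params;

static void inject_fault(int my_rank, const workload_params *p, int iter)
{
    if (iter == p->fault_iter && my_rank == p->fault_rank)
        abort();        /* crude stand-in for an unexpected process failure */
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    workload_params p = { 100, (size_t)1 << 20, 50, 1 };
    double *buf = malloc(p.local_n * sizeof(double));
    for (size_t i = 0; i < p.local_n; i++)
        buf[i] = (double)rank;

    for (int it = 0; it < p.iterations; it++) {
        inject_fault(rank, &p, it);
        double local = buf[0], global;
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}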
Rob
-----Original Message-----
From: mpiwg-ft-bounces at lists.mpi-forum.org [mailto:mpiwg-ft-bounces at lists.mpi-forum.org] On Behalf Of Teranishi, Keita
Sent: Tuesday, December 20, 2016 11:27 PM
To: ilaguna at llnl.gov; MPI WG Fault Tolerance and Dynamic Process Control working Group <mpiwg-ft at lists.mpi-forum.org>; Bland, Wesley <wesley.bland at intel.com>
Subject: Re: [mpiwg-ft] [EXTERNAL] Re: FTWG Con Call Today
Ignacio,
Yes, ReInit and Fenix-1.0 have the same recovery model. They use longjmp for global rollback and fix the MPI communicator at the end of the "Init" call. I would be very happy to perform feasibility studies of these three (plus one) models. I think it would be great if we could explore the feasibility through some empirical (prototyping) studies.
As for the 4th (ReInit/Fenix-1.0) model, we should have a clear definition of MPI communicator recovery, including subcommunicators. In order to utilize the next-generation checkpoint library (ECP's multi-level
checkpointing project) or accommodate application-specific recovery schemes, MPI_Comm should provide some information about its past (failure history or changes in rank, comm_size, etc.) as well as its current state. I am hoping that our experience with Fenix will help in designing a new spec.
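For illustration, a minimal sketch of that global-rollback control flow, assuming a hypothetical ft_init() that repairs the communicator; the setjmp/longjmp target and the dup-based "repair" are invented for this sketch and are not the Fenix or ReInit API.

#include <mpi.h>
#include <setjmp.h>

static jmp_buf  rollback_point;
static MPI_Comm resilient_comm = MPI_COMM_NULL;

/* Hypothetical "Init"-style call: (re)builds the communicator that the
 * application should use after a failure. Here the repair is stubbed out
 * with a dup of MPI_COMM_WORLD; a real library would shrink and respawn. */
static void ft_init(MPI_Comm *comm)
{
    if (*comm != MPI_COMM_NULL)
        MPI_Comm_free(comm);          /* drop the previous communicator */
    MPI_Comm_dup(MPI_COMM_WORLD, comm);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* On a detected process failure, the error path would call
     * longjmp(rollback_point, 1) so every surviving rank resumes here. */
    if (setjmp(rollback_point) != 0) {
        /* rolled back: fall through and repair the communicator again */
    }
    ft_init(&resilient_comm);

    /* ...restore application state from the last checkpoint and resume the
     * main loop; any failure in the loop jumps back to rollback_point... */

    MPI_Comm_free(&resilient_comm);
    MPI_Finalize();
    return 0;
}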
Thanks,
---------------------------------------------------------------------------
Keita Teranishi
Principal Member of Technical Staff
Scalable Modeling and Analysis Systems
Sandia National Laboratories
Livermore, CA 94551
+1 (925) 294-3738
On 12/20/16, 4:55 PM, "Ignacio Laguna" <lagunaperalt1 at llnl.gov> wrote:
>Hi Keita,
>
>I think we all agree that there is no silver bullet solution for the FT
>problem and that each recovery model (whether it's ULFM, Reinit, Fenix,
>or ULFM+autorecovery) works for some codes but doesn't work for others,
>and that one of the solutions to cover all applications is to allow
>multiple recovery models.
>
>In the last telecon we discussed two ways to do that: (a) all models
>are compatible with each other; (b) they are not compatible, so the
>application has to select the model to be used (which implies that
>libraries used by the application have to support that model as well).
>The ideal case is (a), but we are not sure whether it's possible, so we
>are going to discuss each model in detail to explore that possibility.
>I believe case (b) is always a possibility, and in that case you could
>still run Fenix on top of ULFM.
>
>BTW, correct me if I'm wrong, but Reinit and Fenix share (at a high
>level) the same idea of global backward recovery with longjmps to
>re-inject execution; thus perhaps we should call the 4th option
>Reinit/Fenix.
>
>Ignacio
>
>
>On 12/20/16 3:06 PM, Teranishi, Keita wrote:
>> All,
>>
>> Throughout the discussion, I have been a bit worried about making MPI
>> bigger than a message passing interface, because I wish MPI to serve as
>> a good abstraction of a user-friendly transport layer. Fenix is intended
>> to leverage the minimalist approach of MPI-FT (ULFM today) to cover most
>> online recovery models for parallel programs using MPI. The current
>> version is designed to support the SPMD (Communicating Sequential
>> Processes) model, but we wish to support other models, including
>> Master-Worker, Distributed Asynchronous Many Task (AMT), and Message-Logging.
>>
>> · ULFM: We have requested non-blocking communicator recovery as well as
>> non-blocking comm_dup, comm_split, etc. ULFM already provides a good
>> mechanism to serve master-worker-type recovery such as UQ, model
>> reduction, and a certain family of eigenvalue solvers. I wish to have
>> finer control over revocation, because it is possible to keep certain
>> connections among the surviving processes (for master-worker or
>> task-parallel computing), but that might be too difficult.
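>>
>> As a condensed sketch of the basic revoke/shrink repair path in the
>> ULFM prototype (error handling reduced to a single return-code check;
>> a real master-worker code would also reassign the lost worker's tasks):
>>
>> #include <mpi.h>
>> #include <mpi-ext.h>   /* MPIX_Comm_revoke / MPIX_Comm_shrink in the ULFM prototype */
>>
>> static MPI_Comm work_comm;
>>
>> /* Rebuild the working communicator after a reported process failure. */
>> static void repair_after_failure(void)
>> {
>>     MPI_Comm shrunk;
>>     MPIX_Comm_revoke(work_comm);           /* interrupt pending operations on survivors */
>>     MPIX_Comm_shrink(work_comm, &shrunk);  /* new communicator without the dead ranks */
>>     work_comm = shrunk;                    /* (revoked communicator leaked for brevity) */
>> }
>>
>> int main(int argc, char **argv)
>> {
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_dup(MPI_COMM_WORLD, &work_comm);
>>     MPI_Comm_set_errhandler(work_comm, MPI_ERRORS_RETURN);
>>
>>     double partial = 1.0, total;
>>     int rc = MPI_Allreduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, work_comm);
>>     if (rc != MPI_SUCCESS)         /* e.g. MPIX_ERR_PROC_FAILED or MPIX_ERR_REVOKED */
>>         repair_after_failure();    /* survivors continue on the shrunk communicator */
>>
>>     MPI_Comm_free(&work_comm);
>>     MPI_Finalize();
>>     return 0;
>> }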
>>
>> · ULFM + Auto recovery: I need clarification from Wesley (my
>> understanding is most likely wrong... but let me continue based on my
>> assumption). Fenix assumes that failure happens at a single process or
>> a small number of processes. In this model, auto-recovery could serve
>> as uncoordinated recovery, because no comm_shrink call is used to fix
>> the communicator. This could help message replay in an uncoordinated
>> recovery model. For example, recovery is never manifested as a
>> "Failure" to the surviving ranks; instead, particular message passing
>> calls simply become very slow. For the SPMD model, adaptation is quite
>> challenging, as the user needs to write how to recover the lost state
>> of the failed processes. However, I can see a great benefit for
>> implementing a resilient task-parallel programming model.
>>
>> · Communicator with a hole: Master-worker-type applications will
>> benefit from this when collectives to gather the data are made
>> available on such a communicator.
>>
>> · MPI_ReInit: MPI_ReInit is very close to the current Fenix model.
>> We have written the API specification (see attached) to support the
>> same type of online recovery (global rollback upon process failure).
>> The code is implemented using MPI-ULFM, and we have seen some issues
>> with MPI-ULFM that make recovery of multiple communicators convoluted.
>> We used PMPI to hide all the details of error handling, garbage
>> collection, and communicator recovery. The rollback (to Fenix_Init) is
>> performed through longjmp. Nice features of Fenix are (1) the idea of
>> a *resilient communicator*, which allows the user to specify which
>> communicators need to be fixed automatically, and (2) *callback
>> functions* to assist application-specific recovery after communicator
>> recovery. We did not originally intend Fenix to be part of the MPI
>> standard, because we want the role of MPI confined to "Message
>> Passing" and do not want to delay the MPI standardization discussions.
>> My understanding of MPI_ReInit is that it standardizes online rollback
>> recovery and keeps the PMPI/QMPI layer clean through a tight binding
>> with layers invisible to typical MPI users (or tool developers);
>> Ignacio, please correct me if I am wrong. My biggest concern with
>> MPI_ReInit is that defining a rollback model in a message passing
>> library may violate the original design philosophy of MPI (again, this
>> is the reason we did not propose Fenix for the MPI standard). Another
>> concern is that it might be difficult to keep other recovery options
>> open, but this becomes much more flexible with a few switches in the
>> APIs; I think we can figure out the options as we discuss further.
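>>
>> To make those two features concrete, a purely hypothetical sketch of
>> the resilient-communicator plus recovery-callback idea (the fnx_* names
>> and signatures are invented for illustration and are not the actual
>> Fenix API; the communicator "repair" is stubbed out with a dup):
>>
>> #include <mpi.h>
>> #include <stdio.h>
>>
>> typedef void (*fnx_recover_cb)(MPI_Comm repaired, void *user_data);
>>
>> static MPI_Comm       resilient_comm = MPI_COMM_NULL;
>> static fnx_recover_cb recover_cb     = NULL;
>> static void          *recover_arg    = NULL;
>>
>> /* Mark a communicator as resilient and attach a recovery callback. */
>> static void fnx_register(MPI_Comm comm, fnx_recover_cb cb, void *arg)
>> {
>>     resilient_comm = comm;
>>     recover_cb     = cb;
>>     recover_arg    = arg;
>> }
>>
>> /* What the library would do after a failure: repair the registered
>>  * communicator (stubbed here as a dup; real code would shrink/respawn),
>>  * then invoke the application callback on the repaired communicator. */
>> static void fnx_recover(void)
>> {
>>     MPI_Comm repaired;
>>     MPI_Comm_dup(resilient_comm, &repaired);
>>     resilient_comm = repaired;
>>     if (recover_cb)
>>         recover_cb(resilient_comm, recover_arg);
>> }
>>
>> /* Application-specific recovery: reload the state lost with failed ranks. */
>> static void restore_state(MPI_Comm repaired, void *arg)
>> {
>>     int size;
>>     MPI_Comm_size(repaired, &size);
>>     printf("restoring application state across %d ranks\n", size);
>>     (void)arg;
>> }
>>
>> int main(int argc, char **argv)
>> {
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm app;
>>     MPI_Comm_dup(MPI_COMM_WORLD, &app);
>>     fnx_register(app, restore_state, NULL);
>>     fnx_recover();                 /* simulate the post-failure recovery path */
>>     MPI_Comm_free(&resilient_comm);
>>     MPI_Comm_free(&app);
>>     MPI_Finalize();
>>     return 0;
>> }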
>>
>> Thanks,
>>
>> Keita
>>
>> *From: *"Bland, Wesley" <wesley.bland at intel.com<mailto:wesley.bland at intel.com>>
>> *Date: *Tuesday, December 20, 2016 at 1:48 PM
>> *To: *MPI WG Fault Tolerance and Dynamic Process Control working
>> Group <mpiwg-ft at lists.mpi-forum.org<mailto:mpiwg-ft at lists.mpi-forum.org>>, "Teranishi, Keita"
>> <knteran at sandia.gov<mailto:knteran at sandia.gov>>
>> *Subject: *Re: [mpiwg-ft] [EXTERNAL] Re: FTWG Con Call Today
>>
>> Probably here since we don't have an issue for this discussion. If
>> you want to open issues in our working group's repository
>> (github.com/mpiwg-ft/ft-issues), that's probably fine.
>>
>> On December 20, 2016 at 3:47:25 PM, Teranishi, Keita
>> (knteran at sandia.gov) wrote:
>>
>> Wesley,
>>
>> Should I do it here or in GitHub issues?
>>
>> Thanks,
>>
>> Keita
>>
>> *From: *"Bland, Wesley" <wesley.bland at intel.com<mailto:wesley.bland at intel.com>>
>> *Date: *Tuesday, December 20, 2016 at 1:43 PM
>> *To: *MPI WG Fault Tolerance and Dynamic Process Control working
>> Group <mpiwg-ft at lists.mpi-forum.org<mailto:mpiwg-ft at lists.mpi-forum.org>>, "Teranishi, Keita"
>> <knteran at sandia.gov<mailto:knteran at sandia.gov>>
>> *Subject: *Re: [mpiwg-ft] [EXTERNAL] Re: FTWG Con Call Today
>>
>> You don't have to wait. :) If you have comments/concerns, you can
>> raise them here too.
>>
>> On December 20, 2016 at 3:38:47 PM, Teranishi, Keita
>> (knteran at sandia.gov) wrote:
>>
>> All,
>>
>> Sorry, I could not make it today. I will definitely join the
>> meeting next time to make comments/suggestions on the three
>> items (ULFM, ULFM+Auto, and ReInit) from Fenix perspective.
>>
>> Thanks,
>>
>> Keita
>>
>> *From: *<mpiwg-ft-bounces at lists.mpi-forum.org> on behalf of
>> "Bland, Wesley" <wesley.bland at intel.com>
>> *Reply-To: *MPI WG Fault Tolerance and Dynamic Process Control
>> working Group <mpiwg-ft at lists.mpi-forum.org>
>> *Date: *Tuesday, December 20, 2016 at 1:29 PM
>> *To: *FTWG <mpiwg-ft at lists.mpi-forum.org>
>> *Subject: *[EXTERNAL] Re: [mpiwg-ft] FTWG Con Call Today
>>
>> The notes from today's call are posted on the wiki:
>>
>> https://github.com/mpiwg-ft/ft-issues/wiki/2016-12-20
>>
>> Those who have specific items, please make progress on those
>> between now and our next meeting. We will be cancelling the Jan
>> 3 call due to the holiday. The next call will be on Jan 17.
>>
>> Thanks,
>>
>> Wesley
>>
>> On December 20, 2016 at 8:15:06 AM, Bland, Wesley
>> (wesley.bland at intel.com) wrote:
>>
>> The Fault Tolerance Working Group's biweekly con call is
>> today at 3:00 PM Eastern. Today's agenda:
>>
>> * Recap of face to face meeting
>>
>> * Go over existing tickets
>>
>> * Discuss concerns with ULFM and path forward
>>
>> Thanks,
>>
>> Wesley
>>
>>
>> Join online meeting: https://meet.intel.com/wesley.bland/GHHKQ79Y
>>
>> Join by phone: +1 (916) 356-2663 (or your local bridge access #), bridge 5
>>
>> Find a local number: https://dial.intel.com
>>
>> Conference ID: 757343533
>>
_______________________________________________
mpiwg-ft mailing list
mpiwg-ft at lists.mpi-forum.org
https://lists.mpi-forum.org/mailman/listinfo/mpiwg-ft