[mpiwg-ft] [EXTERNAL] Re: FTWG Con Call Today

Van Der Wijngaart, Rob F rob.f.van.der.wijngaart at intel.com
Wed Dec 21 10:49:58 CST 2016


That is a great idea, Keita, and I will be happy to participate in exploring the relative strengths of these respective models. I think the first steps should be:

·        Define a (small) collection of workloads that represent major MPI usage models, such as: master-worker; SPMD; a small number of communicators with mostly point-to-point communication; many different communicators with mostly collective operations; strictly iterative (identical iterations) versus semi-iterative (the same iterations, but data set sizes change due to Adaptive Mesh Refinement, for example) versus non-iterative (some graph problems, such as betweenness centrality); multi-physics models; hybrid MPI/threading models; one-sided communication; and applications that use external libraries. Of course, these models are not mutually exclusive. We shouldn’t go crazy here, but we should make sure we have a sufficient basis for evaluation, and then parameterize these workloads (a sketch of one such parameterized workload appears after this list).

·        Define various fault modes to be covered.
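
To make the parameterization concrete, below is a rough sketch (purely illustrative; the file name, tags, and parameters are invented, not a proposal) of one such workload: a master-worker kernel whose task count and per-task work are command-line parameters. The same skeleton could later be stressed under whatever fault modes we define.

/* master_worker.c: illustrative parameterized master-worker workload.
 * argv[1] = number of tasks, argv[2] = per-task work (loop length).
 * Build: mpicc -o mw master_worker.c   Run: mpirun -np 4 ./mw 1000 100000
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

enum { TAG_REQUEST = 1, TAG_WORK = 2, TAG_STOP = 3 };

static double do_task(long task_id, long work)
{
    double x = 0.0;                       /* dummy compute kernel, sized by 'work' */
    for (long i = 0; i < work; i++)
        x += (double)((task_id + i) % 7);
    return x;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    long ntasks = (argc > 1) ? atol(argv[1]) : 1000;    /* workload parameter */
    long work   = (argc > 2) ? atol(argv[2]) : 100000;  /* workload parameter */

    if (rank == 0) {                      /* master: hand out tasks, then stops */
        long next = 0;
        int stops = 0;
        double result, total = 0.0;
        MPI_Status st;
        while (stops < size - 1) {
            MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE,
                     TAG_REQUEST, MPI_COMM_WORLD, &st);
            total += result;
            if (next < ntasks) {
                MPI_Send(&next, 1, MPI_LONG, st.MPI_SOURCE,
                         TAG_WORK, MPI_COMM_WORLD);
                next++;
            } else {
                long dummy = -1;
                MPI_Send(&dummy, 1, MPI_LONG, st.MPI_SOURCE,
                         TAG_STOP, MPI_COMM_WORLD);
                stops++;
            }
        }
        printf("checksum = %f\n", total);
    } else {                              /* worker: request, compute, repeat */
        double result = 0.0;
        long task_id;
        MPI_Status st;
        for (;;) {
            MPI_Send(&result, 1, MPI_DOUBLE, 0, TAG_REQUEST, MPI_COMM_WORLD);
            MPI_Recv(&task_id, 1, MPI_LONG, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP) break;
            result = do_task(task_id, work);
        }
    }

    MPI_Finalize();
    return 0;
}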



Rob



-----Original Message-----
From: mpiwg-ft-bounces at lists.mpi-forum.org [mailto:mpiwg-ft-bounces at lists.mpi-forum.org] On Behalf Of Teranishi, Keita
Sent: Tuesday, December 20, 2016 11:27 PM
To: ilaguna at llnl.gov; MPI WG Fault Tolerance and Dynamic Process Control working Group <mpiwg-ft at lists.mpi-forum.org>; Bland, Wesley <wesley.bland at intel.com>
Subject: Re: [mpiwg-ft] [EXTERNAL] Re: FTWG Con Call Today



Ignacio,



Yes, ReInit and Fenix-1.0 have the same recovery model. They use longjmp for global rollback and fix the MPI communicator at the end of the "Init" call.  I am very happy to perform the feasibility studies of these three (plus one) models.  I think it would be great if we could explore the feasibility through some empirical (prototyping) studies.
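
For reference, here is a minimal, hypothetical sketch of that rollback pattern (it is not the Fenix or ReInit API; all names are invented): an MPI error handler longjmps back to a restart point taken right after the communicator is set up, where the communicator is repaired and a checkpoint would be reloaded.

/* Hypothetical sketch (not the Fenix or ReInit API): longjmp-based
 * global rollback.  An MPI error handler longjmps back to a restart
 * point; communicator repair and checkpoint reload happen there. */
#include <mpi.h>
#include <setjmp.h>
#include <stdio.h>

static jmp_buf restart_point;
static MPI_Comm world = MPI_COMM_NULL;   /* the "resilient" communicator */

/* Error handler: instead of aborting, jump back to the restart point. */
static void rollback_on_error(MPI_Comm *comm, int *errcode, ...)
{
    (void)comm; (void)errcode;
    longjmp(restart_point, 1);
}

/* Placeholder for communicator repair.  With ULFM this is roughly where
 * MPIX_Comm_revoke/MPIX_Comm_shrink (or respawn plus merge) would rebuild
 * 'world'; here we simply reuse a duplicate of MPI_COMM_WORLD. */
static void repair_world(void)
{
    if (world != MPI_COMM_NULL) MPI_Comm_free(&world);
    MPI_Comm_dup(MPI_COMM_WORLD, &world);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    volatile int recovering = 0;         /* survives the longjmp */
    MPI_Errhandler errh;
    MPI_Comm_create_errhandler(rollback_on_error, &errh);

    if (setjmp(restart_point) != 0)      /* "end of Init": the restart point */
        recovering = 1;

    repair_world();
    MPI_Comm_set_errhandler(world, errh);

    if (recovering) {
        /* reload application state from the last checkpoint here */
        fprintf(stderr, "rolled back to restart point\n");
    }

    /* ... iterative solve on 'world'; a process failure raises an error,
     * the handler longjmps, and execution resumes just after setjmp ... */

    MPI_Comm_free(&world);
    MPI_Finalize();
    return 0;
}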



As for the 4th (ReInit/Fenix-1.0) model, we should have a clear definition of MPI communicator recovery, including subcommunicators.  In order to utilize the next-generation checkpoint library (ECP's multi-level checkpointing project) or accommodate application-specific recovery schemes, MPI_Comm should provide some information about its past (history of failures, changes in rank, comm_size, etc.) as well as its current state.  I am hoping that our experience with Fenix will help us design a new spec.
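
As a strawman only (every name and field below is invented and not part of any spec), the kind of history query I have in mind could be as simple as recovery metadata cached on the communicator through the standard attribute mechanism, so a checkpoint library can ask what happened before deciding how much state to restore.

/* Strawman only (names invented; not a Fenix or MPI proposal): cache a
 * communicator's recovery history on the communicator itself using the
 * standard attribute mechanism, so libraries can query it later. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int epoch;          /* number of times this communicator was repaired */
    int prev_size;      /* size before the most recent repair             */
    int ranks_shifted;  /* nonzero if surviving ranks were renumbered     */
} CommHistory;

static int history_key = MPI_KEYVAL_INVALID;

/* A recovery layer (something Fenix-like) would call this after each
 * communicator repair.  A real library would free any previous record. */
static void record_history(MPI_Comm comm, int epoch, int prev_size, int shifted)
{
    if (history_key == MPI_KEYVAL_INVALID)
        MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, MPI_COMM_NULL_DELETE_FN,
                               &history_key, NULL);
    CommHistory *h = malloc(sizeof *h);
    h->epoch = epoch; h->prev_size = prev_size; h->ranks_shifted = shifted;
    MPI_Comm_set_attr(comm, history_key, h);
}

/* A checkpoint library or application-specific recovery scheme queries
 * the history before deciding what to restore. */
static const CommHistory *query_history(MPI_Comm comm)
{
    void *val; int found = 0;
    if (history_key != MPI_KEYVAL_INVALID)
        MPI_Comm_get_attr(comm, history_key, &val, &found);
    return found ? (const CommHistory *)val : NULL;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int size;
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    record_history(MPI_COMM_WORLD, 1, size + 1, 0);  /* pretend one repair happened */

    const CommHistory *h = query_history(MPI_COMM_WORLD);
    if (h) printf("epoch %d, previous size %d\n", h->epoch, h->prev_size);

    MPI_Finalize();
    return 0;
}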



Thanks,

---------------------------------------------------------------------------

--

Keita Teranishi

Principal Member of Technical Staff

Scalable Modeling and Analysis Systems

Sandia National Laboratories

Livermore, CA 94551

+1 (925) 294-3738











On 12/20/16, 4:55 PM, "Ignacio Laguna" <lagunaperalt1 at llnl.gov<mailto:lagunaperalt1 at llnl.gov>> wrote:



>Hi Keita,

>

>I think we all agree that there is no silver bullet solution for the FT

>problem and that each recovery model (whether it's ULFM, Reinit, Fenix,

>or ULFM+autorecovery) works for some codes but doesn't work for others,

>and that one of the solutions to cover all applications is to allow

>multiple recovery models.

>

>In the last telecon we discussed two ways to do that: (a) all models

>are compatible with each other; (b) they are not compatible, thus the

>application has to select the model to be used (which implies libraries

>used by the application have to support that model as well). The ideal

>case is (a), but we are not sure if it's possible, thus we are going to

>discuss each model in detail to explore that possibility. I believe

>case

>(b) is always a possibility, in which case you can still run Fenix on

>top of ULFM in that situation.

>

>BTW, correct me if I'm wrong, but Reinit and Fenix share (at a

>high-level) the same idea of global backward recovery with longjumps to

>reinject execution; thus we should perhaps call the 4th option

>Reinit/Fenix.

>

>Ignacio

>

>

>On 12/20/16 3:06 PM, Teranishi, Keita wrote:

>> All,

>>

>> Throughout the discussion, I am a bit worried about making MPI bigger

>> than a message passing interface, because I wish MPI to serve as a good

>> abstraction of a user-friendly transport layer.  Fenix is intended to

>> leverage the minimalist approach of MPI-FT (ULFM today) to cover most

>> of online recovery models for parallel programs using MPI.  The

>> current version is designed to support SPMD (Communicating Sequential

>> Process) model, but we wish to support other models including

>> Master-Worker, Distributed Asynchronous Many Task (AMT) and Message-Logging.

>>

>> ·ULFM: We have requested non-blocking communicator recovery as well as

>> non-blocking comm_dup and comm_split, etc.   ULFM already provides a good

>> mechanism to serve master-worker-type recovery such as UQ, model

>> reduction, and a certain family of eigenvalue solvers.  I wish to have

>> finer control over revocation because it is possible to keep

>> certain connections among surviving processes (for master-worker or

>> task-parallel computing), but it might be too difficult.

>>

>> ·ULFM + Auto recovery: I need clarification from Wesley (as my

>> knowledge is most likely wrong… but let me continue based on my assumptions).

>> Fenix assumes that failure happens at a single process or a small number of

>> processes.  In this model, auto-recovery could serve as

>> uncoordinated recovery because no comm_shrink call is used to fix the communicator.

>> This could help message replay in an uncoordinated recovery model.  For

>> example, recovery is never manifested as a "Failure" to the surviving

>> ranks, only making particular message passing calls very slow.   For the SPMD

>> model, adaptation is quite challenging, as the user needs to write how to

>> recover the lost state of failed processes.  However, I can see a

>> great benefit for implementing a resilient task-parallel programming model.

>>

>> ·Communicator with hole: Master-worker-type applications will benefit

>> from this when using collectives to gather the available data.

>>

>> ·MPI_ReInit:  MPI_ReInit is very close to the current Fenix model.

>> We have written the API specification (see attached) to support the

>> same type of online recovery (global rollback upon process failure).

>> The code is implemented using MPI-ULFM, and we have seen some issues

>> with MPI-ULFM that make multiple communicator recovery convoluted.

>> We used PMPI to hide all the details of error handling, garbage

>> collection and communicator recovery. The rollback (to Fenix_Init) is

>> performed through longjmp.  Nice features of Fenix are (1) the idea of a

>> *resilient communicator* that allows the users to specify which communicator

>> needs to be automatically fixed, and (2) *callback functions* to

>> assist application-specific recovery followed by communicator

>> recovery.  We did not originally intend Fenix to be part of the MPI

>> standard because we want the role of MPI confined to "Message Passing" and do not want to

>> delay the MPI standardization discussions.  My understanding of

>> MPI_ReInit is that it standardizes online rollback recovery and keeps the

>> PMPI/QMPI layer clean through a tight binding, with the layers

>> invisible to typical MPI users (or tool developers) --- Ignacio,

>> please correct me if I am wrong.  My biggest concern about MPI_ReInit is

>> that defining a rollback model in a message passing library may violate

>> the original design philosophy of MPI (again, this is the reason why

>> we did not propose Fenix as an MPI standard).  Another concern is that

>> it might be difficult to keep other recovery options open, but I think

>> that is easy to fix with a few knobs or switches in the APIs.  I think we can

>> figure out the options as we discuss further.

>>

>> Thanks,

>>

>> Keita

>>

>> *From: *"Bland, Wesley" <wesley.bland at intel.com<mailto:wesley.bland at intel.com>>

>> *Date: *Tuesday, December 20, 2016 at 1:48 PM

>> *To: *MPI WG Fault Tolerance and Dynamic Process Control working

>> Group <mpiwg-ft at lists.mpi-forum.org<mailto:mpiwg-ft at lists.mpi-forum.org>>, "Teranishi, Keita"

>> <knteran at sandia.gov<mailto:knteran at sandia.gov>>

>> *Subject: *Re: [mpiwg-ft] [EXTERNAL] Re: FTWG Con Call Today

>>

>> Probably here since we don't have an issue for this discussion. If

>> you want to open issues in our working group's repository

>> (github.com/mpiwg-ft/ft-issues), that's probably fine.

>>

>> On December 20, 2016 at 3:47:25 PM, Teranishi, Keita

>> (knteran at sandia.gov<mailto:knteran at sandia.gov>

>> <mailto:knteran at sandia.gov>) wrote:

>>

>>     Wesley,

>>

>>     Should I do here or github issues?

>>

>>     Thanks,

>>

>>     Keita

>>

>>     *From: *"Bland, Wesley" <wesley.bland at intel.com<mailto:wesley.bland at intel.com>>

>>     *Date: *Tuesday, December 20, 2016 at 1:43 PM

>>     *To: *MPI WG Fault Tolerance and Dynamic Process Control working

>>     Group <mpiwg-ft at lists.mpi-forum.org<mailto:mpiwg-ft at lists.mpi-forum.org>>, "Teranishi, Keita"

>>     <knteran at sandia.gov<mailto:knteran at sandia.gov>>

>>     *Subject: *Re: [mpiwg-ft] [EXTERNAL] Re: FTWG Con Call Today

>>

>>     You don't have to wait. :) If you have comments/concerns, you can

>>     raise them here too.

>>

>>     On December 20, 2016 at 3:38:47 PM, Teranishi, Keita

>>     (knteran at sandia.gov<mailto:knteran at sandia.gov> <mailto:knteran at sandia.gov>) wrote:

>>

>>         All,

>>

>>         Sorry, I could not make it today.  I will definitely join the

>>         meeting next time to make comments/suggestions on the three

>>         items (ULFM, ULFM+Auto, and ReInit) from Fenix perspective.

>>

>>         Thanks,

>>

>>         Keita

>>

>>         *From: *<mpiwg-ft-bounces at lists.mpi-forum.org<mailto:mpiwg-ft-bounces at lists.mpi-forum.org>> on behalf of

>>         "Bland, Wesley" <wesley.bland at intel.com<mailto:wesley.bland at intel.com>>

>>         *Reply-To: *MPI WG Fault Tolerance and Dynamic Process Control

>>         working Group <mpiwg-ft at lists.mpi-forum.org<mailto:mpiwg-ft at lists.mpi-forum.org>>

>>         *Date: *Tuesday, December 20, 2016 at 1:29 PM

>>         *To: *FTWG <mpiwg-ft at lists.mpi-forum.org<mailto:mpiwg-ft at lists.mpi-forum.org>>

>>         *Subject: *[EXTERNAL] Re: [mpiwg-ft] FTWG Con Call Today

>>

>>         The notes from today's call are posted on the wiki:

>>

>>         https://github.com/mpiwg-ft/ft-issues/wiki/2016-12-20

>>

>>         Those who have specific items, please make progress on those

>>         between now and our next meeting. We will be cancelling the Jan

>>         3 call due to the holiday. The next call will be on Jan 17.

>>

>>         Thanks,

>>

>>         Wesley

>>

>>         On December 20, 2016 at 8:15:06 AM, Bland, Wesley

>>         (wesley.bland at intel.com<mailto:wesley.bland at intel.com> <mailto:wesley.bland at intel.com>) wrote:

>>

>>             The Fault Tolerance Working Group's biweekly con call is

>>             today at 3:00 PM Eastern. Today's agenda:

>>

>>             * Recap of face to face meeting

>>

>>             * Go over existing tickets

>>

>>             * Discuss concerns with ULFM and path forward

>>

>>             Thanks,

>>

>>             Wesley

>>

>>

>>.........................................................................

>>

>>             Join online meeting

>>             <https://meet.intel.com/wesley.bland/GHHKQ79Y>

>>

>>             https://meet.intel.com/wesley.bland/GHHKQ79Y

>>

>>             Join by Phone

>>

>>             +1(916)356-2663 (or your local bridge access #) Choose bridge 5.

>>

>>             Find a local number <https://dial.intel.com>

>>

>>             Conference ID: 757343533

>>

>>             Forgot your dial-in PIN? <https://dial.intel.com> | First

>>             online meeting?

>>

>><http://r.office.microsoft.com/r/rlidOC10?clid=1033&p1=4&p2=1041&pc=oc&ver=4&subver=0&bld=7185&bldver=0>

>>

>>

>>.........................................................................

>>

>>         _______________________________________________

>>         mpiwg-ft mailing list

>>         mpiwg-ft at lists.mpi-forum.org<mailto:mpiwg-ft at lists.mpi-forum.org>

>>         https://lists.mpi-forum.org/mailman/listinfo/mpiwg-ft

>>

>>

>>

>> _______________________________________________

>> mpiwg-ft mailing list

>> mpiwg-ft at lists.mpi-forum.org<mailto:mpiwg-ft at lists.mpi-forum.org>

>> https://lists.mpi-forum.org/mailman/listinfo/mpiwg-ft

>>



_______________________________________________

mpiwg-ft mailing list

mpiwg-ft at lists.mpi-forum.org<mailto:mpiwg-ft at lists.mpi-forum.org>

https://lists.mpi-forum.org/mailman/listinfo/mpiwg-ft