[mpiwg-ft] [EXTERNAL] Re: FTWG Con Call Today

Teranishi, Keita knteran at sandia.gov
Tue Dec 20 23:00:29 CST 2016


Jeff,

I admit I have been mistaken about the definition of runtime, but MPI has been serving as a thin (but robust) runtime substrate for various programming models.
Even for CSP/SPMD models, I think several successful frameworks such as PETSc and Trilinos owe much of their success to MPI's design.

Going back to the original discussion, I hope resilience for CSP/SPMD can be achieved in the same philosophy and will introduce only minimal additions to MPI.

Thanks,
-----------------------------------------------------------------------------
Keita Teranishi
Principal Member of Technical Staff
Scalable Modeling and Analysis Systems
Sandia National Laboratories
Livermore, CA 94551
+1 (925) 294-3738


From: <mpiwg-ft-bounces at lists.mpi-forum.org> on behalf of Jeff Hammond <jeff.science at gmail.com>
Reply-To: MPI WG Fault Tolerance and Dynamic Process Control working Group <mpiwg-ft at lists.mpi-forum.org>
Date: Tuesday, December 20, 2016 at 3:31 PM
To: MPI WG Fault Tolerance and Dynamic Process Control working Group <mpiwg-ft at lists.mpi-forum.org>
Subject: Re: [mpiwg-ft] [EXTERNAL] Re: FTWG Con Call Today

MPI has not been a message passing library since MPI-1.  It's a runtime system for HPC that provides interprocess communication of many kinds (not just message passing - see RMA) as well as parallel file I/O.

Jeff

On Tue, Dec 20, 2016 at 3:06 PM, Teranishi, Keita <knteran at sandia.gov> wrote:
All,

Throughout the discussion, I have been a bit worried about making MPI bigger than a message passing interface, because I would like MPI to remain a good abstraction of a user-friendly transport layer.  Fenix is intended to leverage the minimalist approach of MPI-FT (ULFM today) to cover most online recovery models for parallel programs using MPI.  The current version is designed to support the SPMD (Communicating Sequential Processes) model, but we wish to support other models including Master-Worker, Distributed Asynchronous Many-Task (AMT), and Message Logging.


·         ULFM: We have requested non-blocking communicator recovery as well as non-blocking comm_dup, comm_split, etc.   ULFM already provides a good mechanism for master-worker-type recovery such as UQ, model reduction, and a certain family of eigenvalue solvers; a rough sketch of the revoke/shrink recovery path appears after this list.  I would like finer-grained control over revocation, because it should be possible to keep certain connections among the surviving processes (for master-worker or task-parallel computing), but that might be too difficult.



·         ULFM + Auto recovery: I need clarification from Wesley (my understanding is most likely wrong… but let me continue based on my assumption).  Fenix assumes that failure happens at a single process or a small number of processes.  In this model, auto-recovery could serve as uncoordinated recovery, because no comm_shrink call is needed to fix the communicator.  This could help message replay in an uncoordinated recovery model: recovery would never manifest as a “failure” to the surviving ranks, only as particular message passing calls becoming very slow.  For the SPMD model, adapting to this is quite challenging, as the user needs to write the code that recovers the lost state of the failed processes.  However, I can see a great benefit for implementing a resilient task-parallel programming model.



·         Communicator with holes: Master-worker-type applications will benefit from this when using collectives to gather whatever data is still available.



·         MPI_ReInit:  MPI_ReInit is very close to the current Fenix model.  We have written an API specification (see attached) to support the same type of online recovery (global rollback upon process failure).  The code is implemented using MPI-ULFM, and we have seen some issues with MPI-ULFM that make recovering multiple communicators convoluted.  We used PMPI to hide all the details of error handling, garbage collection, and communicator recovery.  The rollback (to Fenix_Init) is performed through longjmp; a purely illustrative sketch of this rollback appears after this list.  Nice features of Fenix are (1) the idea of a resilient communicator, which lets users specify which communicators should be fixed automatically, and (2) callback functions that assist application-specific recovery after communicator recovery.  We did not originally intend Fenix to be part of the MPI standard, because we want the role of MPI confined to “Message Passing” and do not want to delay the MPI standardization discussions.  My understanding of MPI_ReInit is that it standardizes online rollback recovery and keeps the PMPI/QMPI layer clean through a tight binding with layers that are invisible to typical MPI users (or tool developers) --- Ignacio, please correct me if I am wrong.  My biggest concern with MPI_ReInit is that having a message passing library define a rollback model may violate the original design philosophy of MPI (again, this is the reason we did not propose Fenix for the MPI standard).  Another concern is that it might be difficult to keep other recovery options open, but that becomes much more flexible with a few knobs or switches in the APIs; I think we can figure out the options as we discuss further.
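For concreteness, here is a minimal sketch of the revoke/shrink recovery path referenced in the ULFM bullet above.  It assumes the MPIX_ extensions of the ULFM prototype (header and exact names can differ between prototypes), abbreviates error handling, and the repair_world helper is illustrative only, not part of any proposal:

/* Minimal sketch, assuming the ULFM prototype's MPIX_ extensions. */
#include <mpi.h>
#include <mpi-ext.h>   /* MPIX_Comm_revoke, MPIX_Comm_shrink, ... */

static MPI_Comm world;   /* in practice a dup of MPI_COMM_WORLD */

static void repair_world(void)
{
    MPI_Comm shrunk;

    /* Make sure every survivor abandons the broken communicator. */
    MPIX_Comm_revoke(world);

    /* Blocking today; the non-blocking communicator recovery we
     * requested (ishrink/idup/isplit) would go here instead. */
    MPIX_Comm_shrink(world, &shrunk);

    MPI_Comm_free(&world);
    world = shrunk;
}

/* Typical use: when a call on world fails with MPIX_ERR_PROC_FAILED
 * or MPIX_ERR_REVOKED, every survivor calls repair_world() and then
 * re-enters the computation at an application-defined consistent point. */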
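Likewise, a rough sketch of the longjmp-based rollback to the init point mentioned under MPI_ReInit.  This is not the attached Fenix API: the error handler, rollback_point, and resilient_comm are hypothetical names, and in Fenix the communicator repair and the user callbacks would run inside the library before control returns to the application:

/* Illustrative sketch only; not the Fenix or MPI_ReInit API. */
#include <mpi.h>
#include <setjmp.h>

static jmp_buf  rollback_point;   /* the "init point" survivors return to */
static MPI_Comm resilient_comm;   /* the communicator to be auto-fixed */

static void on_failure(MPI_Comm *comm, int *err, ...)
{
    (void)comm; (void)err;
    /* A real implementation repairs resilient_comm here (e.g. via
     * ULFM shrink) and invokes user-registered recovery callbacks. */
    longjmp(rollback_point, 1);   /* global rollback to the init point */
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    MPI_Comm_dup(MPI_COMM_WORLD, &resilient_comm);

    MPI_Errhandler eh;
    MPI_Comm_create_errhandler(on_failure, &eh);
    MPI_Comm_set_errhandler(resilient_comm, eh);

    if (setjmp(rollback_point) != 0) {
        /* Survivors land here after a failure: reload the last
         * checkpoint before resuming the computation. */
    }

    /* ... application phases using resilient_comm ... */

    MPI_Finalize();
    return 0;
}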

Thanks,
Keita


From: "Bland, Wesley" <wesley.bland at intel.com>
Date: Tuesday, December 20, 2016 at 1:48 PM

To: MPI WG Fault Tolerance and Dynamic Process Control working Group <mpiwg-ft at lists.mpi-forum.org>, "Teranishi, Keita" <knteran at sandia.gov>
Subject: Re: [mpiwg-ft] [EXTERNAL] Re: FTWG Con Call Today


Probably here since we don't have an issue for this discussion. If you want to open issues in our working group's repository (github.com/mpiwg-ft/ft-issues), that's probably fine.


On December 20, 2016 at 3:47:25 PM, Teranishi, Keita (knteran at sandia.gov) wrote:
Wesley,

Should I do it here or in GitHub issues?

Thanks,
Keita


From: "Bland, Wesley" <wesley.bland at intel.com>
Date: Tuesday, December 20, 2016 at 1:43 PM
To: MPI WG Fault Tolerance and Dynamic Process Control working Group <mpiwg-ft at lists.mpi-forum.org>, "Teranishi, Keita" <knteran at sandia.gov>
Subject: Re: [mpiwg-ft] [EXTERNAL] Re: FTWG Con Call Today


You don't have to wait. :) If you have comments/concerns, you can raise them here too.


On December 20, 2016 at 3:38:47 PM, Teranishi, Keita (knteran at sandia.gov) wrote:
All,

Sorry, I could not make it today.  I will definitely join the meeting next time to make comments/suggestions on the three items (ULFM, ULFM+Auto, and ReInit) from the Fenix perspective.

Thanks,
Keita

From: <mpiwg-ft-bounces at lists.mpi-forum.org> on behalf of "Bland, Wesley" <wesley.bland at intel.com>
Reply-To: MPI WG Fault Tolerance and Dynamic Process Control working Group <mpiwg-ft at lists.mpi-forum.org>
Date: Tuesday, December 20, 2016 at 1:29 PM
To: FTWG <mpiwg-ft at lists.mpi-forum.org>
Subject: [EXTERNAL] Re: [mpiwg-ft] FTWG Con Call Today

The notes from today's call are posted on the wiki:

https://github.com/mpiwg-ft/ft-issues/wiki/2016-12-20

If you have specific items, please make progress on them between now and our next meeting. We will be cancelling the Jan 3 call due to the holiday. The next call will be on Jan 17.

Thanks,
Wesley



On December 20, 2016 at 8:15:06 AM, Bland, Wesley (wesley.bland at intel.com) wrote:

The Fault Tolerance Working Group’s biweekly con call is today at 3:00 PM Eastern. Today's agenda:

* Recap of face to face meeting
* Go over existing tickets
* Discuss concerns with ULFM and path forward

Thanks,
Wesley

.........................................................................................................................................
Join online meeting <https://meet.intel.com/wesley.bland/GHHKQ79Y>
https://meet.intel.com/wesley.bland/GHHKQ79Y

Join by Phone
+1(916)356-2663 (or your local bridge access #) Choose bridge 5.
Find a local number <https://dial.intel.com>

Conference ID: 757343533

Forgot your dial-in PIN? <https://dial.intel.com> | First online meeting? <http://r.office.microsoft.com/r/rlidOC10?clid=1033&p1=4&p2=1041&pc=oc&ver=4&subver=0&bld=7185&bldver=0>
.........................................................................................................................................
_______________________________________________
mpiwg-ft mailing list
mpiwg-ft at lists.mpi-forum.org
https://lists.mpi-forum.org/mailman/listinfo/mpiwg-ft




--
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/