[Mpi3-ft] Communicator Virtualization as a step forward

Nathan DeBardeleben ndebard at lanl.gov
Thu Feb 12 13:46:29 CST 2009


Heh OK. :)  Oh well, maybe my statements can then provide some 
motivation for users who are interested in FT, at the very least :).

Sorry to pollute the conversation.

-- Nathan

---------------------------------------------------------------------
Nathan DeBardeleben, Ph.D.
Los Alamos National Laboratory
High Performance Computing Systems Integration (HPC-5)
phone: 505-667-3428
email: ndebard at lanl.gov
--------------------------------------------------------------------- 



Greg Bronevetsky wrote:
> I don't think that the users are suggesting that they don't want FT 
> support. It sounds like they just don't value having the ability to 
> reset the state of the MPI library without having to restart the 
> applications. Since job schedulers can start up a large-scale 
> application fairly quickly and they already use global checkpointing, 
> I'm not surprised that they don't really care about this. In any case, 
> our plans for the FT spec will allow for more capability than the 
> FT-MPI spec. FT-MPI is tilted towards global synchronous recovery 
> solutions, which will have scalability problems since every process 
> must participate in recovery. Our goal with the FT specification is to 
> allow localized recovery as well.
>
> Greg Bronevetsky
> Post-Doctoral Researcher
> 1028 Building 451
> Lawrence Livermore National Lab
> (925) 424-5756
> bronevetsky1 at llnl.gov
>
> At 11:16 AM 2/12/2009, Nathan DeBardeleben wrote:
>> I really worry about taking the advice of users saying they would 
>> rather terminate and restart an application than have some 
>> assistance to help them ride through a problem.  If they are worried 
>> about programming language/model changes, I would encourage them to 
>> open their eyes.
>> Major programming model changes are predicted for beyond-petascale 
>> computers, and even petascale computers are having a hard time with 
>> classical MPI programming.  I think we're more likely to see MPI as 
>> an underpinning of next-gen models.  The users polled might not be 
>> extreme-scale users, however.
>> Working at a laboratory positioning itself for exascale, we are 
>> intimately aware of the fact that "oh just rerun it" is a worthless 
>> conclusion.  I wish I had more time to assist in this matter but our 
>> laboratory has cracked down on participation in things that are not 
>> directly associated with charge codes so it's a bit hard for me to 
>> spend any sizable amount of time.
>>
>> Please though, consider the user base when they say things like that.
>> I'm sure Rich is well aware of these similar concerns.  While MPI 
>> fault tolerance might not be important to the users running 1000-node 
>> systems, those of us approaching a system mean time to interrupt of 
>> under an hour are on quite the opposite side of that spectrum.
>>
>> Are the small-system users pushing for FT to not be inside of MPI?  
>> This is why I was so in favor of some sort of componentized MPI where 
>> users could exclude FT if they weren't worried about reliability (and 
>> thereby gain performance) but those of us who were in more dangerous 
>> reliability regimes could take the performance penalty and compile in 
>> / load in / configure in / whatever FT.
>>
>> -- Nathan
>>
>>
>>
>> Graham, Richard L. wrote:
>>> Josh,
>>>   Very early on in the process we got feedback from users that an 
>>> ft-mpi like interface was of no interest to them.  They would just 
>>> as soon terminate the application and restart rather than use this 
>>> sort of approach.  Having said that, there is already previous 
>>> demonstration that the ft-mpi approach is useful for some 
>>> applications.  If you look closely at the spec, the FT-MPI approach 
>>> is a subset of the current spec.
>>>   I am working on pulling out the APIs and expanding the 
>>> explanations.  The goal is to have this out before the next telecon 
>>> in two weeks.
>>>   Prototyping is under way, with UT, Cray, and ORNL committed to 
>>> working on this.  Right now the supporting infrastructure is being 
>>> developed.
>>>   Your point on the MPI-2 interfaces is a good one.  A couple of 
>>> people had started to look at this when it looked like it might make 
>>> it into the 2.2 version.  The changes seemed to be more extensive 
>>> than expected, so work stopped.  This does need to be picked up again.
>>>
>>> Rich
>>> ------Original Message------
>>> From: Josh Hursey
>>> To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
>>> ReplyTo: MPI 3.0 Fault Tolerance and Dynamic Process Control working 
>>> Group
>>> Sent: Feb 12, 2009 8:31 AM
>>> Subject: Re: [Mpi3-ft] Communicator Virtualization as a step forward
>>>
>>> It is a good point that local communicator reconstruction operations
>>> require a fundamental change in the way communicators are handled by
>>> MPI. With that in mind it would probably take as much effort (if not
>>> more) to implement a virtualized version on top of MPI. So maybe it
>>> will not help as much as I had originally thought. Outside of the
>>> paper, do we have the interface and semantics of these operations
>>> described anywhere? I think that would help in trying to keep pace
>>> with the use cases.
>>>
>>> The spirit of the suggestion was to separate what (I think) we can
>>> agree on as a first step (an FT-MPI-like model) from communicator
>>> reconstruction, which I see as a secondary step. If we stop to write
>>> up what the FT-MPI-like model should look like in the standard, then
>>> I think we can push forward on other fronts (prototyping of step 1,
>>> standardization of step 1, application implementations using step 1)
>>> while still trying to figure out how communicator reconstruction
>>> should be expressed in the standard such that it is usable in target
>>> applications.
>>>
>>> So my motion is that the group explicitly focus effort on writing a
>>> document describing the FT-MPI-like model we consider as a
>>> foundation, do so in MPI standard language, and present it to the
>>> MPI Forum for a straw vote in the next couple of meetings. From
>>> there we can continue evolving the document to support more advanced
>>> features, like communicator reconstruction.
>>>
>>> I am willing to put effort into making such a document. However, I
>>> would like explicit support from the working group in pursuing such
>>> an effort, and the help of anyone interested in helping write up and
>>> define this specification.
>>>
>>> So what do people think about taking this first step?
>>>
>>> -- Josh
>>>
>>>
>>> On Feb 11, 2009, at 5:57 PM, Greg Bronevetsky wrote:
>>>
>>>
>>>> I don't understand what you mean by "We can continue to pursue
>>>> communicator reconstruction interfaces through a virtualization
>>>> layer above MPI."  To me it seems that such interfaces will
>>>> effectively need to implement communicators on top of MPI in order
>>>> to be operational, which will take about as much effort as
>>>> implementing them inside MPI. In particular, I don't see a way to
>>>> recreate a communicator using the MPI interface without making
>>>> collective calls. However, we're defining MPI_Rejoin (or whatever
>>>> it's called) to be a local operation. This means that we cannot use
>>>> the MPI communicator interface and must instead implement our own
>>>> communicators.
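
[Editorial aside: Greg's local-vs-collective distinction can be sketched with a toy model in plain Python. This is NOT real MPI and none of these names (ToyComm, collective_create, local_rejoin) come from any spec; it only illustrates why a local rejoin cannot be built from MPI's collective communicator-creation calls alone.]

```python
# Toy model (plain Python, NOT real MPI): contrasts collective
# communicator creation, where every group member must participate,
# with a local MPI_Rejoin-style operation, where only one process acts.
# All names here are illustrative assumptions, not part of any MPI spec.

class ToyComm:
    """A bare-bones stand-in for a communicator: just a set of ranks."""
    def __init__(self, ranks):
        self.ranks = set(ranks)

def collective_create(callers, group):
    # Collective semantics: creation succeeds only if every member of
    # `group` calls in; a single failed or absent rank blocks it.
    if set(callers) != set(group):
        raise RuntimeError("collective create: all group members must call")
    return ToyComm(group)

def local_rejoin(comm, rank):
    # Local semantics: only the rejoining process acts; no other member
    # of the communicator is involved in the call.
    comm.ranks.add(rank)
    return comm
```

With rank 2 failed, `collective_create([0, 1], [0, 1, 2])` raises, while `local_rejoin(comm, 2)` succeeds without involving ranks 0 and 1, which is the crux of Greg's point: a layer above MPI that offers the local operation must manage its own communicator state.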
>>>>
>>>> The bottom line is that it does make sense to start implementing
>>>> support for the FT-MPI model and evolve that to a more elaborate
>>>> model. However, I don't think that working on the rest above MPI
>>>> will save us any effort or time.
>>>>
>>>>
>>>> At 01:17 PM 2/11/2009, Josh Hursey wrote:
>>>>
>>>>> In our meeting yesterday, I was sitting in the back trying to take
>>>>> in the complexity of communicator recreation. It seems that much of
>>>>> the confusion at the moment is that we (at least I) are still not
>>>>> exactly sure how the interface should be defined and implemented.
>>>>>
>>>>> I think of the process fault tolerance specification as a series of
>>>>> steps that can be individually specified, each building upon the
>>>>> last while working towards a specific goal set. From this I was
>>>>> asking myself: are there any foundational concepts that we can
>>>>> define now so that folks can start implementing?
>>>>>
>>>>> That being said, I suggest that we take FT-MPI's model, in which
>>>>> all communicators except the base three (COMM_WORLD, COMM_SELF,
>>>>> COMM_NULL) are destroyed on a failure, as the starting point for
>>>>> implementation. This would get us started. We can continue to
>>>>> pursue communicator reconstruction interfaces through a
>>>>> virtualization layer above MPI. We can use this layer to experiment
>>>>> with the communicator recreation mechanisms in conjunction with
>>>>> applications while pursuing the first step's implementation. Once
>>>>> we start to agree on the interface for communicator reconstruction,
>>>>> then we can start to push it into the MPI standard/library for a
>>>>> better standard/implementation.
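>>>>>
[Editorial aside: the FT-MPI-like starting point Josh describes can be modeled with a short Python toy. This is NOT real MPI; ToyMPIState and its methods are invented for illustration only. It shows the rule that on a process failure every derived communicator is destroyed and only the three base communicators survive, so the application rebuilds what it needs from those.]

```python
# Toy model (plain Python, NOT real MPI) of the FT-MPI-like rule:
# on a process failure, all derived communicators are invalidated;
# only COMM_WORLD, COMM_SELF, and COMM_NULL remain usable.
# All names here are illustrative assumptions, not a real MPI API.

BASE_COMMS = frozenset({"COMM_WORLD", "COMM_SELF", "COMM_NULL"})

class ToyMPIState:
    def __init__(self):
        # The three base communicators always exist.
        self.live_comms = set(BASE_COMMS)

    def comm_create(self, name):
        # Derived communicators (e.g. the result of a split or dup)
        # are tracked so they can be invalidated on failure.
        self.live_comms.add(name)
        return name

    def on_process_failure(self):
        # The FT-MPI-like rule: destroy everything except the base three.
        self.live_comms = set(BASE_COMMS)
```

After `on_process_failure()`, a derived communicator like `"row_comm"` is gone and the application must re-derive it from `COMM_WORLD`, which is exactly the recovery burden (and simplicity) of this starting-point model.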
>>>>>
>>>>> The communicator virtualization library is a staging area for these
>>>>> interface ideas that we seem to be struggling with. The
>>>>> virtualization
>>>>>
>>>
>>> ------Original Message Truncated------
>>>
>>> _______________________________________________
>>> mpi3-ft mailing list
>>> mpi3-ft at lists.mpi-forum.org
>>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
>>>
>


