[Mpi3-ft] Communicator Virtualization as a step forward

George Bosilca bosilca at eecs.utk.edu
Thu Feb 12 09:32:34 CST 2009


Since in FT-MPI errors are not limited to a specific communicator,
all processes will be aware of the error at some point in time.
Moreover, as the rebuild of MPI_COMM_WORLD is a global operation in
FT-MPI (at least in the REBUILD mode), all processes are at one
point in the error handler attached to MPI_COMM_WORLD. A library
using the PMPI interface can effectively track all communicators
created by the user application, and reconstruct them when the error
occurs. As all processes are aware of the fault, the fact that
creating communicators is a global operation is not a problem, as
they will all be available for it.
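
To make the tracking idea concrete, here is a rough sketch of the
bookkeeping such a library might keep. It is a pure-Python simulation,
not MPI code: the CommTracker class, its method names, and the use of
rank tuples as stand-ins for real communicators are all assumptions
made for illustration. In a real library the create() step would live
in wrappers around the communicator-creation calls, and on_failure()
would run from the error handler once MPI_COMM_WORLD has been rebuilt.

```python
class CommTracker:
    """Toy stand-in for a PMPI-style tracking layer (hypothetical)."""

    WORLD = 0  # virtual handle reserved for the world communicator

    def __init__(self, world_ranks):
        self._next = 1
        # virtual handle -> (parent handle, membership predicate)
        self._recipes = {}
        # virtual handle -> current "real" communicator (a tuple of ranks)
        self._real = {self.WORLD: tuple(world_ranks)}

    def create(self, parent, keep):
        """Record and build a communicator derived from `parent`."""
        handle = self._next
        self._next += 1
        self._recipes[handle] = (parent, keep)
        self._real[handle] = tuple(r for r in self._real[parent] if keep(r))
        return handle

    def lookup(self, handle):
        """Translate a virtual handle into the current real communicator."""
        return self._real[handle]

    def on_failure(self, new_world):
        """Replay every recorded creation against the rebuilt world."""
        self._real = {self.WORLD: tuple(new_world)}
        # Handles are issued in creation order, so a parent is always
        # rebuilt before any communicator derived from it.
        for handle in sorted(self._recipes):
            parent, keep = self._recipes[handle]
            self._real[handle] = tuple(
                r for r in self._real[parent] if keep(r)
            )
```

The application only ever holds the virtual handles, so they stay
valid across a rebuild; what the rebuilt world looks like (shrunk,
respawned, ...) is left to whatever mode the underlying MPI uses and
is simply passed in as new_world here.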

Josh, the document that you are asking about already exists. It was
published in ISC'04. Here is the link: http://www.netlib.org/utk/people/JackDongarra/PAPERS/isc2004-FT-MPI.pdf

   george.

On Feb 12, 2009, at 08:31 , Josh Hursey wrote:

> It is a good point that local communicator reconstruction operations  
> require a fundamental change in the way communicators are handled by  
> MPI. With that in mind it would probably take as much effort (if not  
> more) to implement a virtualized version on top of MPI. So maybe it  
> will not help as much as I had originally thought. Outside of the  
> paper, do we have the interface and semantics of these operations  
> described anywhere? I think that would help in trying to keep pace  
> with the use cases.
>
> The spirit of the suggestion was as a way to separate what (I think)  
> we can agree on as a first step (FT-MPI-like model) from the  
> communicator reconstruction, which I see as a secondary step. If we  
> stop to write up what the FT-MPI-like model should look like in the  
> standard, then I think we can push forward on other fronts  
> (prototyping of step 1, standardization of step 1, application  
> implementations using step 1) while still trying to figure out how  
> communicator reconstruction should be expressed in the standard
> such that it is usable in target applications.
>
> So my motion is that the group explicitly focus effort on writing a  
> document describing the FT-MPI-like model we consider as a  
> foundation. Do so in the MPI standard language, and present it to  
> the MPI Forum for a straw vote in the next couple of meetings. From  
> this document we can continue evolving it to support more advanced  
> features, like communicator reconstruction.
>
> I am willing to put effort into writing such a document. However, I
> would like explicit support from the working group in pursuing such
> an effort, and the help of anyone interested in writing up and
> defining this specification.
>
> So what do people think about taking this first step?
>
> -- Josh
>
>
> On Feb 11, 2009, at 5:57 PM, Greg Bronevetsky wrote:
>
>> I don't understand what you mean by "We can continue to pursue
>> communicator reconstruction interfaces through a virtualization
>> layer above MPI."  To me it seems that such interfaces will
>> effectively need to implement communicators on top of MPI in order
>> to be operational, which will take about as much effort as
>> implementing them inside MPI. In particular, I don't see a way to
>> recreate a communicator using the MPI interface without making
>> collective calls. However, we're defining MPI_Rejoin (or whatever
>> it's called) to be a local operation. This means that we cannot use
>> the MPI communicator interface and must instead implement our own
>> communicators.
>>
>> The bottom line is that it does make sense to start implementing  
>> support for the FT-MPI model and evolve that to a more elaborate  
>> model. However, I don't think that working on the rest above MPI  
>> will save us any effort or time.
>>
>> Greg Bronevetsky
>> Post-Doctoral Researcher
>> 1028 Building 451
>> Lawrence Livermore National Lab
>> (925) 424-5756
>> bronevetsky1 at llnl.gov
>>
>> At 01:17 PM 2/11/2009, Josh Hursey wrote:
>>> In our meeting yesterday, I was sitting in the back trying to take
>>> in the complexity of communicator recreation. It seems that much of
>>> the confusion at the moment is that we (at least I) are still not
>>> exactly sure how the interface should be defined and implemented.
>>>
>>> I think of the process fault tolerance specification as a series of
>>> steps that can be individually specified, each building upon the
>>> previous one while working towards a specific goal set. With this in
>>> mind, I was asking myself: are there any foundational concepts that
>>> we can define now so that folks can start implementing?
>>>
>>> That being said, I suggest that we consider FT-MPI's model, in which
>>> all communicators except the base 3 (COMM_WORLD, COMM_SELF,
>>> COMM_NULL) are destroyed on a failure, as the starting point for
>>> implementation. This would get us started. We can continue to pursue
>>> communicator reconstruction interfaces through a virtualization
>>> layer above MPI. We can use this layer to experiment with the
>>> communicator recreation mechanisms in conjunction with applications
>>> while pursuing the first step implementation. Once we start to agree
>>> on the interface for communicator reconstruction, then we can start
>>> to push it into the MPI standard/library for a better
>>> standard/implementation.
>>>
>>> The communicator virtualization library is a staging area for these
>>> interface ideas that we seem to be struggling with. The
>>> virtualization could be a simple table lookup that matches the
>>> Application's Virtual Communicator Object to the actual MPI
>>> Communicator Object that may have been recreated for you by the
>>> virtualization library.
>>>
>>> We should still spend time talking through usage scenarios for
>>> communicator recreation, since that will eventually be something
>>> that we want to provide to the application. I'm just suggesting that
>>> we specify the first step so we can start experimenting with
>>> communicator recreation as the next step.
>>>
>>> What do people think about this as a step forward?
>>>
>>> Best,
>>> Josh
>>> _______________________________________________
>>> mpi3-ft mailing list
>>> mpi3-ft at lists.mpi-forum.org
>>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft