[Mpi3-ft] Communicator Virtualization as a step forward

Josh Hursey jjhursey at open-mpi.org
Wed Feb 11 15:17:14 CST 2009


In our meeting yesterday, I was sitting in the back trying to take in
the complexity of communicator recreation. It seems that much of the
confusion at the moment comes from the fact that we (at least I) are
still not exactly sure how the interface should be defined and
implemented.

I think of the process fault tolerance specification as a series of
steps that can be specified individually, each building on the previous
one while working towards a specific set of goals. From this I was
asking myself: are there any foundational concepts that we can define
now so that folks can start implementation?

That being said, I suggest that we consider FT-MPI's model, in which all
communicators except the base 3 (COMM_WORLD, COMM_SELF, COMM_NULL) are
destroyed on a failure, as the starting point for implementation. This
would get us started. We can continue to pursue communicator
reconstruction interfaces through a virtualization layer above MPI. We
can use this layer to experiment with the communicator recreation
mechanisms in conjunction with applications while pursuing the first
step implementation. Once we start to agree on the interface for
communicator reconstruction, we can start to push it into the MPI
standard/library for a better standard/implementation.

The communicator virtualization library is a staging area for these  
interface ideas that we seem to be struggling with. The virtualization  
could be a simple table lookup that matches the Application's Virtual  
Communicator Object to the actual MPI Communicator Object that may  
have been recreated for you by the virtualization library.
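
As a rough illustration of the table-lookup idea, a minimal sketch in C
might look like the following. Everything here is an assumption made
for illustration, not an interface from FT-MPI or any MPI library: the
vcomm_* names are hypothetical, and real_comm_t is a plain stand-in for
MPI_Comm so the sketch compiles without an MPI installation.

```c
#include <assert.h>

/* Stand-in for MPI_Comm (assumption); a real layer would store MPI_Comm. */
typedef int real_comm_t;
#define REAL_COMM_NULL (-1)   /* plays the role of MPI_COMM_NULL */

/* Hypothetical virtual handle handed to the application. */
typedef int vcomm_t;
#define VCOMM_MAX 64

typedef struct {
    int         in_use;  /* slot has been handed out to the application */
    real_comm_t real;    /* current real communicator, or REAL_COMM_NULL */
} vcomm_entry_t;

static vcomm_entry_t vcomm_table[VCOMM_MAX];

/* Record a real communicator and return a stable virtual handle,
 * or -1 if the table is full. */
vcomm_t vcomm_register(real_comm_t real)
{
    for (int i = 0; i < VCOMM_MAX; i++) {
        if (!vcomm_table[i].in_use) {
            vcomm_table[i].in_use = 1;
            vcomm_table[i].real   = real;
            return i;
        }
    }
    return -1;
}

/* Translate a virtual handle into the current real communicator.
 * Unknown or invalidated handles map to REAL_COMM_NULL. */
real_comm_t vcomm_lookup(vcomm_t v)
{
    if (v < 0 || v >= VCOMM_MAX || !vcomm_table[v].in_use)
        return REAL_COMM_NULL;
    return vcomm_table[v].real;
}

/* Model the "everything but the base communicators is destroyed on a
 * failure" starting point: every registered entry loses its real
 * communicator, but the virtual handle stays reserved. */
void vcomm_invalidate_all(void)
{
    for (int i = 0; i < VCOMM_MAX; i++)
        if (vcomm_table[i].in_use)
            vcomm_table[i].real = REAL_COMM_NULL;
}

/* Point an existing virtual handle at a recreated real communicator.
 * Returns 0 on success, -1 if the handle was never registered. */
int vcomm_rebind(vcomm_t v, real_comm_t recreated)
{
    assert(recreated != REAL_COMM_NULL);  /* caller must supply a live one */
    if (v < 0 || v >= VCOMM_MAX || !vcomm_table[v].in_use)
        return -1;
    vcomm_table[v].real = recreated;
    return 0;
}
```

The application would only ever hold vcomm_t values. After a failure the
layer would call vcomm_invalidate_all(), recreate whatever communicators
it can, and vcomm_rebind() them, so the handles the application already
holds keep working. How the recreation itself happens is exactly the
open question this layer is meant to let us experiment with.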

We should still spend time talking through usage scenarios for
communicator recreation, since that will eventually be something that
we want to provide to the application. I'm just suggesting that we
specify the first step so we can start experimenting with communicator
recreation as the next step.

What do people think about this as a step forward?

Best,
Josh


