[Mpi3-ft] Communicator Virtualization as a step forward
Josh Hursey
jjhursey at open-mpi.org
Wed Feb 11 15:17:14 CST 2009
In our meeting yesterday, I was sitting in the back trying to take in
the complexity of communicator recreation. It seems that much of the
confusion at the moment stems from the fact that we (at least I) are
still not exactly sure how the interface should be defined and
implemented.
I think of the process fault tolerance specification as a series of
steps, each individually specified and building on the last, working
towards a specific goal set. From this I asked myself: are there any
foundational concepts that we can define now so that folks can start
implementing?
That being said, I suggest that we take FT-MPI's model, in which all
communicators except the base three (COMM_WORLD, COMM_SELF, COMM_NULL)
are destroyed on a failure, as the starting point for implementation.
This would get us started. We can continue to pursue communicator
reconstruction interfaces through a virtualization layer above MPI. We
can use this layer to experiment with the communicator recreation
mechanisms in conjunction with applications while pursuing the
first-step implementation. Once we start to agree on the interface for
communicator reconstruction, we can then push it into the MPI
standard/library for a better standard/implementation.
The communicator virtualization library is a staging area for these
interface ideas that we seem to be struggling with. The virtualization
could be a simple table lookup that matches the Application's Virtual
Communicator Object to the actual MPI Communicator Object that may
have been recreated for you by the virtualization library.
We should still spend time talking through usage scenarios for
communicator recreation, since that will eventually be something we
want to provide to the application. I'm just suggesting that we
specify the first step so we can start experimenting with communicator
recreation as the next step.
What do people think about this as a step forward?
Best,
Josh