[Mpi3-ft] Communicator Virtualization as a step forward

Wed Feb 11 16:57:46 CST 2009

I don't understand what you mean by "We can continue to pursue 
communicator reconstruction interfaces through a virtualization layer 
above MPI."  To me it seems that such interfaces will effectively 
need to implement communicators on top of MPI in order to be 
operational, which will take about as much effort as implementing 
them inside MPI. In particular, I don't see a way to recreate a 
communicator using the MPI interface without making collective calls. 
However, we're defining MPI_Rejoin (or whatever it's called) to be a 
local operation. This means that we cannot use the MPI communicators 
interface and must instead implement our own communicators.

The bottom line is that it does make sense to start implementing 
support for the FT-MPI model and evolve that to a more elaborate 
model. However, I don't think that working on the rest above MPI will 
save us any effort or time.

Greg Bronevetsky
Post-Doctoral Researcher
1028 Building 451
Lawrence Livermore National Lab
(925) 424-5756
bronevetsky1 at llnl.gov

At 01:17 PM 2/11/2009, Josh Hursey wrote:
>In our meeting yesterday, I was sitting in the back trying to take in
>the complexity of communicator recreation. It seems that much of the
>confusion at the moment is that we (at least I) are still not exactly
>sure how the interface should be defined and implemented.
>
>I think of the process fault tolerance specification as a series of
>steps that can be individually specified building upon each step while
>working towards a specific goal set. From this I was asking myself, are
>there any foundational concepts that we can define now so that folks
>can start implementation?
>
>That being said, I suggest that we consider FT-MPI's model, in which all
>communicators except the base 3 (COMM_WORLD, COMM_SELF, COMM_NULL) are
>destroyed on a failure, as the starting point for implementation. This
>would get us started. We can continue to pursue communicator
>reconstruction interfaces through a virtualization layer above MPI. We
>can use this layer to experiment with the communicator recreation
>mechanisms in conjunction with applications while pursuing the first
>step implementation. Once we start to agree on the interface for
>communicator reconstruction, then we can start to push it into the MPI
>standard/library for a better standard/implementation.
>
>The communicator virtualization library is a staging area for these
>interface ideas that we seem to be struggling with. The virtualization
>could be a simple table lookup that matches the Application's Virtual
>Communicator Object to the actual MPI Communicator Object that may
>have been recreated for you by the virtualization library.
>
>We should still spend time on talking through usage scenarios for
>communicator recreation, since that will eventually be something that
>we want to provide to the application. I'm just suggesting that we
>specify the first step so we can start experimenting with the
>communicator recreation next step.
>
>What do people think about this as a step forward?
>
>Best,
>Josh
>_______________________________________________
>mpi3-ft mailing list
>mpi3-ft at lists.mpi-forum.org
>http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
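
For concreteness, the "simple table lookup" that Josh describes above might look something like the sketch below. This is only an illustration under stated assumptions, not a proposed interface: every name in it (vcomm_t, vcomm_register, vcomm_lookup, vcomm_invalidate_all, vcomm_rebind) is hypothetical, and real_comm_t is a plain integer standing in for MPI_Comm so that the sketch compiles without an MPI installation. How the replacement communicator is actually obtained after a failure, which is the part Greg argues cannot be done through the MPI communicator interface without collective calls, is deliberately left outside the sketch.

```c
/* Hypothetical sketch of a communicator virtualization table.
 * real_comm_t is a stand-in for MPI_Comm (an assumption made so the
 * example does not depend on mpi.h). */
typedef int real_comm_t;
#define REAL_COMM_NULL (-1)    /* stand-in for MPI_COMM_NULL */

typedef int vcomm_t;           /* virtual handle held by the application */
#define VCOMM_INVALID (-1)
#define VCOMM_MAX 16           /* arbitrary table size for the sketch */

static real_comm_t vcomm_table[VCOMM_MAX];  /* virtual handle -> real communicator */
static int vcomm_used[VCOMM_MAX];           /* 1 if the slot is allocated */

static int vcomm_valid(vcomm_t v)
{
    return v >= 0 && v < VCOMM_MAX && vcomm_used[v];
}

/* Record a real communicator and hand back a virtual handle for it. */
vcomm_t vcomm_register(real_comm_t real)
{
    for (int i = 0; i < VCOMM_MAX; i++) {
        if (!vcomm_used[i]) {
            vcomm_used[i] = 1;
            vcomm_table[i] = real;
            return i;
        }
    }
    return VCOMM_INVALID;      /* table full */
}

/* Translate a virtual handle to the real communicator currently behind it. */
real_comm_t vcomm_lookup(vcomm_t v)
{
    return vcomm_valid(v) ? vcomm_table[v] : REAL_COMM_NULL;
}

/* Model the FT-MPI starting point: after a failure, every derived
 * communicator is gone, so every slot maps to the null communicator
 * until it is rebound. The virtual handles themselves stay allocated. */
void vcomm_invalidate_all(void)
{
    for (int i = 0; i < VCOMM_MAX; i++) {
        if (vcomm_used[i]) {
            vcomm_table[i] = REAL_COMM_NULL;
        }
    }
}

/* Point an existing virtual handle at a recreated real communicator.
 * This is a purely local table update; returns 0 on success, -1 if the
 * handle is not allocated. */
int vcomm_rebind(vcomm_t v, real_comm_t recreated)
{
    if (!vcomm_valid(v)) {
        return -1;
    }
    vcomm_table[v] = recreated;
    return 0;
}
```

In this sketch the application would keep only vcomm_t handles and call vcomm_lookup immediately before each communication call, so that a handle obtained before a failure keeps working once the library has rebound it. The table update itself is local; whether the recreation step that feeds vcomm_rebind can also be made local is exactly the open question in this thread.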