[Mpi3-ft] system-level C/R requirements

Greg Bronevetsky bronevetsky1 at llnl.gov
Fri Oct 24 15:20:51 CDT 2008

>Agreed -- specifying an explicit list of platforms or OS or even 
>resource specifics is not the way to go in a standard.
>My suggestion would be to explore if we can define abstract, 
>higher-level resources to define a "state", and specify high-level 
>actions. For instance, pinning/unpinning memory is very specific to 
>RDMA, but maybe a "disconnect virtual connection" operation may 
>abstract it. But this puts us into the realm of virtualizing MPI 
>internal components/concepts ...
>Maybe there is a more elegant way ...
The thing that worries me is that an MPI implementation may have a 
fair amount of state sitting on the network card. This state is 
unreachable by a user- or kernel-level checkpointer but may be 
reachable by a VMM-level checkpointer. How do we differentiate the 
level at which we're working? System-level checkpointers working at 
different levels need MPI state to be flushed to different levels of 
abstraction, and it seems that we'll need to be very low-level in 
order to define what it means to operate at a given level of abstraction.
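
To make the concern concrete, here is a minimal sketch in C of the kind of 
table a standard would end up having to define. Every name in it 
(ckpt_level, state_location, min_level_for, must_flush) is hypothetical and 
not part of MPI or of any proposal. The NIC row follows the case described 
above, taking the "may be reachable by a VMM-level checkpointer" case as 
given; the process-memory and kernel-buffer rows are illustrative 
assumptions only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names throughout; none of this is MPI API. */

/* Level at which the system-level checkpointer runs, narrowest reach first. */
typedef enum { CKPT_USER = 0, CKPT_KERNEL = 1, CKPT_VMM = 2 } ckpt_level;

/* Where a piece of MPI implementation state currently lives. */
typedef enum {
    STATE_PROCESS_MEMORY = 0,
    STATE_KERNEL_BUFFERS = 1,
    STATE_NIC = 2
} state_location;

/* Lowest checkpointer level assumed able to capture state at `loc`.
 * STATE_NIC -> CKPT_VMM takes the "may be reachable" case from the
 * message as given; the other two rows are illustrative assumptions. */
ckpt_level min_level_for(state_location loc)
{
    switch (loc) {
    case STATE_PROCESS_MEMORY: return CKPT_USER;
    case STATE_KERNEL_BUFFERS: return CKPT_KERNEL;
    case STATE_NIC:            return CKPT_VMM;
    }
    assert(!"unknown state_location");
    return CKPT_VMM; /* most conservative answer if asserts are disabled */
}

/* True if MPI would have to flush state at `loc` to somewhere the
 * checkpointer can see before a checkpoint taken at `level`. */
bool must_flush(ckpt_level level, state_location loc)
{
    return level < min_level_for(loc);
}
```

Under these assumptions, must_flush(CKPT_KERNEL, STATE_NIC) is true while 
must_flush(CKPT_VMM, STATE_NIC) is false. The hard part is exactly the table 
inside min_level_for: filling it in for real hardware is where the 
definition becomes very low-level.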

Greg Bronevetsky
Post-Doctoral Researcher
1028 Building 451
Lawrence Livermore National Lab
(925) 424-5756
bronevetsky1 at llnl.gov 
