[Mpi3-ft] system-level C/R requirements
Greg Bronevetsky
bronevetsky1 at llnl.gov
Fri Oct 24 15:20:51 CDT 2008
>Agreed -- specifying an explicit list of platforms or OS or even
>resource specifics is not the way to go in a standard.
>
>My suggestion would be to explore if we can define abstract,
>higher-level resources to define a "state", and specify high-level
>actions. For instance, pinning/unpinning memory is very specific to
>RDMA, but maybe a "disconnect virtual connection" operation may
>abstract it. But this puts us into the realm of virtualizing MPI
>internal components/concepts ..
>
>Maybe there is a more elegant way ...
The thing that worries me is that an MPI implementation may have a
fair amount of state sitting on the network card. This state is
unreachable by a user- or kernel-level checkpointer but may be
reachable by a VMM-level checkpointer. How do we differentiate the
level at which we're working? System-level checkpointers working at
different levels need MPI state to be flushed to different levels of
abstraction, and it seems that we'll need to be very low-level in
order to define what it means to operate at a given level of abstraction.
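
To make the concern concrete, here is a rough sketch in C of what
"flush to a given level" might mean. Every name in it is hypothetical;
none of it comes from MPI or from any proposal on this list. It also
assumes a strict ordering (user < kernel < VMM) in which a VMM-level
checkpointer really can capture NIC-resident state, which may not hold
on a given platform.

```c
#include <assert.h>

/* Hypothetical: the level at which a system-level checkpointer runs. */
typedef enum {
    CKPT_LEVEL_USER   = 0, /* sees user-space process memory only */
    CKPT_LEVEL_KERNEL = 1, /* assumed to also see kernel-side state */
    CKPT_LEVEL_VMM    = 2  /* assumed to also capture device (NIC) state */
} ckpt_level;

/* Hypothetical: where a piece of MPI implementation state lives. */
typedef enum {
    STATE_IN_PROCESS = 0,
    STATE_IN_KERNEL  = 1,
    STATE_ON_NIC     = 2
} state_location;

/*
 * Returns 1 if state living at `loc` is out of reach of a checkpointer
 * running at `level`, so the MPI implementation would have to flush it
 * somewhere reachable before the checkpoint; returns 0 otherwise.
 * Relies on the assumed ordering of the two enums above.
 */
int must_flush(ckpt_level level, state_location loc)
{
    assert(level >= CKPT_LEVEL_USER && level <= CKPT_LEVEL_VMM);
    assert(loc >= STATE_IN_PROCESS && loc <= STATE_ON_NIC);
    return (int)loc > (int)level;
}
```

Under these assumptions, must_flush(CKPT_LEVEL_USER, STATE_ON_NIC) says
the NIC state has to be drained first, while the same state needs no
action for CKPT_LEVEL_VMM. The hard part is exactly what the sketch
glosses over: defining state_location precisely enough to be portable.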
Greg Bronevetsky
Post-Doctoral Researcher
1028 Building 451
Lawrence Livermore National Lab
(925) 424-5756
bronevetsky1 at llnl.gov