[Mpi3-ft] system-level C/R requirements

Narasimhan, Kannan kannan.narasimhan at hp.com
Fri Oct 24 15:11:53 CDT 2008

[Changing the title to track discussion]

Agreed: specifying an explicit list of platforms or OSs, or even resource specifics, is not the way to go in a standard.

My suggestion would be to explore whether we can define abstract, higher-level resources that make up a "state", and specify high-level actions on them. For instance, pinning/unpinning memory is very specific to RDMA, but a "disconnect virtual connection" operation might abstract it. But this puts us into the realm of virtualizing MPI-internal components and concepts...

Maybe there is a more elegant way ...


-----Original Message-----
From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Greg Bronevetsky
Sent: Friday, October 24, 2008 12:46 PM
To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
Subject: Re: [Mpi3-ft] Summary of today's meeting

>When I look at the restore requirements on MPI as described below, they
>seem quite extensive, including re-pinning memory and reopening any
>previously opened communication handles.
I agree. Furthermore, it's not just the length of the list but also the fact that it is very sensitive to platform-specific details. If we have any hope of providing this API, we'll need a thorough survey of what would be required on the full range of possible target platforms, from BGL CNK and Catamount, to Windows and Unix, to Symbian, even if there are currently no MPI implementations on a given platform. With the regular MPI spec we don't need to be this thorough, because MPI is high-level enough that it is reasonable to assume an implementation of some sort can be written for any platform. Here, however, we're talking about such low-level details that getting them wrong in the spec could make implementations on some platforms impossible. We could put in an explicit list of OSs that the quiescence API applies to, but I don't think that would fly with the forum.

Greg Bronevetsky
Post-Doctoral Researcher
1028 Building 451
Lawrence Livermore National Lab
(925) 424-5756
bronevetsky1 at llnl.gov

mpi3-ft mailing list
mpi3-ft at lists.mpi-forum.org