[Mpi3-ft] Communicator Virtualization as a step forward

Greg Bronevetsky bronevetsky1 at llnl.gov
Fri Feb 13 10:50:55 CST 2009


>FT must be optional to be accepted. One way is to put it into a 
>subset, say, thru the MPI_INIT_ASSERTED proposal. Another is to add 
>yet one MPI_INIT call that will contain a flag for FT configuration 
>(like MPI_FT_SYNCHRONOUS, MPI_FT_ASYNCHRONOUS, etc.). This was 
>mentioned in relation to the issue notification in the earlier fault 
>handling=error handling proposal of mine, see
>https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/Fault%20Handling.

I agree that FT must be optional, but I don't think we need to add 
anything to the proposal to make this happen. The proposal provides 
an API that lets the MPI implementation tell the application about 
detected but recoverable failures and help it perform recovery. It 
says nothing about which failures an MPI implementation must be able 
to recover from. Reliable MPI implementations will be able to do much 
more than unreliable ones. Users who need reliability will choose the 
former, while others will choose the latter. The same applies to 
things like network degradation. Since the spec will never say which 
types of physical events MPI must report, individual implementations 
will be free to trade off efficiency against the usefulness of system 
monitoring, and all such choices will be compliant with the spec.
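
To make the argument concrete, here is a toy model in plain C. It is 
not MPI and not the proposed API: every name in it (toy_mpi, 
toy_set_handler, toy_physical_event, the EV_* constants) is 
hypothetical and chosen only for illustration. It sketches one 
notification interface backed by two implementations that differ 
only in which physical events they choose to report.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event kinds; not taken from the MPI standard or the proposal. */
typedef enum {
    EV_PROCESS_FAILURE,
    EV_NETWORK_DEGRADATION,
    EV_COUNT
} event_kind;

/* Application-supplied callback invoked when a failure is reported. */
typedef void (*failure_handler)(event_kind ev, void *user_data);

/* A toy "implementation". The interface is identical for all instances;
   each instance decides which events it detects and reports. */
typedef struct {
    const char *name;
    int reports[EV_COUNT];   /* 1 if this implementation reports the event */
    failure_handler handler;
    void *user_data;
} toy_mpi;

/* Register the application's handler with an implementation. */
void toy_set_handler(toy_mpi *impl, failure_handler h, void *user_data) {
    assert(impl != NULL);
    impl->handler = h;
    impl->user_data = user_data;
}

/* Simulate a physical event. Returns 1 if the application was notified,
   0 if the implementation does not report this kind of event. */
int toy_physical_event(toy_mpi *impl, event_kind ev) {
    assert(impl != NULL && (int)ev >= 0 && (int)ev < EV_COUNT);
    if (!impl->reports[ev] || impl->handler == NULL)
        return 0;
    impl->handler(ev, impl->user_data);
    return 1;
}

/* Handler that only counts how many notifications arrived. */
static void count_handler(event_kind ev, void *user_data) {
    (void)ev;
    ++*(int *)user_data;
}

/* Fire every event kind once and return how many reached the application. */
int count_notifications(toy_mpi *impl) {
    int seen = 0;
    toy_set_handler(impl, count_handler, &seen);
    for (int ev = 0; ev < EV_COUNT; ++ev)
        toy_physical_event(impl, (event_kind)ev);
    toy_set_handler(impl, NULL, NULL);  /* do not keep a pointer to a local */
    return seen;
}
```

With a "reliable" instance initialized as { "reliable", {1, 1}, NULL, 
NULL } and a "minimal" one as { "minimal", {0, 0}, NULL, NULL }, 
count_notifications returns 2 for the first and 0 for the second. The 
application code is the same in both cases, and neither instance 
violates the interface.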

Greg Bronevetsky
Post-Doctoral Researcher
1028 Building 451
Lawrence Livermore National Lab
(925) 424-5756
bronevetsky1 at llnl.gov 



