[Mpi3-ft] Communicator Virtualization as a step forward

Supalov, Alexander alexander.supalov at intel.com
Fri Feb 13 11:12:51 CST 2009


Hi,

Thanks. It appears to me that this goes a step further than the spawning/attachment. That one was supposed to work everywhere, but very few implementations were available until very recently, and hence very few applications use this feature.

Now, with FT optional in the sense that you described, who's going to develop an application that may be able to take advantage of a fairly unspecified capability to detect an unknown number of faults? I can't imagine why this would happen.

Also, imagine someone got used to a certain set of provided FT features. Moving to another MPI implementation, they may see a different set of features, up to and including none at all. Why would they experiment? This locks users into one MPI implementation. No good.

So, I'd say that unless we can guarantee some basic functionality when FT is switched on one way or another, we're not going to create much excitement in the user community. And life's too short to work on something that nobody will be excited about.

It may be better to propose a way to control FT as a subset or the like, and then guarantee that if you turn this capability on, you get certain benefits as a minimum in exchange for accepting a certain performance tradeoff.

That may be out of the scope of this particular WG, but this is something that this WG should very seriously consider in order for the outcome to fly. Or so I think.

Best regards.

Alexander

-----Original Message-----
From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Greg Bronevetsky
Sent: Friday, February 13, 2009 5:51 PM
To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
Subject: Re: [Mpi3-ft] Communicator Virtualization as a step forward


>FT must be optional to be accepted. One way is to put it into a
>subset, say, through the MPI_INIT_ASSERTED proposal. Another is to add
>yet another MPI_INIT call that will contain a flag for FT configuration
>(like MPI_FT_SYNCHRONOUS, MPI_FT_ASYNCHRONOUS, etc.). This was
>mentioned in relation to the issue notification in my earlier fault
>handling=error handling proposal, see
>https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/Fault%20Handling.

I agree that FT must be optional but I don't think that we need to 
add anything to the proposal to make this happen. The proposal 
provides an API that allows the MPI implementation to tell the 
application about detected but recoverable failures and help it 
perform recovery. It does not say anything about which failures must 
be recoverable for MPI. Reliable MPI implementations will be able to 
do much more than unreliable MPI implementations. Users who need 
reliability will choose the former while others will choose the 
latter. The same will apply to things like network degradation.
Since the spec will never talk about what types of physical events 
must be reportable by MPI, individual implementations will be able to 
trade off efficiency against the usefulness of system monitoring and 
all such choices will be compliant with the spec.

Greg Bronevetsky
Post-Doctoral Researcher
1028 Building 451
Lawrence Livermore National Lab
(925) 424-5756
bronevetsky1 at llnl.gov 
