<div dir="ltr">Hi FT WG,<div><br></div><div>I have been doing my best to socialize the FT proposal at Intel, and I gathered some feedback to bring back to the WG.</div><div><br></div><div style>One concern was that fault tolerance could be "switched on" any time the user registers an error handler, because MPI_Comm_set_errhandler() does not distinguish between error classes. The assumption was that fault tolerance, once switched on, would carry space and time overheads. How does the current proposal determine when fault tolerance should be enabled?</div>
<div style><br></div><div style>One suggested mechanism was to add a function, MPI_Comm_set_faulthandler(), that allows the programmer to distinguish between errors and failures. This would let the runtime determine when fault tolerance is desired. My understanding is that the current proposal instead relies on the implementation switching fault tolerance on or off when the job is launched.</div>
<div style><br></div><div style> ~Jim.</div></div>