[Mpi3-ft] General Network Channel Failure

Kathryn Mohror kathryn at llnl.gov
Sun Jun 12 11:09:54 CDT 2011

Hi all,

(I sent this earlier, but I don't think it went through because I sent 
it from the wrong email address. I apologize if you get an extra copy.)

I know I haven't participated in this working group yet, so I may be
missing some context, but I couldn't resist putting my two cents in!

I think that an MPI-centric approach is best. Otherwise, you run the
risk of defining a model that doesn't fit a particular implementation
or machine and getting shot down when it's brought to the forum.
For example, you may remember the PERUSE performance interface,
which assumed a model of MPI that implementers didn't approve of,
because it didn't fit their implementations or was difficult/expensive
to support.
Now, to replace PERUSE, we've got the MPI_T interface, which doesn't 
specify *anything* but appears to be supported by the forum.

I agree though that having more specific error information when it's
available would be very useful. You might consider taking an approach
similar to MPI_T -- allow MPI implementers to define any specific error
codes they can/want and then provide an interface for decoding and
interpreting the errors.

Of course, this approach may not be useful for most applications
directly, but I imagine that a fault-tolerant MPI application or a
checkpoint/restart library could make use of the information, assuming
they could get at it.


On 6/9/2011 8:20 AM, Howard Pritchard wrote:
> Hi Greg,
> I vote for an MPI-centric model.
> I also think that part of the job of MPI is to hide as much
> as possible things like 'exhaustion of network resources'
> and 'intermittent network failures'.  Indeed, the very first
> sentence in section 2.8 says "MPI provides the user with
> reliable message transmission".
> The only reason the topic came up yesterday was in the
> context of the fail-stop model and what types of error
> codes might be returned by MPI before the official
> verdict was in that a fail-stop had occurred.  Several of
> us checked what our implementations might do prior to
> that, and it could include returning MPI_ERR_OTHER.  I
> could see how for someone writing a fault tolerant MPI
> application, something more useful than this rather ambiguous
> error code might be worth defining.
> Howard
> Bronevetsky, Greg wrote:
>> I like the idea of having an abstract model of failures that can approximate changes in system functionality due to failures. However, I think before we go too far with this we should consider the type of model we want to make. One option is to make a system model that has as its basic elements nodes, network links and other hardware components and identifies points in time when they stop functioning. The other option is to make it MPI-centric by talking about the status of ranks and point-to-point communication between them, as well as communicators and collective communication over them. So in the first type of model we can talk about network resource exhaustion, and in the latter we can talk about an intermittent inability to send messages over some or all communicators.
>> I think that the MPI-centric model is a better option since it talks exclusively about entities that exist in MPI and ignores the physical phenomena that cause a given type of degradation in functionality.
>> The other question we need to discuss is the types of problems we want to represent. We obviously care about fail-stop failures, but we're not talking about resource exhaustion. Do we want to add error classes for transient errors, and if so, what about performance slowdowns?
>> Greg Bronevetsky
>> Lawrence Livermore National Lab
>> (925) 424-5756
>> bronevetsky at llnl.gov
>> http://greg.bronevetsky.com
>>> -----Original Message-----
>>> From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-
>>> bounces at lists.mpi-forum.org] On Behalf Of Josh Hursey
>>> Sent: Wednesday, June 08, 2011 11:36 AM
>>> To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
>>> Subject: [Mpi3-ft] General Network Channel Failure
>>> It was mentioned in the conversation today that MPI_ERR_RANK_FAIL_STOP
>>> may not be the first error returned by an MPI call. In particular the MPI call
>>> may return an error symptomatic of a fail-stop process failure (e.g., network
>>> link failed - currently MPI_ERR_OTHER), before eventually diagnosing the
>>> event as a process failure. This 'space between' MPI_SUCCESS behavior and
>>> MPI_ERR_RANK_FAIL_STOP behavior is not currently defined, and probably
>>> should be, so the application can cleanly move from the semantics of
>>> one error class to another.
>>> The suggestion was to create a new general network error class (exact
>>> name to be determined) that can be returned when the operation cannot complete due to
>>> network issues (which might be later diagnosed as process failure and
>>> escalated to the MPI_ERR_RANK_FAIL_STOP semantics). You could also think
>>> about this error being used for network resource exhaustion as well
>>> (something that Tony mentioned during the last MPI Forum meeting). In
>>> which case retrying at a later time or taking some other action before trying
>>> again would be useful/expected.
>>> There are some issues with matching, and the implications on collective
>>> operations. If the network error is sticky/permanent then once the error is
>>> returned it will always be returned or escalated to fail-stop process failure (or
>>> more generally to a 'higher/more severe/more detailed' error class). A
>>> recovery proposal (similar to what we are developing for process failure)
>>> would allow the application to 'recover' the channel and continue
>>> communicating on it.
>>> The feeling was that this should be expanded into a full proposal, separate
>>> from the Run-Through Stabilization proposal. So we can continue with the
>>> RTS proposal, and bring this forward when it is ready.
>>> What do folks think about this idea?
>>> -- Josh
>>> --
>>> Joshua Hursey
>>> Postdoctoral Research Associate
>>> Oak Ridge National Laboratory
>>> http://users.nccs.gov/~jjhursey
>>> _______________________________________________
>>> mpi3-ft mailing list
>>> mpi3-ft at lists.mpi-forum.org
>>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
