[Mpi-forum] MPI_Abort - meaning

Mon Apr 5 11:44:51 CDT 2010

On Apr 5 2010, Jeff Hammond wrote:
> On Mon, Apr 5, 2010 at 10:18 AM, Richard Treumann <treumann at us.ibm.com> 
> wrote:
>> It has come to my attention that there is at least one MPI user who 
>> considers an MPI_Abort call to be a legitimate way to terminate an 
>> application that has reached a correct result. I gather the situation is 
>> that as soon as any task has a result it decides is satisfactory, 
>> whatever the other tasks are working on becomes instantly irrelevant.
>>
>> Is this a situation other members of the Forum consider as common or at 
>> least legitimate? I can see the approach as legitimate but question the 
>> use of MPI_Abort.

The approach is reasonable; using MPI_Abort for that is demented.

>> Would you regard it as legitimate to consider an application that ended 
>> with a call to MPI_Abort to be successful? If so, should we say 
>> something explicit in the MPI_Abort description?

With implementation and portability hats on, No Way, Jose!

>> Should there be something new like MPI_Quit which is defined as a 
>> "correct" single task termination of an MPI application?

Grrk.  It could be done, but the consequences are non-trivial.  For
example, consider an implementation of MPI on a system like z/OS.  Or
one that uses "System V shared memory" for communication.

>> We currently say that MPI_Abort makes a "best attempt" which to me 
>> implies it is not guaranteed by the standard that it will really leave 
>> the system as good as new. I am not aware of anybody having a problem 
>> making MPI_Abort a total termination of processes and recovery of 
>> resources but the escape hatch is there for an MPI implementation that 
>> was unable to do a perfectly clean Abort.

You are not aware of it because the users report the failures to
people like me, who tell them that such failures are unavoidable and
permitted by the specification, and to stop using MPI_Abort that way.
So the failures don't get reported to people like you.  And, yes, I
have had poe fail that way, as well as most other MPIs I have used or
supported heavily.

>That's actually a pretty good idea for a number of applications.  I
>wouldn't feel comfortable using a function called MPI_Abort for this,
>but since MPI_Finalize is collective, any alternative is going to have
>some unnecessary overhead.  MPI needs either a new function call or
>modification of MPI_Finalize to allow for asynchronous termination.

I fully agree.
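
To make the trade-off concrete, here is a minimal sketch of the
cooperative alternative to calling MPI_Abort on success.  It is not
MPI code: Python threads stand in for ranks, a threading.Event stands
in for whatever "result found" notification the application would
send (in a real MPI program, perhaps a small message that each rank
polls for with MPI_Iprobe), and the final join stands in for every
rank reaching MPI_Finalize.  The names run_search, worker and done
are invented for illustration only.

```python
import threading


def run_search(num_workers, target, limit):
    """Split range(limit) across workers; stop all of them once one finds target.

    Returns (result, stopped_early): result is {} if target was never found,
    stopped_early lists the workers that quit because another one finished.
    """
    done = threading.Event()   # stand-in for a "result found" notification
    lock = threading.Lock()
    result = {}
    stopped_early = []

    def worker(rank):
        # Each worker scans a disjoint slice, so at most one can find target.
        for candidate in range(rank, limit, num_workers):
            if done.is_set():
                # Poll point: another worker already has the answer, so the
                # remaining work is irrelevant and we leave cleanly.
                with lock:
                    stopped_early.append(rank)
                return
            if candidate == target:
                with lock:
                    result["value"] = candidate
                    result["rank"] = rank
                done.set()     # tell everyone else to stop
                return

    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()               # stand-in for the collective shutdown
    return result, sorted(stopped_early)


if __name__ == "__main__":
    found, quitters = run_search(num_workers=4, target=1234, limit=100000)
    print("found:", found, "stopped early:", quitters)
```

The overhead mentioned above shows up in two places: every worker
pays for the poll on each iteration, and nobody gets past the final
join until the slowest worker has noticed the flag.  Which workers
end up in the stopped-early list depends on scheduling, so it can
differ from run to run.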

>Can we have MPI_Quit take various arguments specifying termination
>behavior such as NO_WAIT (a cleaner single-rank MPI_Abort) and
>possibly other options which make sense in the context of MPI
>endpoints and other threading uses?  There might be times when one
>would want to sync up the node or a subset of the processes before
>terminating.

Yes, but that is related to another deficiency of MPI (in the sense
of a missing facility, not a fault).  There isn't any way to enquire
what the state of the local message queues is, even though it would
be possible to specify cleanly.  Without that, one would have to
specify what synchronising means fairly carefully.  It should be
feasible.

>If anyone thinks this usage of MPI_Abort is crazy, I'll pseudo-code up
>something that would require it to perform efficiently.

Never mind the efficiency - let's see something that is semantically
well-defined and feasible to implement on any system that current MPI
supports!

Regards,
Nick Maclaren.