[Mpi-forum] MPI_Abort - meaning

Richard Treumann treumann at us.ibm.com
Tue Apr 6 15:51:55 CDT 2010


Jeff asked why it might be problematic to make MPI_Abort or MPI_Quit a
reliable (as opposed to "best effort") means of job termination. He mentioned
the assumption that any parallel job must be under the control of some
supervisor that can clean up.

This assumption may actually identify one root of the problem.  If the
supervisor has lost its full connectivity but the tasks of a job are still
able to communicate with one another, then MPI_Finalize might still be able
to promise clean termination. A call to MPI_Abort or MPI_Quit, by contrast,
may be unable to do anything about tasks that are outside the reach of the
calling task's own subset of the broken supervisor.

Also, any idea of clean termination by either MPI_Abort or MPI_Quit becomes
very messy if you consider MPI-IO (or any cooperative file writing).  If
some tasks are working together on an MPI_File_write_xxxx and some task
throws the ABORT bomb (or QUIT bomb), it is probably not feasible to make
any promise about the state of the file.

This again brings me back to thinking that perhaps the MPI standard cannot
offer any decent semantics for a single task making a decision to terminate
with success, and that the people who want this for their applications
cannot have it.


Dick Treumann  -  MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846         Fax (845) 433-8363