<html><body>

<p>Jeff asked why it might be problematic to make MPI_Abort or MPI_Quit a reliable (as opposed to "best effort") job termination mechanism. He mentioned the assumption that any parallel job must be under the control of some supervisor that can clean up.<br>

<br>

This assumption may actually identify one root of the problem. If the supervisor has lost full connectivity but the tasks of a job can still communicate with one another, MPI_Finalize might still be able to promise clean termination. A call to MPI_Abort or MPI_Quit, by contrast, may be unable to do anything about tasks that lie outside the reach of the calling task's own subset of the broken supervisor.<br>

<br>

Also, any idea of clean termination by either MPI_Abort or MPI_Quit becomes very messy if you consider MPI-IO (or any cooperative file writing).  If some tasks are working together on an MPI_File_write_xxxx and some task throws the ABORT bomb (or QUIT bomb), it is probably not feasible to make any promise about the state of the file.<br>

<br>
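As a toy illustration of the file-state concern (plain Python, not MPI code; the function name, the one-block-per-task layout, and the "completion order" model are all assumptions made for this sketch), consider tasks that each own one block of a shared file, with an abort taking effect after some number of them have finished writing:<br>

```python
from typing import List, Optional, Sequence


def collective_write_with_abort(
    num_tasks: int,
    completion_order: Sequence[int],
    abort_after: int,
) -> List[Optional[str]]:
    """Toy model of a cooperative file write interrupted by an abort.

    Each task (rank) owns exactly one block of the shared file.
    `completion_order` is the hypothetical order in which ranks would
    finish their writes; the abort takes effect once `abort_after`
    ranks have written. Blocks never written stay None.
    """
    file_blocks: List[Optional[str]] = [None] * num_tasks
    for step, rank in enumerate(completion_order):
        if step == abort_after:
            break  # abort lands here; the remaining ranks never write
        file_blocks[rank] = f"data-from-rank-{rank}"
    return file_blocks


if __name__ == "__main__":
    # Same job, same abort point, different timing of the individual writes.
    print(collective_write_with_abort(4, (0, 1, 2, 3), abort_after=2))
    print(collective_write_with_abort(4, (3, 2, 1, 0), abort_after=2))
```

In this model the surviving file contents depend entirely on timing that the application does not control, which is the sense in which no promise about the file seems feasible. A real MPI-IO implementation may be worse still, since a single block could itself be only partially written.<br>

<br>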

This again brings me back to thinking that perhaps the MPI standard cannot offer any decent semantics for a single task making a decision to terminate with success, and that the people who want this for their applications cannot have it.<br>

<br>

<br>

Dick Treumann  -  MPI Team           <br>

IBM Systems &amp; Technology Group<br>

Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601<br>

Tele (845) 433-7846         Fax (845) 433-8363<br>

</body></html>