[Mpi3-ft] Exit Code from 'mpirun' upon failure recovery

Joshua Hursey jjhursey at open-mpi.org
Fri Feb 4 08:19:00 CST 2011


This is not really a question for the MPI standard language; it is more of a user experience question. In fact, it is part of a much larger question that implementations already have to struggle with.

If a process fails in the application (either by external causes or by calling MPI_Abort), what should 'mpirun' return as its exit status?

If the application intends to handle the failure and continue running after recovery, the user may expect 'mpirun' to return '0' (success) as long as MPI_Finalize is called in all remaining processes. But if no process calls MPI_Finalize (because each one either called MPI_Abort or terminated abnormally), the user may expect it to return a non-zero value, probably one of the values passed to MPI_Abort, if possible. Of course, there is also the case where the failure occurs during MPI_Finalize, in which the MPI implementation may or may not be able to act consistently, depending on the timing of the failure notification.
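
To make the proposed behavior concrete, here is a rough sketch of the decision as a standalone function. It is not Open MPI code: ProcOutcome, launcher_exit_status, and default_failure are hypothetical names made up for illustration. It also assumes the launcher already knows, for each process, whether it completed MPI_Finalize and which errorcode (if any) it passed to MPI_Abort. The failure-during-MPI_Finalize case is deliberately not modeled.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ProcOutcome:
    """Hypothetical per-process record kept by the launcher."""
    finalized: bool = False           # process completed MPI_Finalize
    abort_code: Optional[int] = None  # errorcode passed to MPI_Abort, if any


def launcher_exit_status(outcomes: Iterable[ProcOutcome],
                         default_failure: int = 1) -> int:
    """Sketch of the exit status 'mpirun' could return under this policy."""
    outcomes = list(outcomes)

    # Processes that did not finalize are treated as failed. If at least one
    # process finalized, then all remaining processes reached MPI_Finalize and
    # the run is treated as recovered: report success.
    if any(o.finalized for o in outcomes):
        return 0

    # No process finalized: prefer a non-zero errorcode given to MPI_Abort.
    for o in outcomes:
        if o.abort_code is not None and o.abort_code != 0:
            return o.abort_code

    # Abnormal termination with no usable MPI_Abort code (assumed fallback).
    return default_failure
```

For example, a run in which one process dies and the other finalizes would report 0, while a run in which one process calls MPI_Abort with errorcode 3 and nobody finalizes would report 3. Which abort code to prefer when several processes abort with different values is an open choice; the sketch just takes the first one it sees.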

I was mucking around in this code in the Open MPI prototype, and thought I would get the opinions of the group as I move forward.

Thanks,
Josh

------------------------------------
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey




