[Mpi3-ft] Exit Code from 'mpirun' upon failure recovery
buntinas at mcs.anl.gov
Fri Feb 4 11:36:59 CST 2011
Here's a first stab:
Without faults, it would be reasonable for mpiexec to return the exit code of rank 0. If we extend this to the case with faults and fault-tolerant programs, we could say that mpiexec returns the return code from the lowest-ranked process that exits after calling MPI_Finalize. If there is no such process, it returns the return code specified in the last call to MPI_Abort(). If no process calls MPI_Finalize or MPI_Abort, it returns the return code of rank 0.
This way, if the application chooses to handle faults and continue, the non-failed processes will eventually call MPI_Finalize. These processes can then decide what return code to return. Since we have to choose one, let's choose the one from the lowest-ranked process.
MPI_Abort can specify a communicator other than MPI_COMM_WORLD, and a high-quality implementation would abort only the processes in that communicator, leaving the remaining processes to continue. In this case we don't want to save the return code of the MPI_Abort, since the app is still running. If these processes eventually call MPI_Finalize, that's covered by the first case. But if these processes run into another problem and call MPI_Abort, we should return the return code from that MPI_Abort. So only the last MPI_Abort should count. Of course 'last' may be subject to some nondeterminism, but that should be OK.
The last case is a catch-all where processes just die. So we might as well return the return code of rank 0.
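To make the three cases concrete, here is a rough sketch in C of how a launcher might apply the rule. It is only an illustration under assumptions: the proc_record struct, its fields (including abort_seq, the order in which the launcher happened to observe each MPI_Abort), and select_exit_code() are hypothetical names, not part of MPI or of any existing mpiexec. The fallback value of 1 when no record for rank 0 exists is also an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-process record kept by the launcher (not an MPI type). */
typedef struct {
    int  rank;        /* rank in MPI_COMM_WORLD */
    int  finalized;   /* nonzero if the process exited after MPI_Finalize */
    int  exit_code;   /* exit status reported for the process */
    int  aborted;     /* nonzero if the process called MPI_Abort */
    int  abort_code;  /* errorcode passed to MPI_Abort */
    long abort_seq;   /* order in which the launcher observed the abort */
} proc_record;

/* Pick the launcher's exit code from the per-process records:
 *   1. lowest-ranked process that exited after MPI_Finalize, else
 *   2. errorcode of the last observed MPI_Abort, else
 *   3. exit code of rank 0 (catch-all).
 */
int select_exit_code(const proc_record *procs, size_t n)
{
    const proc_record *lowest_finalized = NULL;
    const proc_record *last_abort = NULL;
    const proc_record *rank0 = NULL;

    assert(procs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        const proc_record *p = &procs[i];

        if (p->rank == 0)
            rank0 = p;
        if (p->finalized &&
            (lowest_finalized == NULL || p->rank < lowest_finalized->rank))
            lowest_finalized = p;
        /* "Last" is whatever order the launcher saw; it may be nondeterministic. */
        if (p->aborted &&
            (last_abort == NULL || p->abort_seq > last_abort->abort_seq))
            last_abort = p;
    }

    if (lowest_finalized != NULL)
        return lowest_finalized->exit_code;
    if (last_abort != NULL)
        return last_abort->abort_code;
    /* Assumption: report a generic failure if rank 0 was never recorded. */
    return rank0 != NULL ? rank0->exit_code : 1;
}
```

For example, if rank 0 died without finalizing, rank 1 called MPI_Abort with errorcode 3, and ranks 2 and 3 exited after MPI_Finalize with codes 0 and 5, this would return 0, the exit code of rank 2.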
What do you think? Of course this would be 'advice to implementors' and not required.
On Feb 4, 2011, at 10:50 AM, Joshua Hursey wrote:
> The standard is pretty cagey about this issue. The only place where I see it referenced is after MPI_Abort() where it says:
> Advice to users. Whether the errorcode is returned from the executable or from the MPI process startup mechanism (e.g., mpiexec), is an aspect of quality of the MPI library but not mandatory. (End of advice to users.)
> Advice to implementors. Where possible, a high-quality implementation will try to return the errorcode from the MPI process startup mechanism (e.g. mpiexec or singleton init). (End of advice to implementors.)
> So it does not say what happens if MPI_Abort is called from multiple processes each with different errorcodes, or if the MPI implementation chooses to continue with the application and it completes normally later.
> Since the standard is reluctant to provide us guidance, I suspect that any decision we make is not appropriate for explicit standardization -- Maybe it can be an additional 'Advice to implementors'. I was mostly trying to see if anyone had strong feelings on the expected behavior in the various fault tolerant scenarios. In particular, I know that some applications check for non-zero return code from 'mpirun' as one indicator that the job did not complete successfully, and determine if they should take recovery steps after job completion.
> -- Josh
> On Feb 4, 2011, at 11:12 AM, Darius Buntinas wrote:
>> What does the standard suggest for the case when different processes return different return codes? Can't we use the same approach?
>> On Feb 4, 2011, at 8:19 AM, Joshua Hursey wrote:
>>> So this is not really appropriate for the MPI standard language, but more of a user experience question. In fact this is a much larger question that implementations have to struggle with already.
>>> If a process fails in the application (either by external causes or by calling MPI_Abort), what should 'mpirun' return as its exit status?
>>> If the application intends to handle the failure and continue running after recovering, then users may expect 'mpirun' to return '0' (success) as long as MPI_Finalize is called in all remaining processes. But if no process calls MPI_Finalize (because they either called MPI_Abort or terminated abnormally), they may expect it to return a non-zero value, probably one of the values set in MPI_Abort, if possible. Of course there is the case where the failure occurs during MPI_Finalize, in which the MPI implementation may or may not be able to act consistently, depending on the timing of the failure notification.
>>> I was mucking around in this code in the Open MPI prototype, and thought I would get the opinions of the group as I move forward.
>>> Joshua Hursey
>>> Postdoctoral Research Associate
>>> Oak Ridge National Laboratory
>>> mpi3-ft mailing list
>>> mpi3-ft at lists.mpi-forum.org