[Mpi3-ft] Exit Code from 'mpirun' upon failure recovery

Darius Buntinas buntinas at mcs.anl.gov
Fri Feb 4 10:12:00 CST 2011


What does the standard suggest for the case when different processes return different return codes?  Can't we use the same approach?

-d

On Feb 4, 2011, at 8:19 AM, Joshua Hursey wrote:

> So this is not really appropriate for the MPI standard language, but more of a user experience question. In fact this is a much larger question that implementations have to struggle with already.
> 
> If a process fails in the application (either by external causes or by calling MPI_Abort), what should 'mpirun' return as its exit status?
> 
> If the application intends to handle the failure and continue running after recovery, then it may expect 'mpirun' to return '0' (success) as long as MPI_Finalize is called in all remaining processes. But if no process calls MPI_Finalize (because each either called MPI_Abort or terminated abnormally), then 'mpirun' should return a non-zero value, probably one of the values passed to MPI_Abort, if possible. Of course, there is also the case where the failure occurs during MPI_Finalize, in which the MPI implementation may or may not be able to act consistently, depending on the timing of the failure notification.
> 
> I was mucking around in this code in the Open MPI prototype, and thought I would get the opinions of the group as I move forward.
> 
> Thanks,
> Josh
> 
> ------------------------------------
> Joshua Hursey
> Postdoctoral Research Associate
> Oak Ridge National Laboratory
> http://users.nccs.gov/~jjhursey
> 
> 
> _______________________________________________
> mpi3-ft mailing list
> mpi3-ft at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
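
For illustration only, the exit-status policy described in the quoted message could be sketched roughly as below. This is not Open MPI code. The names proc_outcome, proc_status, launcher_exit_status and LAUNCHER_GENERIC_FAILURE are hypothetical, and two choices are assumptions rather than anything stated in the thread: when several processes pass different codes to MPI_Abort, the first non-zero code in rank order is used, and any MPI_Abort makes the overall status non-zero even if other processes reached MPI_Finalize.

```c
#include <stddef.h>

/* Hypothetical record of how each process ended, as a launcher might track it. */
typedef enum {
    PROC_FINALIZED,  /* called MPI_Finalize and exited normally */
    PROC_ABORTED,    /* called MPI_Abort with an error code */
    PROC_FAILED      /* terminated abnormally (external cause) */
} proc_outcome;

typedef struct {
    proc_outcome outcome;
    int code;        /* error code given to MPI_Abort; ignored otherwise */
} proc_status;

/* Assumed fallback when no usable MPI_Abort code is available. */
#define LAUNCHER_GENERIC_FAILURE 1

/*
 * Return 0 if at least one process survived and every survivor called
 * MPI_Finalize; otherwise return a non-zero status, preferring the first
 * MPI_Abort code seen in rank order (an arbitrary tie-break).
 */
int launcher_exit_status(const proc_status *procs, size_t n)
{
    size_t finalized = 0;
    int abort_code = 0;

    for (size_t i = 0; i < n; i++) {
        switch (procs[i].outcome) {
        case PROC_FINALIZED:
            finalized++;
            break;
        case PROC_ABORTED:
            /* Keep only the first abort; map a zero code to a failure value. */
            if (abort_code == 0)
                abort_code = procs[i].code != 0 ? procs[i].code
                                                : LAUNCHER_GENERIC_FAILURE;
            break;
        case PROC_FAILED:
            /* Failed processes are not "remaining"; they do not block success. */
            break;
        }
    }

    if (abort_code != 0)
        return abort_code;
    return finalized > 0 ? 0 : LAUNCHER_GENERIC_FAILURE;
}
```

Under these assumptions, a job in which one rank dies and the survivors all finalize yields 0, while a job in which no rank finalizes yields a non-zero value. The case of a failure during MPI_Finalize is not modelled here, since it depends on when the launcher learns of the failure.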




