[mpiwg-ft] MPI_Comm_call_errhandler

George Bosilca bosilca at icl.utk.edu
Sun Nov 23 17:12:38 CST 2014


I am not sure what you base your understanding on regarding a different signature for MPI_ERRORS_RETURN. I can’t find anything suggesting this in the standard.

My understanding is that MPI_COMM_CALL_ERRHANDLER is supposed to return MPI_SUCCESS if the error handler has been successfully called, as indicated in the following snippet from the standard.

> This function invokes the error handler assigned to the communicator with the error code supplied. This function returns MPI_SUCCESS in C and the same value in IERROR if the error handler was successfully called (assuming the process is not aborted and the error handler returns).

Additionally, there is no reason for a user-defined error handler to return an error code; it should only react to the error it is informed about.

We discussed a similar issue in the WG a while back. The errhandler receives the error code by reference, which seems to indicate that the error handler might be allowed to change it. Unfortunately, this is very unsettling: we can’t allow a user-defined error handler to replace a legitimate MPI error with MPI_SUCCESS, which would basically return MPI_SUCCESS to the upper layer when the MPI library might be in an unusable state.


> On Nov 19, 2014, at 19:20 , Jim Dinan <james.dinan at gmail.com> wrote:
> Hi FT Folks,
> I encountered this oddity while doing some hacking today.  Apparently, MPI_ERRORS_RETURN has a different signature from all other error handlers; it returns a value and all other error handlers (e.g. MPI_ERRORS_ARE_FATAL and user-defined error handlers) have void return:
> typedef void MPI_Comm_errhandler_function(MPI_Comm *, int *, ...);
> Because of this, MPI_Comm_call_errhandler effectively does not do anything when the error handler is set to MPI_ERRORS_RETURN.  More specifically, the following code,
> return MPI_Comm_call_errhandler(my_comm, my_err_code);
> always returns success.
> To summarize the issues: (1) It is difficult or impossible to use the MPI error subsystem when the error handler is set to MPI_ERRORS_RETURN and (2) it is impossible for a user defined error handler to return an error code.  These difficulties seem like they could be troublesome to users of FT that want to create higher level resilience/resilient libraries that interact with the MPI errors subsystem.
> Is what I have encountered a real issue or am I misunderstanding something?  If it is a real issue, is it something we queue up for discussion at the upcoming F2F?
> Interested to hear your thoughts,
>  ~Jim.
