[mpiwg-ft] MPI_Comm_call_errhandler
George Bosilca
bosilca at icl.utk.edu
Sun Nov 23 17:12:38 CST 2014
Jim,
I am not sure what you base your understanding on regarding a different signature for MPI_ERRORS_RETURN. I can’t find anything suggesting this in the standard.
My understanding is that MPI_COMM_CALL_ERRHANDLER is supposed to return MPI_SUCCESS if the error handler has been successfully called, as indicated in the following snippet from the standard.
> This function invokes the error handler assigned to the communicator with the error code supplied. This function returns MPI_SUCCESS in C and the same value in IERROR if the error handler was successfully called (assuming the process is not aborted and the error handler returns).
Additionally, there is no reason for a user-defined error handler to return an error code; it should only react to the error it is informed about.
We discussed a similar issue in the WG a while back. The errhandler receives the error code by reference, a fact that seems to indicate that the error handler might be allowed to change the error. Unfortunately, this is very unsettling: we can’t allow a user-defined error handler to replace a legitimate MPI error with MPI_SUCCESS, since that would return MPI_SUCCESS to the upper layer when the MPI library might be in an unusable state.
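To make the two points above concrete, here is a toy model in plain C. It is only a sketch and does not use or describe any real MPI implementation: every name prefixed with model_, both handlers, and the constant values are hypothetical. The handler type copies the shape of MPI_Comm_errhandler_function (void return, error code passed by pointer). model_call_errhandler follows the sentence quoted from the standard, and model_failing_op assumes, purely for illustration, a library routine that propagates whatever value the handler left behind.

```c
#include <assert.h>

/* Toy stand-ins; none of these names or values come from MPI. */
enum { MODEL_SUCCESS = 0, MODEL_ERR_OTHER = 15 };

typedef struct model_comm model_comm;

/* Same shape as MPI_Comm_errhandler_function: void return, code by pointer. */
typedef void model_errhandler_fn(model_comm *, int *, ...);

struct model_comm {
    model_errhandler_fn *handler;
    int last_seen;              /* what the handler observed */
};

/* A handler that only reacts to the error it is informed about. */
static void observing_handler(model_comm *comm, int *code, ...)
{
    comm->last_seen = *code;
}

/* A handler that overwrites the error code through the pointer. */
static void masking_handler(model_comm *comm, int *code, ...)
{
    comm->last_seen = *code;
    *code = MODEL_SUCCESS;
}

/* Models the quoted rule: success means "the handler was called and
 * returned"; the supplied error code is not what gets returned. */
static int model_call_errhandler(model_comm *comm, int errorcode)
{
    assert(comm != 0 && comm->handler != 0);
    comm->handler(comm, &errorcode);
    return MODEL_SUCCESS;
}

/* Hypothetical internal routine that fails, then (unwisely) returns
 * whatever value the handler left in the error code. */
static int model_failing_op(model_comm *comm)
{
    int err = MODEL_ERR_OTHER;
    comm->handler(comm, &err);
    return err;
}
```

With observing_handler installed, model_call_errhandler returns MODEL_SUCCESS while model_failing_op still returns MODEL_ERR_OTHER. With masking_handler installed, model_failing_op returns MODEL_SUCCESS even though the operation failed, which is the situation I find unsettling.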
George.
> On Nov 19, 2014, at 19:20 , Jim Dinan <james.dinan at gmail.com> wrote:
>
> Hi FT Folks,
>
> I encountered this oddity while doing some hacking today. Apparently, MPI_ERRORS_RETURN has a different signature from all other error handlers; it returns a value and all other error handlers (e.g. MPI_ERRORS_ARE_FATAL and user-defined error handlers) have void return:
>
> typedef void MPI_Comm_errhandler_function(MPI_Comm *, int *, ...);
>
> Because of this, MPI_Comm_call_errhandler effectively does not do anything when the error handler is set to MPI_ERRORS_RETURN. More specifically, the following code,
>
> return MPI_Comm_call_errhandler(my_comm, my_err_code);
>
> always returns success.
>
> To summarize the issues: (1) It is difficult or impossible to use the MPI error subsystem when the error handler is set to MPI_ERRORS_RETURN and (2) it is impossible for a user defined error handler to return an error code. These difficulties seem like they could be troublesome to users of FT that want to create higher level resilience/resilient libraries that interact with the MPI errors subsystem.
>
> Is what I have encountered a real issue or am I misunderstanding something? If it is a real issue, is it something we queue up for discussion at the upcoming F2F?
>
> Interested to hear your thoughts,
> ~Jim.