[mpiwg-ft] MPI_Comm_call_errhandler

Jim Dinan james.dinan at gmail.com
Tue Nov 25 09:43:37 CST 2014


Hi George,

The issue I wanted to ask about is that it is impossible to write an FT
helper library that causes an error to be returned by an MPI operation.
For example, an error in MPI_Send cannot trigger a library-defined handler
that does some handling and also returns an error code to the application
(so the application can do additional handling).  Is this a potential
problem from the perspective of FT, and if so should we look at this
issue?  Anything we might do with error handlers seems like it would be a
new proposal.

The way I described the issue was admittedly weird.  I think this
discrepancy emerges clearly when you look at the way error handlers are
defined.  The MPI_ERRORS_RETURN error handler is the only way to get
anything but success back from an MPI call.  That is, the error handler
cannot pass some error handling responsibility back to the application.
From that perspective, MPI_ERRORS_RETURN can be viewed as either a special
case or inconsistency in the error handlers interface.  In order to write
code that invokes the error handler, one has to write something like:

MPI_Errhandler err_hdl;
MPI_Comm_get_errhandler(comm, &err_hdl);

if (err_hdl == MPI_ERRORS_RETURN)
  return MY_ERRCODE;
else
  MPI_Comm_call_errhandler(comm, MY_ERRCODE);
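
To make the asymmetry concrete without needing an MPI installation, here is
a toy model in plain C. It is only a sketch of the behavior as described in
this thread, not the MPI API: toy_comm, toy_handler_fn, toy_call_errhandler,
toy_raise, counting_handler and the value of MY_ERRCODE are all hypothetical
stand-ins, and a NULL handler is assumed to play the role of
MPI_ERRORS_RETURN.

```c
#include <stddef.h>

#define TOY_SUCCESS 0
#define MY_ERRCODE  42   /* placeholder value, chosen only for illustration */

/* Handlers return void, mirroring MPI_Comm_errhandler_function. */
typedef void toy_handler_fn(int *errcode);

/* Hypothetical stand-in for a communicator: it only carries a handler.
 * A NULL handler models MPI_ERRORS_RETURN. */
typedef struct {
    toy_handler_fn *handler;
} toy_comm;

int handler_calls = 0;

/* A user-defined handler: it can react to the error, but it has no
 * return value through which to hand a code back to the caller. */
void counting_handler(int *errcode)
{
    (void)errcode;
    handler_calls++;
}

/* Models MPI_Comm_call_errhandler as described in the thread: the
 * result says whether the handler was invoked, not what the error was. */
int toy_call_errhandler(toy_comm *comm, int errcode)
{
    if (comm->handler != NULL)
        comm->handler(&errcode);
    return TOY_SUCCESS;
}

/* The workaround pattern: special-case the "errors return" handler so
 * that the error code actually reaches the caller. */
int toy_raise(toy_comm *comm, int errcode)
{
    if (comm->handler == NULL)
        return errcode;
    return toy_call_errhandler(comm, errcode);
}
```

In this model, toy_call_errhandler on a communicator with a NULL handler
returns TOY_SUCCESS, so the caller never sees MY_ERRCODE; toy_raise returns
MY_ERRCODE in that case. With counting_handler installed, the handler runs
but toy_raise still returns TOY_SUCCESS, which is the second half of the
problem.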

Anyway, the first paragraph is the actual question.  The rest is a
digression about difficulties in using the error handler interface that we
could discuss in more depth if the answer to my question is yes.

 ~Jim.


On Sun, Nov 23, 2014 at 6:12 PM, George Bosilca <bosilca at icl.utk.edu> wrote:

> Jim,
>
> I am not sure what you base your understanding on regarding the different
> signature for MPI_ERRORS_RETURN. I can’t find anything suggesting this in
> the standard.
>
> My understanding is that MPI_COMM_CALL_ERRHANDLER is supposed to return
> MPI_SUCCESS if the error handler has been successfully called, as indicated
> in the following snippet from the standard.
>
> > This function invokes the error handler assigned to the communicator
> > with the error code supplied. This function returns MPI_SUCCESS in C and
> > the same value in IERROR if the error handler was successfully called
> > (assuming the process is not aborted and the error handler returns).
>
> Additionally, there is no reason for a user-defined error handler to return
> an error code; it should only react to the error it is informed about.
>
> We discussed a similar issue in the WG a while back. The errhandler
> receives the error by reference, which seems to indicate that the
> error handler might be allowed to change the error. Unfortunately, this is
> very unsettling, as we can’t allow a user-defined error handler to replace a
> legitimate MPI error with MPI_SUCCESS, which would return MPI_SUCCESS to
> the upper layer when the MPI library might be in an unusable state.
>
>   George.
>
> > On Nov 19, 2014, at 19:20 , Jim Dinan <james.dinan at gmail.com> wrote:
> >
> > Hi FT Folks,
> >
> > I encountered this oddity while doing some hacking today.  Apparently,
> > MPI_ERRORS_RETURN has a different signature from all other error handlers;
> > it returns a value and all other error handlers (e.g. MPI_ERRORS_ARE_FATAL
> > and user-defined error handlers) have void return:
> >
> > typedef void MPI_Comm_errhandler_function(MPI_Comm *, int *, ...);
> >
> > Because of this, MPI_Comm_call_errhandler effectively does not do
> > anything when the error handler is set to MPI_ERRORS_RETURN.  More
> > specifically, the following code,
> >
> > return MPI_Comm_call_errhandler(my_comm, my_err_code);
> >
> > always returns success.
> >
> > To summarize the issues: (1) It is difficult or impossible to use the
> > MPI error subsystem when the error handler is set to MPI_ERRORS_RETURN and
> > (2) it is impossible for a user-defined error handler to return an error
> > code.  These difficulties seem like they could be troublesome to users of
> > FT that want to create higher-level resilience libraries that
> > interact with the MPI error subsystem.
> >
> > Is what I have encountered a real issue or am I misunderstanding
> > something?  If it is a real issue, is it something we queue up for
> > discussion at the upcoming F2F?
> >
> > Interested to hear your thoughts,
> >  ~Jim.
> > _______________________________________________
> > mpiwg-ft mailing list
> > mpiwg-ft at lists.mpi-forum.org
> > http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-ft
>