<div dir="ltr">Hi George,<div><br></div><div>The issue I wanted to ask about is that it is impossible to write an FT helper library that causes an error to be returned by an MPI operation.  For example, an error in MPI_Send cannot trigger a library-defined handler that does some handling and also returns an error code to the application (so the application can do additional handling).  Is this a potential problem from the perspective of FT, and if so should we look at this issue?  Anything we might do with error handlers seems like it would be a new proposal.</div><div><br></div><div>The way I described the issue was admittedly weird.  I think this discrepancy emerges clearly when you look at the way error handlers are defined.  The MPI_ERRORS_RETURN error handler is the only way to get anything but success back from an MPI call.  That is, the error handler cannot pass some error handling responsibility back to the application.  From that perspective, MPI_ERRORS_RETURN can be viewed as either a special case or inconsistency in the error handlers interface.  In order to write code that invokes the error handler, one has to write something like:</div><div><br></div><div><div>MPI_Errhandler err_hdl;</div><div>MPI_Comm_get_errhandler(comm, &err_hdl);</div><div><br></div><div>if (err_hdl == MPI_ERRORS_RETURN)</div><div>  return MY_ERRCODE;</div><div>else</div><div>  MPI_Comm_call_errhandler(comm, MY_ERRCODE);</div></div><div><br></div><div>Anyway, the first paragraph is the actual question.  
The rest is a digression about difficulties in using the error handler interface, which we could discuss in more depth if the answer to my question is yes.</div><div><br></div><div> ~Jim.</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Nov 23, 2014 at 6:12 PM, George Bosilca <span dir="ltr"><<a href="mailto:bosilca@icl.utk.edu" target="_blank">bosilca@icl.utk.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Jim,<br>

<br>

I am not sure what you base your understanding of a different signature for MPI_ERRORS_RETURN on. I can’t find anything suggesting this in the standard.<br>

<br>

My understanding is that MPI_COMM_CALL_ERRHANDLER is supposed to return MPI_SUCCESS if the error handler has been successfully called, as indicated in the following snippet from the standard.<br>

<br>

> This function invokes the error handler assigned to the communicator with the error code supplied. This function returns MPI_SUCCESS in C and the same value in IERROR if the error handler was successfully called (assuming the process is not aborted and the error handler returns).<br>

<br>

Additionally, there is no reason for a user-defined error handler to return an error code; it should only react to the error it is informed about.<br>

<br>

We discussed a similar issue in the WG a while back. The errhandler receives the error code by reference, which seems to indicate that the error handler might be allowed to change it. Unfortunately, this is very unsettling: we can’t allow a user-defined error handler to replace a legitimate MPI error with MPI_SUCCESS, since that would return MPI_SUCCESS to the upper layer while the MPI library might be in an unusable state.<br>
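<br>
To make the two points above concrete, here is a small plain-C model of the dispatch semantics under discussion. It is not MPI code: every name prefixed with model_ or MODEL_ is a hypothetical stand-in, and the behaviour of model_failing_op (returning whatever the handler left behind the pointer) is an assumption made only to illustrate the concern, not something the standard confirms.<br>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for MPI constants and types; illustration only. */
#define MODEL_SUCCESS   0
#define MODEL_ERR_OTHER 15

typedef int model_comm;

/* Same shape as a communicator error handler: void return, code by pointer. */
typedef void model_errhandler_fn(model_comm *comm, int *errcode);

static int last_seen_error = MODEL_SUCCESS;

/* Reacts to the error by recording it; it cannot return a value. */
static void recording_handler(model_comm *comm, int *errcode) {
    (void)comm;
    last_seen_error = *errcode;
}

/* Overwrites the error code through the pointer it was given. */
static void masking_handler(model_comm *comm, int *errcode) {
    (void)comm;
    *errcode = MODEL_SUCCESS;
}

/* Models the "call errhandler" semantics quoted above: the return value
   says the handler was invoked, not what the error was. */
static int model_call_errhandler(model_comm comm, int errcode,
                                 model_errhandler_fn *fn) {
    assert(fn != NULL);
    fn(&comm, &errcode);
    return MODEL_SUCCESS;
}

/* Models a failing operation under the ASSUMPTION that the library
   returns whatever the handler left in *errcode. */
static int model_failing_op(model_comm comm, model_errhandler_fn *fn) {
    int errcode = MODEL_ERR_OTHER;
    assert(fn != NULL);
    fn(&comm, &errcode);
    return errcode;
}
```

In this model, model_call_errhandler returns MODEL_SUCCESS regardless of the code passed in, while model_failing_op with masking_handler hands MODEL_SUCCESS back to the caller even though the operation failed, which is the situation we want to avoid.<br>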

<br>

  George.<br>

<div><div class="h5"><br>

> On Nov 19, 2014, at 19:20 , Jim Dinan <<a href="mailto:james.dinan@gmail.com">james.dinan@gmail.com</a>> wrote:<br>

><br>

> Hi FT Folks,<br>

><br>

> I encountered this oddity while doing some hacking today.  Apparently, MPI_ERRORS_RETURN has a different signature from all other error handlers; it returns a value and all other error handlers (e.g. MPI_ERRORS_ARE_FATAL and user-defined error handlers) have void return:<br>

><br>

> typedef void MPI_Comm_errhandler_function(MPI_Comm *, int *, ...);<br>

><br>

> Because of this, MPI_Comm_call_errhandler effectively does not do anything when the error handler is set to MPI_ERRORS_RETURN.  More specifically, the following code,<br>

><br>

> return MPI_Comm_call_errhandler(my_comm, my_err_code);<br>

><br>

> always returns success.<br>

><br>

> To summarize the issues: (1) it is difficult or impossible to use the MPI error subsystem when the error handler is set to MPI_ERRORS_RETURN, and (2) it is impossible for a user-defined error handler to return an error code.  These difficulties seem like they could be troublesome to users of FT who want to create higher-level resilience libraries that interact with the MPI error subsystem.<br>

><br>

> Is what I have encountered a real issue or am I misunderstanding something?  If it is a real issue, should we queue it up for discussion at the upcoming F2F?<br>

><br>

> Interested to hear your thoughts,<br>

>  ~Jim.<br>

</div></div>> _______________________________________________<br>

> mpiwg-ft mailing list<br>

> <a href="mailto:mpiwg-ft@lists.mpi-forum.org">mpiwg-ft@lists.mpi-forum.org</a><br>

> <a href="http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-ft" target="_blank">http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-ft</a><br>

<br>

</blockquote></div><br></div>