[Mpi3-ft] Ticket #324: Clarify MPI_ERRORS_ARE_FATAL scope of abort

George Bosilca bosilca at icl.utk.edu
Mon May 13 22:01:08 CDT 2013


Dave's points are entirely valid and point to a subtle corner case in the standard. Orphaned non-completed requests, i.e., requests that were freed (MPI_Request_free) and whose associated communicator was freed as well (MPI_Comm_free), are defined as raising errors on MPI_COMM_WORLD. Thus, the scope of such a request becomes global, and a fault on one of these requests will bring down the entire MPI_COMM_WORLD (which runs against the original intent of the ticket, at least its first part).

  George.


On May 13, 2013, at 12:05, Wesley Bland <wbland at mcs.anl.gov> wrote:

> After looking at this ticket some more, Aurelien and I were confused about the objections the forum at large raised against it. Some of the objections Dave reported on the ticket may have stemmed from a misunderstanding in the forum of what the ticket meant. The plan at this point is to discuss the ticket during our plenary in San Jose to pin down the objection, so that we can either bring a new version of the ticket if necessary or restart the process if the text is good.
> 
> On May 7, 2013, at 4:53 PM, Wesley Bland <wbland at mcs.anl.gov> wrote:
> 
>> author: jjhursey
>> 
>> https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/324
>> 
>> This ticket essentially links MPI_ERRORS_ARE_FATAL on a communicator to calling MPI_ABORT on that communicator, i.e., only the processes in that communicator are aborted, while other communicators may remain functional.
>> 
>> There was much discussion on the ticket about the scope of this change, and in the end the ticket has remained stagnant for about a year because of it. However, I don't think the changes here should be very controversial. According to the ticket, the main argument against it at the Japan meeting was that some types of functions have no request that can be used for error checking. Therefore, when an error occurs in one of them, the entire application would be forced to fall back to MPI_ERRORS_ARE_FATAL despite having set another error handler, which makes FT difficult. Some alternate text was provided on the ticket.
