[Mpi3-ft] Latest version of chapter

Darius Buntinas buntinas at mcs.anl.gov
Fri Oct 21 15:49:59 CDT 2011


On Oct 21, 2011, at 3:37 PM, Sur, Sayantan wrote:

> Hi Darius,
> 
>> -----Original Message-----
>> From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-
>> bounces at lists.mpi-forum.org] On Behalf Of Darius Buntinas
>> Sent: Friday, October 21, 2011 12:35 PM
>> To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
>> Subject: Re: [Mpi3-ft] Latest version of chapter
>> 
>> With (regular) error handlers, they'll be called from within the
>> function that raises the error.  With failure notification, because
>> they're being called as a result of an external event (process
>> failure), you could be called from within any function, even one not
>> related to the comm/file/win that you registered the process failure
>> notification handler on.
>> 
>> Does that make sense?
>> 
> 
> OK. I thought that this would violate the spirit (if not the letter) of the rationale in 17.5 just before this.
> 
> "These semantics allow a process to continue running without being interrupted by the failure of processes with which they may never or rarely communicate."

The process will only set the handler if it wants to be notified of failures.  If it doesn't set the handler, then it won't be interrupted (of course the MPI library might be interrupted and do some processing for every failure notification, but that's transparent to the user).
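
To make the opt-in behavior concrete, here is a toy model in Python. It is only a sketch and not MPI: ToyLibrary, ToyComm, detect_failure, send and the handler signature are all hypothetical names invented for illustration, and it assumes for simplicity that every failure is relevant to every object with a handler set. It shows a handler set on one communicator being invoked from inside a call on an unrelated communicator, and nothing being invoked where no handler was set.

```python
# Toy model only: none of these names are MPI functions or proposed API.

class ToyComm:
    def __init__(self, name):
        self.name = name
        self.failure_handler = None  # stays None unless the user opts in


class ToyLibrary:
    def __init__(self):
        self.comms = []
        self.pending_failures = []  # failures detected but not yet reported
        self.log = []               # record of completed sends

    def create_comm(self, name):
        comm = ToyComm(name)
        self.comms.append(comm)
        return comm

    def detect_failure(self, rank):
        # External event: arrives independently of any user call.
        self.pending_failures.append(rank)

    def _progress(self, current_call):
        # Runs inside every library call, whatever object that call targets.
        while self.pending_failures:
            rank = self.pending_failures.pop(0)
            for comm in self.comms:
                if comm.failure_handler is not None:
                    comm.failure_handler(comm, rank, current_call)

    def send(self, comm, payload):
        self._progress(f"send on {comm.name}")
        self.log.append((comm.name, payload))


lib = ToyLibrary()
watched = lib.create_comm("watched")
other = lib.create_comm("other")

calls = []
# Only "watched" opts in to failure notification.
watched.failure_handler = lambda comm, rank, where: calls.append(
    (comm.name, rank, where)
)

lib.detect_failure(3)     # a process fails at some arbitrary time
lib.send(other, "hello")  # unrelated call; the handler runs from inside it
print(calls)              # [('watched', 3, 'send on other')]
```

In this model a process that never sets a handler sees no callback at all; the library still drains its pending list internally, which corresponds to the "transparent to the user" processing mentioned above.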

> Also, aren't normal error handlers only called from within related calls? If FT handlers can be called from anywhere, then the chapter should (at least) clearly describe how they deviate from normal error handlers (in addition to addressing Alexander's concerns).

Sure, we'll probably also need to specify which operations may be performed from inside the handler (e.g., can you call MPI_Send? Can you free the comm/win/file in question?).

-d

> Thanks.
> 
>> -d
>> 
>> On Oct 21, 2011, at 1:51 PM, Sur, Sayantan wrote:
>> 
>>> 17.5.1:11-12 - "The error handler function will be called by the MPI
>> implementation from within the context of some MPI function that was
>> called by the user."
>>> 
>>> Maybe we should say that error handlers are called from MPI functions
>>> that are associated with that comm/file/win?
>>> 
>>> 
>>> 
>>>> -----Original Message-----
>>>> From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-
>>>> bounces at lists.mpi-forum.org] On Behalf Of Josh Hursey
>>>> Sent: Friday, October 21, 2011 10:28 AM
>>>> To: MPI 3.0 Fault Tolerance and Dynamic Process Control working
>> Group
>>>> Subject: Re: [Mpi3-ft] Latest version of chapter
>>>> 
>>>> I just wanted to note that we want to distribute a copy of this
>>>> chapter to the MPI Forum before the meeting. As such we are planning
>>>> on sending out a copy at COB today (so Friday ~5:00 pm EDT) so that
>>>> people have an opportunity to look at the document before the Monday
>>>> plenary. So please send any edits or comments before COB today, so we
>>>> can work them into the draft.
>>>> 
>>>> We will post the draft to the ticket, so that people know where to
>>>> look for the current draft.
>>>> 
>>>> Thanks,
>>>> Josh
>>>> 
>>>> 
>>>> 
>>>> On Thu, Oct 20, 2011 at 7:06 PM, Darius Buntinas
>> <buntinas at mcs.anl.gov>
>>>> wrote:
>>>>> 
>>>>> The latest version of the FT chapter is on the wiki (it's in a new
>>>>> location on the main FT page under "ticket #276").  Please have a
>>>>> look and comment.
>>>>> 
>>>>> Here's a direct link to the PDF:
>>>>>   https://svn.mpi-forum.org/trac/mpi-forum-web/raw-attachment/wiki/FaultToleranceWikiPage/ft.pdf
>>>>> 
>>>>> Here's a summary of the changes Josh and I made:
>>>>> 
>>>>> * Minor wording touchups
>>>>> * Added new semantic for MPI_ANY_SOURCE with the
>>>> MPI_ERR_ANY_SOURCE_DISABLED error code
>>>>> * Converted wording for all comm, win, fh creation operations to not
>>>>>   require collectively active communicators (eliminate requirement for
>>>>>   synchronization)
>>>>> * Added missing reader_lock to ANY_SOURCE example
>>>>> * Added case for MPI_WIN_TEST
>>>>> 
>>>>> and
>>>>> 
>>>>> One-sided section
>>>>>   clarified that window creation need not be blocking
>>>>>   clarified that RMA ops might not complete correctly even if
>>>>>     synchronization ops complete without error due to process
>>>>>     failures
>>>>> Process failure notification
>>>>>   Added section describing new functions to add callbacks to comms,
>>>>>     wins and files that are called when proc failure is detected
>>>>> Other wordsmithing/cleanup changes
>>>>> 
>>>>> -d
>>>>> _______________________________________________
>>>>> mpi3-ft mailing list
>>>>> mpi3-ft at lists.mpi-forum.org
>>>>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Joshua Hursey
>>>> Postdoctoral Research Associate
>>>> Oak Ridge National Laboratory
>>>> http://users.nccs.gov/~jjhursey
>>>> 
