[Mpi3-ft] Latest version of chapter
Darius Buntinas
buntinas at mcs.anl.gov
Fri Oct 21 14:57:17 CDT 2011
Let's say you have commA used by library A and commB used by library B.
Library A has registered the proc failure handler called handlerA on commA.
Now, let's say a process that's in commA but not commB failed, and the thread is executing in library B and calls, e.g., MPI_Send(..., commB).
The MPI implementation performs the MPI_Send operation normally, then calls handlerA(commA, MPI_ERR_PROC_FAIL_STOP), and returns from MPI_Send normally.
While in handlerA, the subject communicator (commA) is passed as a parameter, so it won't be out of scope.
Is it a problem that library A's handler is called from "within" library B?
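
As a rough sketch of that sequence, here is how it could be modelled in plain C. This is not real MPI and not the proposed API: the sim_* names, the registry and the failure_pending flag are all made up for illustration, and SIM_ERR_PROC_FAIL_STOP merely stands in for MPI_ERR_PROC_FAIL_STOP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; none of these are MPI names. */
#define SIM_SUCCESS 0
#define SIM_ERR_PROC_FAIL_STOP 1
#define SIM_MAX_COMMS 8

struct sim_comm;
typedef void (*sim_failure_handler)(struct sim_comm *comm, int error_code);

struct sim_comm {
    const char *name;
    sim_failure_handler handler; /* NULL if no failure handler is registered */
    int failure_pending;         /* set when a member process has failed */
    int notifications;           /* number of times the handler has run */
};

/* Communicators known to the simulated library. */
static struct sim_comm *sim_registry[SIM_MAX_COMMS];
static size_t sim_registry_len = 0;

/* Track a communicator so pending failures can be delivered later. */
int sim_register_comm(struct sim_comm *comm)
{
    if (sim_registry_len >= SIM_MAX_COMMS)
        return -1;
    sim_registry[sim_registry_len++] = comm;
    return 0;
}

/* Run the handler of every tracked communicator with a pending failure. */
void sim_deliver_pending(void)
{
    for (size_t i = 0; i < sim_registry_len; i++) {
        struct sim_comm *c = sim_registry[i];
        if (c->failure_pending && c->handler != NULL) {
            c->failure_pending = 0;
            c->handler(c, SIM_ERR_PROC_FAIL_STOP);
        }
    }
}

/* Simulated send: the operation itself succeeds, then any pending
 * failure notifications are delivered, even for other communicators. */
int sim_send(struct sim_comm *comm)
{
    printf("send on %s completes normally\n", comm->name);
    sim_deliver_pending();
    return SIM_SUCCESS;
}

/* Library A's handler; it receives commA as its parameter. */
void handlerA(struct sim_comm *comm, int error_code)
{
    comm->notifications++;
    printf("handlerA called for %s with code %d\n", comm->name, error_code);
}

/* The scenario above: a process in commA (not commB) fails, then
 * library B sends on commB. Returns the send's return code and stores
 * how many times handlerA ran in *handler_calls. */
int run_scenario(int *handler_calls)
{
    struct sim_comm commA = { "commA", handlerA, 0, 0 };
    struct sim_comm commB = { "commB", NULL, 0, 0 };
    int ok;

    sim_registry_len = 0;
    ok = sim_register_comm(&commA);
    assert(ok == 0);
    ok = sim_register_comm(&commB);
    assert(ok == 0);
    (void)ok;

    commA.failure_pending = 1;  /* failure detected asynchronously */
    int rc = sim_send(&commB);  /* library B's call */

    *handler_calls = commA.notifications;
    sim_registry_len = 0;       /* registry pointed at locals; clear it */
    return rc;
}
```

Calling run_scenario() from a main() shows the send on commB returning success while handlerA has run once, with commA handed to it as the argument.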
-d
On Oct 21, 2011, at 2:42 PM, Supalov, Alexander wrote:
> Not really. How do you want the user to make sense of that? E.g., I call A on commA, fail on commA asynchronously while calling a totally unrelated B on commB that has no failures in it, and am kicked out of B into someone else's error handler saying some "A" on "commA" failed? And what now? I may even have A and commA out of scope by then, possibly forever.
>
> -----Original Message-----
> From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Darius Buntinas
> Sent: Friday, October 21, 2011 9:35 PM
> To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
> Subject: Re: [Mpi3-ft] Latest version of chapter
>
> With (regular) error handlers, they'll be called from within the function that raises the error. With failure notification, because they're being called as a result of an external event (process failure), you could be called from within any function, even one not related to the comm/file/win that you registered the process failure notification handler on.
>
> Does that make sense?
>
> -d
>
> On Oct 21, 2011, at 1:51 PM, Sur, Sayantan wrote:
>
>> 17.5.1:11-12 - "The error handler function will be called by the MPI implementation from within the context of some MPI function that was called by the user."
>>
>> Maybe we should say that error handlers are called from MPI functions that are associated with that comm/file/win?
>>
>>
>>
>>> -----Original Message-----
>>> From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Josh Hursey
>>> Sent: Friday, October 21, 2011 10:28 AM
>>> To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
>>> Subject: Re: [Mpi3-ft] Latest version of chapter
>>>
>>> I just wanted to note that we want to distribute a copy of this
>>> chapter to the MPI Forum before the meeting. As such we are planning
>>> on sending out a copy at COB today (so Friday ~5:00 pm EDT) so that
>>> people have an opportunity to look at the document before the Monday
>>> plenary. So please send any edits or comments before COB today, so we
>>> can work them into the draft.
>>>
>>> We will post the draft to the ticket, so that people know where to
>>> look for the current draft.
>>>
>>> Thanks,
>>> Josh
>>>
>>>
>>>
>>> On Thu, Oct 20, 2011 at 7:06 PM, Darius Buntinas <buntinas at mcs.anl.gov>
>>> wrote:
>>>>
>>>> The latest version of the FT chapter is on the wiki (it's in a new location on the main FT page under "ticket #276"). Please have a look and comment.
>>>>
>>>> Here's a direct link to the PDF:
>>>> https://svn.mpi-forum.org/trac/mpi-forum-web/raw-attachment/wiki/FaultToleranceWikiPage/ft.pdf
>>>>
>>>> Here's a summary of the changes Josh and I made:
>>>>
>>>> * Minor wording touchups
>>>> * Added new semantics for MPI_ANY_SOURCE with the MPI_ERR_ANY_SOURCE_DISABLED error code
>>>> * Converted wording for all comm, win, fh creation operations to not require collectively active communicators (eliminate requirement for synchronization)
>>>> * Added missing reader_lock to ANY_SOURCE example
>>>> * Added case for MPI_WIN_TEST
>>>>
>>>> and
>>>>
>>>> One-sided section
>>>>   * Clarified that window creation need not be blocking
>>>>   * Clarified that RMA ops might not complete correctly even if synchronization ops complete without error due to process failures
>>>> Process failure notification
>>>>   * Added section describing new functions to add callbacks to comms, wins and files that are called when proc failure is detected
>>>> Other wordsmithing/cleanup changes
>>>>
>>>> -d
>>>> _______________________________________________
>>>> mpi3-ft mailing list
>>>> mpi3-ft at lists.mpi-forum.org
>>>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Joshua Hursey
>>> Postdoctoral Research Associate
>>> Oak Ridge National Laboratory
>>> http://users.nccs.gov/~jjhursey
>>>
>
>
>
>