[Mpi3-ft] Latest version of chapter
Darius Buntinas
buntinas at mcs.anl.gov
Fri Oct 21 15:44:18 CDT 2011
On Oct 21, 2011, at 3:19 PM, Supalov, Alexander wrote:
> Thanks. See below (prefix "AS>").
>
> -----Original Message-----
> From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Darius Buntinas
> Sent: Friday, October 21, 2011 9:57 PM
> To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
> Subject: Re: [Mpi3-ft] Latest version of chapter
>
>
> Let's say you have commA used by library A and commB used by library B.
> Library A has registered the proc failure handler called handlerA on commA.
>
> Now, let's say a process that's in commA but not commB failed, and the thread is executing in library B and calls, e.g., MPI_Send(..., commB).
>
> The MPI implementation performs the MPI_Send operation normally, then calls handlerA(commA, MPI_ERR_PROC_FAIL_STOP), and returns from MPI_Send normally.
>
> When handlerA runs, the subject communicator (commA) is passed to it as a parameter, so it won't be out of scope.
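>
> A minimal sketch of that flow (MPIX_Comm_set_failhandler is just a placeholder name for the proposed registration call, not necessarily what the chapter specifies, and MPI_ERR_PROC_FAIL_STOP is the proposed error class, so this won't build against a stock MPI):
>
>     #include <mpi.h>
>     #include <stdio.h>
>
>     /* Placeholder prototype for the proposed registration function. */
>     typedef void MPIX_Failhandler_fn(MPI_Comm comm, int error_code);
>     int MPIX_Comm_set_failhandler(MPI_Comm comm, MPIX_Failhandler_fn *fn);
>
>     /* Library A's process-failure handler; the implementation passes in
>      * the communicator it was registered on (commA), not the one the
>      * current MPI call is using. */
>     static void handlerA(MPI_Comm commA, int error_code)
>     {
>         fprintf(stderr, "library A: failure reported on commA (code %d)\n",
>                 error_code);
>     }
>
>     void libA_init(MPI_Comm commA)
>     {
>         MPIX_Comm_set_failhandler(commA, handlerA);
>     }
>
>     void libB_send(void *buf, int count, int dest, MPI_Comm commB)
>     {
>         /* If a process in commA (but not commB) has failed, the
>          * implementation completes this send normally, calls
>          * handlerA(commA, MPI_ERR_PROC_FAIL_STOP) from within it,
>          * and then returns MPI_SUCCESS. */
>         MPI_Send(buf, count, MPI_CHAR, dest, 0, commB);
>     }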
>
> Is it a problem that library A's handler is called from "within" library B?
>
> AS> Sure. This handler may have been written by someone else who does not know me, my library B, or anything else. I may not even want it to be called from within my library B for security reasons. What if it unwinds the stack, connects to A's HQ, and dumps my confidential memory all over there?
Yikes! Don't link with libraries you don't trust :-)
I don't know how to handle this case, but does the current standard prevent a library from snooping memory from other libraries? A library could set an attribute with a copy callback function on comm_world. That would be called from within another library's stack if that library tries to dup comm_world.
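For example, a minimal sketch with the standard attribute API (libA_setup and libB_dup are made-up names):

    #include <mpi.h>
    #include <stdio.h>

    /* Library A's copy callback: MPI runs this whenever any code dups a
     * communicator carrying the attribute -- including from library B's
     * stack. */
    static int copyA(MPI_Comm oldcomm, int keyval, void *extra_state,
                     void *attr_in, void *attr_out, int *flag)
    {
        fprintf(stderr, "library A's callback, running on the caller's stack\n");
        *(void **)attr_out = attr_in;  /* propagate the attribute value */
        *flag = 1;                     /* copy the attribute to the new comm */
        return MPI_SUCCESS;
    }

    void libA_setup(void)
    {
        int keyval;
        MPI_Comm_create_keyval(copyA, MPI_COMM_NULL_DELETE_FN, &keyval, NULL);
        MPI_Comm_set_attr(MPI_COMM_WORLD, keyval, NULL);
    }

    void libB_dup(MPI_Comm *newcomm)
    {
        /* Library B just dups comm_world -- and A's copyA runs right here. */
        MPI_Comm_dup(MPI_COMM_WORLD, newcomm);
    }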
> Moreover, by the time it's called, both A and commA may be long gone, together with the context in which handlerA was supposed to be executed. What will it try to handle then, and under what assumptions? I don't know. You?
The handlers are freed when the comm/win/file they're attached to is freed, so you'll never get a handler called with a comm/win/file that's invalid.
-d
> -d
>
>
> On Oct 21, 2011, at 2:42 PM, Supalov, Alexander wrote:
>
>> Not really. How do you want the user to make sense of that? E.g., I call A on commA, fail on commA asynchronously while calling a totally unrelated B on commB that has no failures in it, and am kicked out of B into someone else's error handler saying some "A" on "commA" failed? And what now? I may even have A and commA out of scope by then, possibly forever.
>>
>> -----Original Message-----
>> From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Darius Buntinas
>> Sent: Friday, October 21, 2011 9:35 PM
>> To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
>> Subject: Re: [Mpi3-ft] Latest version of chapter
>>
>> Regular error handlers are called from within the function that raises the error. With failure notification, because the handler is invoked as a result of an external event (a process failure), it can be called from within any function, even one not related to the comm/file/win on which you registered the process-failure notification handler.
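>>
>> A rough sketch of the regular case, using the standard MPI-2 error-handler API; the handler only ever fires from inside a failing call on the communicator it's attached to:
>>
>>     #include <mpi.h>
>>     #include <stdio.h>
>>
>>     /* Called by MPI from within the MPI call on 'comm' that raised
>>      * the error -- never from a call on an unrelated communicator. */
>>     static void errhandler(MPI_Comm *comm, int *errcode, ...)
>>     {
>>         fprintf(stderr, "error %d raised on this communicator\n", *errcode);
>>     }
>>
>>     void install(MPI_Comm comm)
>>     {
>>         MPI_Errhandler eh;
>>         MPI_Comm_create_errhandler(errhandler, &eh);
>>         MPI_Comm_set_errhandler(comm, eh);
>>         MPI_Errhandler_free(&eh);  /* comm keeps its own reference */
>>     }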
>>
>> Does that make sense?
>>
>> -d
>>
>> On Oct 21, 2011, at 1:51 PM, Sur, Sayantan wrote:
>>
>>> 17.5.1:11-12 - "The error handler function will be called by the MPI implementation from within the context of some MPI function that was called by the user."
>>>
>>> Maybe we should state that error handlers are called from MPI functions that are associated with that comm/file/win?
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-
>>>> bounces at lists.mpi-forum.org] On Behalf Of Josh Hursey
>>>> Sent: Friday, October 21, 2011 10:28 AM
>>>> To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
>>>> Subject: Re: [Mpi3-ft] Latest version of chapter
>>>>
>>>> I just wanted to note that we want to distribute a copy of this
>>>> chapter to the MPI Forum before the meeting. As such we are planning
>>>> on sending out a copy at COB today (so Friday ~5:00 pm EDT) so that
>>>> people have an opportunity to look at the document before the Monday
>>>> plenary. So please send any edits or comments before COB today, so we
>>>> can work them into the draft.
>>>>
>>>> We will post the draft to the ticket, so that people know where to
>>>> look for the current draft.
>>>>
>>>> Thanks,
>>>> Josh
>>>>
>>>>
>>>>
>>>> On Thu, Oct 20, 2011 at 7:06 PM, Darius Buntinas <buntinas at mcs.anl.gov>
>>>> wrote:
>>>>>
>>>>> The latest version of the FT chapter is on the wiki (it's in a new location on the main FT page, under "ticket #276"). Please have a look and comment.
>>>>>
>>>>> Here's a direct link to the PDF:
>>>>> https://svn.mpi-forum.org/trac/mpi-forum-web/raw-attachment/wiki/FaultToleranceWikiPage/ft.pdf
>>>>>
>>>>> Here's a summary of the changes Josh and I made:
>>>>>
>>>>> * Minor wording touchups
>>>>> * Added new semantic for MPI_ANY_SOURCE with the MPI_ERR_ANY_SOURCE_DISABLED error code
>>>>> * Converted wording for all comm, win, and fh creation operations to not require collectively active communicators (eliminating the requirement for synchronization)
>>>>> * Added missing reader_lock to ANY_SOURCE example
>>>>> * Added case for MPI_WIN_TEST
>>>>>
>>>>> and
>>>>>
>>>>> One-sided section:
>>>>> * Clarified that window creation need not be blocking
>>>>> * Clarified that, due to process failures, RMA ops might not complete correctly even if synchronization ops complete without error
>>>>> Process failure notification:
>>>>> * Added a section describing new functions to add callbacks to comms, wins, and files, which are called when a proc failure is detected
>>>>> Other wordsmithing/cleanup changes
>>>>>
>>>>> -d
>>>>
>>>>
>>>>
>>>> --
>>>> Joshua Hursey
>>>> Postdoctoral Research Associate
>>>> Oak Ridge National Laboratory
>>>> http://users.nccs.gov/~jjhursey
>>>>