[mpiwg-ft] [EXTERNAL] Re: FTWG Call Today
Teranishi, Keita
knteran at sandia.gov
Wed Mar 15 13:22:23 CDT 2017
Ignacio,
Does your technique create a replacement for main() (say main_reinit()) that makes a setjmp() call inside? That’s interesting. Many scientific libraries call MPI_Init() inside their own initialization functions (such as PETSc_initialize() and BLACS_Init()). I am not 100% sure how PETSC_Initialize() can return to the replacement for main(). Could you clarify how this works for functions that make the MPI_Init() call themselves?
BTW, in Fenix (including the SC14 version), Fenix_init() is a macro that expands to three function calls, so the user cannot call it outside main() ☹.
Fenix_preinit();
setjmp();
Fenix_postinit();
For this reason, when using PETSc with Fenix, I have to expose Fenix_init() to main(); I cannot put it inside petsc_initialize(). In the end, I wrote petsc_reinitialize_fenix() to modify the contents created by petsc_initialize(). If your approach works, I can put Fenix_init() and petsc_reinitialize_fenix() inside petsc_initialize(), making the code much cleaner.
main()
{
    petsc_initialize();            /* this calls MPI_Init() */
    Fenix_init();
    petsc_reinitialize_fenix();
    :
    :
    :
}
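
To make the macro constraint above concrete, here is a minimal sketch of why the init has to be expanded in main()'s own frame. All toy_* names are hypothetical stand-ins, not the real Fenix API, and the "failure" is simulated with a plain function call. toy_app_main() plays the role of main().

```c
#include <setjmp.h>

/* Hypothetical stand-ins for illustration; not the real Fenix API. */
static jmp_buf toy_recovery_point;
static int toy_recoveries = 0;            /* static, so it survives longjmp() */

static void toy_preinit(void) { /* e.g. set aside spare ranks */ }
static void toy_postinit(int recovered) { if (recovered) toy_recoveries++; }

/* A macro, not a function: setjmp() is expanded into the caller's frame,
 * so the saved context stays valid for as long as the caller is active. */
#define TOY_INIT()                                   \
    do {                                             \
        toy_preinit();                               \
        if (setjmp(toy_recovery_point) == 0)         \
            toy_postinit(0);  /* first pass */       \
        else                                         \
            toy_postinit(1);  /* re-entered */       \
    } while (0)

/* Simulated failure notification raised from inside the application. */
static void toy_report_failure(void) { longjmp(toy_recovery_point, 1); }

static void toy_solver_step(void)
{
    if (toy_recoveries < 2)
        toy_report_failure();             /* pretend a rank died, twice */
}

/* Stands in for main(): the frame that owns the setjmp() context. */
int toy_app_main(void)
{
    TOY_INIT();
    toy_solver_step();                    /* jumps back into TOY_INIT() on failure */
    return toy_recoveries;
}
```

If TOY_INIT() were an ordinary function, its frame would be gone by the time toy_solver_step() runs, and the longjmp() would target a dead frame. That is the reason the call cannot simply be moved into a library initializer that returns.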
Thanks,
Keita
On 3/15/17, 10:46 AM, "mpiwg-ft-bounces at lists.mpi-forum.org on behalf of Ignacio Laguna" <mpiwg-ft-bounces at lists.mpi-forum.org on behalf of lagunaperalt1 at llnl.gov> wrote:
Hey Aurelien,
Thanks! I understand the concern.
For global-restart models like Reinit (and, I believe, for the SC14
version of Fenix), this problem is solved by passing a reinit function
pointer to MPI, which MPI then calls after initialization (this function
is a replacement for main and holds the code that main originally
contained). Since this reinit function stays on the stack (it never
returns), we can always long jump there.
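
A minimal sketch of the function-pointer scheme described above, under stated assumptions: toy_run_with_restart() is a hypothetical stand-in for the runtime entry point, not an actual MPI or Reinit call, and the failure is simulated by calling toy_signal_failure() directly.

```c
#include <setjmp.h>

/* Hypothetical signature for the user's replacement of main(). */
typedef int (*toy_restart_fn)(int restart_count);

static jmp_buf toy_restart_point;
static int toy_restart_count = 0;         /* static, so it survives longjmp() */

/* Hypothetical runtime entry point. Its frame stays on the stack for as
 * long as app_main() is running, so the setjmp() context remains valid. */
int toy_run_with_restart(toy_restart_fn app_main)
{
    if (setjmp(toy_restart_point) != 0)
        toy_restart_count++;              /* we got here through longjmp() */
    return app_main(toy_restart_count);   /* (re)start the application */
}

/* Simulated failure notification: unwind back to the restart point. */
void toy_signal_failure(void) { longjmp(toy_restart_point, 1); }

/* The code that main() originally contained. */
int toy_resilient_main(int restart_count)
{
    if (restart_count < 3)
        toy_signal_failure();             /* pretend a failure on early runs */
    return restart_count;
}
```

Calling toy_run_with_restart(toy_resilient_main) restarts the application three times and then lets it finish. Because the setjmp() lives in the runtime's frame rather than in the application, this layout does not require the application to expose the jump point in main().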
I think the main problem is that we cannot long jump from a signal
handler; more precisely, doing so is undefined behavior according to
the C standard. We would need to find another mechanism for long jumping
after a signal handler is called as a result of a failure notification.
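
One possible mechanism, shown here only as a sketch and not as something the thread settled on: the handler records the notification in a flag, and the long jump is taken later from normal (non-handler) context at a polling point. All toy_* names are hypothetical, and SIGTERM is an arbitrary choice of signal for the simulation.

```c
#include <setjmp.h>
#include <signal.h>

static jmp_buf toy_restart_point;
static volatile sig_atomic_t toy_failure_pending = 0;
static int toy_restarts = 0;              /* static, so it survives longjmp() */

/* Handler only sets a flag; it never calls longjmp() itself. */
static void toy_on_failure(int signo)
{
    (void)signo;
    toy_failure_pending = 1;
}

/* Polled from normal code (e.g. between solver steps). */
static void toy_check_failure(void)
{
    if (toy_failure_pending) {
        toy_failure_pending = 0;
        longjmp(toy_restart_point, 1);    /* taken outside the handler */
    }
}

int toy_run(void)
{
    signal(SIGTERM, toy_on_failure);
    if (setjmp(toy_restart_point) != 0)
        toy_restarts++;
    if (toy_restarts == 0)
        raise(SIGTERM);                   /* simulate one failure notification */
    toy_check_failure();                  /* the jump happens here, not in the handler */
    return toy_restarts;
}
```

The obvious limitation of this sketch is that the jump is only taken when the application reaches a polling point; a process blocked inside a communication call would not notice the flag.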
Ignacio
On 3/15/17 8:41 AM, Aurelien Bouteiller wrote:
>
> Hey Ignacio,
>
> Murali wanted to touch base with you on that exact issue. The bottom line
> is that the function that called setjmp must still be on the stack when
> the long jump happens, which means that you can jump only to a function
> you are nested in. In many cases that means you can’t “hide” setjmp
> points in the library, as they have to be called in the application
> function context (so that they remain in your frame).
>
> Best,
> Aurelien
>
>> On Mar 14, 2017, at 18:15, Ignacio Laguna <lagunaperalt1 at llnl.gov> wrote:
>>
>> Thanks for sharing the minutes.
>>
>> In the "scoped reinit-like approaches", there is the point of "still
>> subject to the longjmp complication". Can folks comment on what is the
>> issue with respect to setjump/longjump in global-restart approaches,
>> such as Reinit and/or Fenix?
>>
>> Thanks!
>>
>> Ignacio
>>
>>
>> On 3/14/17 1:49 PM, Aurelien Bouteiller wrote:
>>> Minutes for the call have been posted here:
>>> https://github.com/mpiwg-ft/ft-issues/wiki/2017-03-14
>>>
>>>> On Mar 14, 2017, at 15:00, Aurelien Bouteiller <bouteill at icl.utk.edu> wrote:
>>>>
>>>> Hi there,
>>>>
>>>> Aurelien Bouteiller is inviting you to a scheduled Zoom meeting.
>>>>
>>>> Topic: MPI FT WG
>>>> Time: Mar 14, 2017 3:00 PM Eastern Time (US and Canada)
>>>>
>>>> Join from PC, Mac, Linux, iOS or
>>>> Android: https://tennessee.zoom.us/j/607816420?pwd=MuG6Nboy9%2Fo%3D
>>>> Password: beef
>>>>
>>>> Or iPhone one-tap (US Toll): +14086380968,607816420# or
>>>> +16465588656,607816420#
>>>>
>>>> Or Telephone:
>>>> Dial: +1 408 638 0968 (US Toll) or +1 646 558 8656 (US Toll)
>>>> Meeting ID: 607 816 420
>>>> International numbers
>>>> available: https://tennessee.zoom.us/zoomconference?m=fUOjmMyJwtMIeEsk8yo8CgLo3JR6yrTM
>>>>
>>>> Or an H.323/SIP room system:
>>>> H.323: 162.255.37.11 (US West) or 162.255.36.11 (US East)
>>>> Meeting ID: 607 816 420
>>>> Password: 463530
>>>>
>>>> SIP: 607816420 at zoomcrc.com
>>>> Password: 463530
>>>>
>>>>
>>>>
>>>>> On Mar 14, 2017, at 10:54, Aurelien Bouteiller <bouteill at icl.utk.edu> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> We have the FTWG call scheduled for today. I’d like to debrief the
>>>>> latest MPI forum activities, and continue the discussion on
>>>>> converging localized and globalized recovery.
>>>>>
>>>>> I attach here the slide I used during the WG time.
>>>>> <20170228-mpiforum-errwg.pptx>
>>>>>
>>>>> We may also want to decide the time for our future meeting based on
>>>>> the doodle poll initiated by Wesley a while back.
>>>>> http://doodle.com/poll/s5uvmpux4nc6ki4y#table
>>>>>
>>>>> ===
>>>>> Looking back at the notes from our last call in December, I believe
>>>>> the TODO items are for Aurelien, Ignacio, and myself to flesh out the
>>>>> three FT recovery proposals and then see how they would interact with
>>>>> each other.
>>>>>
>>>>> * I believe Aurelien had some ideas about how to overcome some of the
>>>>> problems raised at the last meeting. Aurelien, if you could put
>>>>> together a slide or two that we could use for the discussion, that
>>>>> would probably be helpful.
>>>>> * I'm not sure of the status of Ignacio putting together some slides
>>>>> for the reinit proposal. If I remember the meeting long ago in San
>>>>> Jose, we just looked at a header. It might be nice to have something
>>>>> a little more high level to point to.
>>>>> * I still need to make the slides for the auto recovery strategy that
>>>>> Martin proposed.
>>>>>
>>>>> Once that's done, we can see where these things interact and how
>>>>> difficult it would be to support them together.
>>>>>
>>>>> Thoughts?
>>>>> Wesley
>>>>> _______________________________________________
>>>>> mpiwg-ft mailing list
>>>>> mpiwg-ft at lists.mpi-forum.org
>>>>> <mailto:mpiwg-ft at lists.mpi-forum.org> <mailto:mpiwg-ft at lists.mpi-forum.org>
>>>>> https://lists.mpi-forum.org/mailman/listinfo/mpiwg-ft
>>>>
>>>
>>>
>>>
>>>
>