[mpiwg-ft] [MPI Forum] #323: User-Level Failure Mitigation
Jeff Hammond
jeff.science at gmail.com
Tue Feb 3 15:14:11 CST 2015
How is win_flush not synchronizing? It causes global visibility of updates. I don't see how a non-synchronizing implementation could exist.
Jeff
> On Feb 3, 2015, at 12:50 PM, MPI Forum <mpi-forum at lists.mpi-forum.org> wrote:
>
> #323: User-Level Failure Mitigation
> -------------------------------------+-----------------------------------
> Reporter: bosilca | Owner: bosilca
> Type: Enhancements to standard | Status: new
> Priority: Scheduled | Milestone: Future
> Version: MPI 4.0 | Resolution:
> Keywords: FT | Implementation status: Completed
> -------------------------------------+-----------------------------------
>
> Comment (by bouteill):
>
> Replying to [comment:32 jhammond]:
>> This means that page 5 line 11 of the latest FT proposal must be amended
>> somehow, as it pertains to the use of the phrase "epoch closing" (which
>> should be "epoch-closing", no?), unless you deliberately mean to exclude
>> {{{MPI_WIN_FLUSH(_LOCAL)(_ALL)}}} and {{{MPI_WIN_SYNC}}} from the list of
>> functions that must raise a process failure exception. And if they are
>> excluded, then their relationship to FT is ambiguous, since they are
>> neither communication operations nor epoch-closing synchronization.
>>
>> I suppose that we should treat {{{MPI_WIN_FLUSH_LOCAL(_ALL)}}}
>> differently from {{{MPI_WIN_FLUSH(_ALL)}}}, since the former is a local
>> operation and the latter is a nonlocal one. Given that
>> {{{MPI_WIN_FLUSH(_ALL)}}} induce remote completion, they will detect
>> remote process failures and thus can be required to raise these without
>> introducing unreasonable overhead.
>
> Ok, thinking more about this, I came to the conclusion that the current
> text is correct: WIN_FLUSH is not local, but it is more about ordering than
> remote completion, so it may not always detect errors. If it does (when
> the particular implementation does guarantee remote completion), it will
> raise an exception (as is possible in any case); when the implementation
> is not synchronizing, it will not. Mandating that the exception be raised
> may make the operation more expensive.
>
> We may want to add some rationale/advice to remember why we came to that
> conclusion (if you agree with me here)?
>
> --
> Ticket URL: <https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/323#comment:39>
> MPI Forum <https://svn.mpi-forum.org/>