[mpiwg-ft] [MPI Forum] #323: User-Level Failure Mitigation
Jim Dinan
james.dinan at gmail.com
Wed Feb 4 08:35:28 CST 2015
If flush is performed during an exclusive lock epoch, it may not need to
wait for remote completion. Keep in mind that flush talks about visibility
of updates, not the specific protocol that the implementation needs to
use. For example, MPICH implements RMA on top of active messages that
traverse ordered channels. In that scenario, flush during an exclusive
lock epoch where the lock has already been acquired can be a no-op. If we
required win_flush to always wait for remote completion (i.e., always
notify) in the FT proposal, we would be prescribing a particular protocol,
which we would like to avoid if possible.
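
To make the ordered-channel argument concrete, here is a toy model in plain C.
It is only a sketch under stated assumptions, not MPICH code: the names
(toy_put, toy_flush, toy_unlock, toy_progress, struct target) are made up for
illustration, there is a single origin and a single FIFO channel, and the
exclusive lock is assumed to have been granted already.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (hypothetical names, not MPICH internals): one origin sends
 * active messages to one target over a single ordered (FIFO) channel. */
enum msg_kind { MSG_PUT, MSG_UNLOCK };

struct msg { enum msg_kind kind; size_t offset; int value; };

#define CHAN_CAP 16
#define WIN_SIZE 4

struct target {
    struct msg chan[CHAN_CAP]; /* in-flight messages, kept in send order */
    size_t head, tail;         /* next message to handle / next free slot */
    int window[WIN_SIZE];      /* exposed window memory */
    int locked;                /* 1 while the origin holds the exclusive lock */
};

static void send_msg(struct target *t, struct msg m) {
    assert(t->tail < CHAN_CAP);
    t->chan[t->tail++] = m;
}

/* Origin side: a put is just an active message appended to the channel. */
void toy_put(struct target *t, size_t offset, int value) {
    assert(offset < WIN_SIZE);
    struct msg m = { MSG_PUT, offset, value };
    send_msg(t, m);
}

/* Flush inside the exclusive epoch: nothing is sent and nothing is awaited.
 * Nobody else can access the window until the unlock message is handled,
 * and the channel delivers that message after every earlier put. */
void toy_flush(struct target *t) { (void)t; }

void toy_unlock(struct target *t) {
    struct msg m = { MSG_UNLOCK, 0, 0 };
    send_msg(t, m);
}

/* Target side: handle one message in channel order. Returns 0 when idle. */
int toy_progress(struct target *t) {
    if (t->head == t->tail) return 0;
    struct msg m = t->chan[t->head++];
    if (m.kind == MSG_PUT) t->window[m.offset] = m.value;
    else t->locked = 0;
    return 1;
}
```

In this model the flush exchanges no messages with the target, so it also has
no opportunity to notice that the target has failed. That is the cost of
mandating failure detection in win_flush: it would force a round trip that
this kind of protocol does not otherwise need.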
On Tue, Feb 3, 2015 at 4:14 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
> How is win_flush not synchronizing? It causes global visibility of updates.
> I don't see how a non-synchronizing implementation could exist.
>
> Jeff
>
> > On Feb 3, 2015, at 12:50 PM, MPI Forum <mpi-forum at lists.mpi-forum.org> wrote:
> >
> > #323: User-Level Failure Mitigation
> > -------------------------------------+-----------------------------------
> > Reporter: bosilca | Owner: bosilca
> > Type: Enhancements to standard | Status: new
> > Priority: Scheduled | Milestone: Future
> > Version: MPI 4.0 | Resolution:
> > Keywords: FT | Implementation status: Completed
> > -------------------------------------+-----------------------------------
> >
> > Comment (by bouteill):
> >
> > Replying to [comment:32 jhammond]:
> >> This means that page 5 line 11 of the latest FT proposal must be amended
> >> somehow, as it pertains to the use of the phrase "epoch closing" (which
> >> should be "epoch-closing", no?), unless you deliberately mean to exclude
> >> {{{MPI_WIN_FLUSH(_LOCAL)(_ALL)}}} and {{{MPI_WIN_SYNC}}} from the list of
> >> functions that must raise a process failure exception. And if they are
> >> excluded, then their relationship to FT is ambiguous, since they are
> >> neither communication operations nor epoch-closing synchronization.
> >>
> >> I suppose that we should treat {{{MPI_WIN_FLUSH_LOCAL(_ALL)}}}
> >> differently from {{{MPI_WIN_FLUSH(_ALL)}}}, since the former is a local
> >> operation and the latter is a nonlocal one. Given that
> >> {{{MPI_WIN_FLUSH(_ALL)}}} induce remote completion, they will detect
> >> remote process failures and thus can be required to raise these without
> >> introducing unreasonable overhead.
> >
> > Ok, thinking more about this I came to the conclusion that the current
> > text is correct: WIN_FLUSH is not local, but it is about ordering more
> > than remote completion, so it may not always detect errors. If it does
> > (when the particular implementation guarantees remote completion), it
> > will raise an exception (as is possible in any case); when the
> > implementation is not synchronizing, it will not. Mandating that the
> > exception be raised may make the operation more expensive.
> >
> > We may want to add some rationale/advice to record why we came to that
> > conclusion (if you agree with me here)?
> >
> > --
> > Ticket URL: <https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/323#comment:39>
> > MPI Forum <https://svn.mpi-forum.org/>