[Mpi3-ft] Fault Tolerance & RMA Discussion

Josh Hursey jjhursey at open-mpi.org
Mon Jan 23 15:33:25 CST 2012


(Cross posted to both the RMA and FT MPI-3 listservs)

During the FT plenary session at the Jan. MPI Forum meeting it was
recommended that some of the members of the FT group and the RMA group have
a meeting to hash out the precise details of the FT semantics for the RMA
chapter. So I would like to facilitate such a discussion, preferably in
the next week (so we have time to fine tune things before the next forum
meeting).

In general, we are trying to answer the question "How should RMA operations
behave when a process failure occurs?" The feeling seemed to be that the
current approach is ok (invalidating the window, forcing
recreation/validation), but the statement that the memory exposed in the
window is 'undefined' seemed excessive. The suggestion was to change the
wording to something like "Only the memory associated with a window that
was targeted by an operation that modified it is undefined after process
failure in the group associated with the window." This led to a
considerable amount of debate in the meeting, so it was suggested that we
take the discussion offline.

Below is a link to a doodle poll to find a good time for a teleconf. If you
are interested in participating in this discussion, please fill this poll
out by 2 PM Eastern on Wed. Jan 25 so we can set the date/time.
   http://www.doodle.com/vd33va5h8iankega

Thanks,
Josh

-- 
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey