[Mpi3-ft] Fault Tolerance & RMA Discussion

Josh Hursey jjhursey at open-mpi.org
Thu Feb 2 09:01:12 CST 2012


Just a reminder that we are meeting today at Noon Eastern to discuss RMA in
the context of the fault tolerance proposal.

The Run-Through Stabilization proposal can be found attached to the ticket:
  https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/276

https://svn.mpi-forum.org/trac/mpi-forum-web/attachment/ticket/276/FTWG-Process-FT-Draft-2011-12-20.pdf

We will be focusing on section 17.11 of that document. Note that this
section does not currently explicitly account for the new RMA proposal, but
we would like to remedy that for the next reading.

Thanks,
Josh

On Wed, Jan 25, 2012 at 3:15 PM, Josh Hursey <jjhursey at open-mpi.org> wrote:

> There was no one date/time that worked for everyone, but I chose a time
> that worked for most of the respondents. We will meet Thursday, Feb. 2 from
> 12-1 pm EST/New York to discuss this topic.
>
> We can use the following teleconf information:
>   US Toll Free number: 877-801-8130
>   Toll number: 1-203-692-8690
>   Access Code: 1044056
>
> Thanks,
> Josh
>
>
> On Mon, Jan 23, 2012 at 4:33 PM, Josh Hursey <jjhursey at open-mpi.org> wrote:
>
>> (Cross posted to both the RMA and FT MPI-3 listservs)
>>
>> During the FT plenary session at the Jan. MPI Forum meeting it was
>> recommended that some of the members of the FT group and the RMA group have
>> a meeting to hash out the precise details of the FT semantics for the RMA
>> chapter. So I would like to facilitate such a discussion, preferably in
>> the next week (so we have time to fine tune things before the next forum
>> meeting).
>>
>> In general, we are trying to answer the question "How should RMA
>> operations behave when a process failure occurs?" The feeling seemed to be
>> that the current approach is ok (invalidating the window, forcing
>> recreation/validation), but the statement that the memory exposed in the
>> window is 'undefined' seemed excessive. The suggestion was to change the
>> wording to something like "Only the memory associated with a window that
>> was targeted by an operation that modified it is undefined after process
>> failure in the group associated with the window." This led to a
>> considerable amount of debate in the meeting, so it was suggested that we
>> take the discussion offline.
>>
>> Below is a link to a doodle poll to find a good time for a teleconf. If
>> you are interested in participating in this discussion, please fill this
>> poll out by 2 PM Eastern on Wed. Jan 25 so we can set the date/time.
>>    http://www.doodle.com/vd33va5h8iankega
>>
>> Thanks,
>> Josh
>>
>> --
>> Joshua Hursey
>> Postdoctoral Research Associate
>> Oak Ridge National Laboratory
>> http://users.nccs.gov/~jjhursey
>>
>
>
>
> --
> Joshua Hursey
> Postdoctoral Research Associate
> Oak Ridge National Laboratory
> http://users.nccs.gov/~jjhursey
>



-- 
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey