[Mpi3-ft] Fault Tolerance & RMA Discussion
jjhursey at open-mpi.org
Mon Feb 6 13:45:05 CST 2012
We are going to meet from 10-11 am (Eastern) on Feb. 7 to continue our
conversation. We will use the same call-in information as before.
On Thu, Feb 2, 2012 at 3:00 PM, Josh Hursey <jjhursey at open-mpi.org> wrote:
> We made some really good progress on today's call. Attached are some notes
> that I took from the call.
> At the end of the call there were a couple of items that we wanted to get
> a finer understanding of. As a result, we are going to try to set up another
> teleconf.
> Below is a doodle poll to pick a date/time:
> If you are interested in attending this teleconf, please fill out the poll
> by 2 pm Eastern on Monday, Feb. 6.
> On Thu, Feb 2, 2012 at 10:01 AM, Josh Hursey <jjhursey at open-mpi.org> wrote:
>> Just a reminder that we are meeting today at Noon Eastern to discuss RMA
>> in the context of the fault tolerance proposal.
>> The Run-Through Stabilization proposal can be found attached to the
>> We will be focusing on section 17.11 of that document. Note that this
>> section does not currently explicitly account for the new RMA proposal, but
>> we would like to remedy that for the next reading.
>> On Wed, Jan 25, 2012 at 3:15 PM, Josh Hursey <jjhursey at open-mpi.org> wrote:
>>> There was no one date/time that worked for everyone, but I chose a time
>>> that worked for most of the respondents. We will meet Thursday, Feb. 2 from
>>> 12-1 pm EST/New York to discuss this topic.
>>> We can use the following teleconf information:
>>> US Toll Free number: 877-801-8130
>>> Toll number: 1-203-692-8690
>>> Access Code: 1044056
>>> On Mon, Jan 23, 2012 at 4:33 PM, Josh Hursey <jjhursey at open-mpi.org> wrote:
>>>> (Cross posted to both the RMA and FT MPI-3 listservs)
>>>> During the FT plenary session at the Jan. MPI Forum meeting it was
>>>> recommended that some of the members of the FT group and the RMA group have
>>>> a meeting to hash out the precise details of the FT semantics for the RMA
>>>> chapter. So I would like to facilitate such a discussion, preferably in
>>>> the next week (so we have time to fine-tune things before the next forum
>>>> meeting).
>>>> In general, we are trying to answer the question "How should RMA
>>>> operations behave when a process failure occurs?" The feeling seemed to be
>>>> that the current approach is ok (invalidating the window, forcing
>>>> recreation/validation), but the statement that the memory exposed in the
>>>> window is 'undefined' seemed excessive. The suggestion was to change the
>>>> wording to something like "Only the memory associated with a window that
>>>> was targeted by an operation that modified it is undefined after process
>>>> failure in the group associated with the window." This led to a
>>>> considerable amount of debate in the meeting, so it was suggested that we
>>>> take the discussion offline.
>>>> Below is a link to a doodle poll to find a good time for a teleconf. If
>>>> you are interested in participating in this discussion, please fill this
>>>> poll out by 2 PM Eastern on Wed. Jan 25 so we can set the date/time.
>>>> Joshua Hursey
>>>> Postdoctoral Research Associate
>>>> Oak Ridge National Laboratory
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory