[Mpi3-ft] Fault Tolerance & RMA Discussion

Josh Hursey jjhursey at open-mpi.org
Mon Feb 6 13:45:05 CST 2012


We are going to meet from 10-11 am (Eastern) on Feb. 7 to continue our
conversation. We will use the same call-in information as before.

Thanks,
Josh

On Thu, Feb 2, 2012 at 3:00 PM, Josh Hursey <jjhursey at open-mpi.org> wrote:

> We made some really good progress on today's call. Attached are some notes
> that I took from the call.
>
> At the end of the call there were a couple of items that we wanted to get
> a finer understanding of. As a result we are going to try to set up another
> teleconf.
>
> Below is a doodle poll to pick a date/time:
>    http://www.doodle.com/kzmiknie8yz4wxkc
>
> If you are interested in attending this teleconf, please fill out the poll
> by 2 pm Eastern on Monday, Feb. 6.
>
> Thanks,
> Josh
>
>
> On Thu, Feb 2, 2012 at 10:01 AM, Josh Hursey <jjhursey at open-mpi.org> wrote:
>
>> Just a reminder that we are meeting today at Noon Eastern to discuss RMA
>> in the context of the fault tolerance proposal.
>>
>> The Run-Through Stabilization proposal can be found attached to the
>> ticket:
>>   https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/276
>>
>> https://svn.mpi-forum.org/trac/mpi-forum-web/attachment/ticket/276/FTWG-Process-FT-Draft-2011-12-20.pdf
>>
>> We will be focusing on section 17.11 of that document. Note that this
>> section does not currently explicitly account for the new RMA proposal, but
>> we would like to remedy that for the next reading.
>>
>> Thanks,
>> Josh
>>
>> On Wed, Jan 25, 2012 at 3:15 PM, Josh Hursey <jjhursey at open-mpi.org> wrote:
>>
>>> There was no one date/time that worked for everyone, but I chose a time
>>> that worked for most of the respondents. We will meet Thursday, Feb. 2 from
>>> 12-1 pm EST/New York to discuss this topic.
>>>
>>> We can use the following teleconf information:
>>>   US Toll Free number: 877-801-8130
>>>   Toll number: 1-203-692-8690
>>>   Access Code: 1044056
>>>
>>> Thanks,
>>> Josh
>>>
>>>
>>> On Mon, Jan 23, 2012 at 4:33 PM, Josh Hursey <jjhursey at open-mpi.org> wrote:
>>>
>>>> (Cross posted to both the RMA and FT MPI-3 listservs)
>>>>
>>>> During the FT plenary session at the Jan. MPI Forum meeting it was
>>>> recommended that some of the members of the FT group and the RMA group have
>>>> a meeting to hash out the precise details of the FT semantics for the RMA
>>>> chapter. So I would like to facilitate such a discussion, preferably in
>>>> the next week (so we have time to fine tune things before the next forum
>>>> meeting).
>>>>
>>>> In general, we are trying to answer the question "How should RMA
>>>> operations behave when a process failure occurs?" The feeling seemed to be
>>>> that the current approach is ok (invalidating the window, forcing
>>>> recreation/validation), but the statement that the memory exposed in the
>>>> window is 'undefined' seemed excessive. The suggestion was to change the
>>>> wording to something like "Only the memory associated with a window that
>>>> was targeted by an operation that modified it is undefined after process
>>>> failure in the group associated with the window." This led to a
>>>> considerable amount of debate in the meeting, so it was suggested that we
>>>> take the discussion offline.
>>>>
>>>> Below is a link to a doodle poll to find a good time for a teleconf. If
>>>> you are interested in participating in this discussion, please fill this
>>>> poll out by 2 PM Eastern on Wed. Jan 25 so we can set the date/time.
>>>>    http://www.doodle.com/vd33va5h8iankega
>>>>
>>>> Thanks,
>>>> Josh
>>>>
>>>> --
>>>> Joshua Hursey
>>>> Postdoctoral Research Associate
>>>> Oak Ridge National Laboratory
>>>> http://users.nccs.gov/~jjhursey
>>>>



-- 
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey