[Mpi3-ft] [Mpi3-rma] Joint RMA/FT Teleconf

Josh Hursey jjhursey at open-mpi.org
Thu Apr 26 09:07:18 CDT 2012

Some responses inline below.

On Wed, Apr 25, 2012 at 6:05 PM, Jim Dinan <dinan at mcs.anl.gov> wrote:
> Hi FT Folks,
> I was reading on in the proposal and had a few questions/comments:
> Is MPI_Win_invalidate collective or local?  This isn't clear from the text.

Win_invalidate is a local operation with remote implications, so only
one process needs to call it. In the Comm_invalidate definition we say
it is "not collective". We'll work on tidying up that language. I
think Brian mentioned this yesterday as well.

> Is the chapter name "Process Fault Tolerance" an improvement over just
> "Fault Tolerance"?  The former seems unnecessarily limiting.

The chapter is currently only concerned with process fault tolerance,
so I think that is where the title change came from. However, I see
your point about being restrictive if we decide to account for other
types of faults in the future. FT group, what do you think about this?

> p.545, Advice #1: Ignoring that an advice to users is non-normative,
> does this mean that we /have to/ propagate failure information in
> synchronization operations that don't target failed processes?  If this
> is not the case, how can I find out about failed processes?  Is there a
> validate or agreement function that could be used to force an update?
> If so, that would be better to mention here.
> I think I also would prefer to end this advice with "with communication
> that targets a failed process."

I believe the intention was to provide guidance to the implementor,
not to impose a requirement, though the wording does read as a
requirement to propagate failures during synchronization.
Additionally, I believe the intention was to keep the synchronization
operation restricted to just the processes it targets, so your
additional clarification seems appropriate.

UTK folks, do you want to comment on this?

> Thanks for the interesting discussion,
>  ~Jim.

Thanks to everyone who participated in the discussion; it was quite
helpful. The FT working group has a number of notes from the meeting
and will be working on some revised text to circulate in the next few
weeks. After that, we will likely want another teleconf to keep
refining the RMA/FT ticket.

-- Josh

> On 04/25/2012 07:48 AM, Josh Hursey wrote:
>> Just a reminder that we have a teleconf this afternoon.
>> -- Josh
>> On Thu, Apr 19, 2012 at 4:15 PM, Josh Hursey <jjhursey at open-mpi.org> wrote:
>>> It looks like April 25 from 3-4 pm Eastern works best for everyone
>>> that responded. Below are the teleconf details:
>>>  Date: April 25, 2012
>>>  Time: 3 pm Eastern/New York
>>>  Dial-in information: 877-801-8130
>>>  Code: 1044056
>>> -- Josh
>>> On Tue, Apr 17, 2012 at 10:45 AM, Josh Hursey <jjhursey at open-mpi.org> wrote:
>>>> The FT working group would like to have a joint teleconf with the RMA
>>>> working group to discuss the fault tolerance proposal.
>>>>  - https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/323
>>>>  - https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/325
>>>> A Doodle Poll link is below to schedule:
>>>>  - http://www.doodle.com/gv8k9y223abhabrz
>>>> If you are interested in participating in this teleconf, fill out the
>>>> poll by 2 pm Eastern on Thursday, April 19.
>>>> Thanks,
>>>> Josh
>>>> --
>>>> Joshua Hursey
>>>> Postdoctoral Research Associate
>>>> Oak Ridge National Laboratory
>>>> http://users.nccs.gov/~jjhursey

Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory