[Mpi3-ft] fault-tolerant collectives

Sun Sep 11 16:45:22 CDT 2011

On Fri, Sep 9, 2011 at 7:57 PM, Graham, Richard L. <rlgraham at ornl.gov> wrote:
>  I have been talking a reasonable amount with apps folks lately about this proposal, and their first response is often one of shock, as it is not quite what folks initially expect.  However, once one explains the background for the proposal, people tend to accept the notions.

Can you explain a bit more about what they were shocked by? Was it the
general notion of application-involved FT, or the interface not being
what they expected/needed?

>  I agree that we need to define a mechanism for specifying return codes - uniform among surviving ranks, or locally determined types.  However, I do believe that we need to add the second set of collectives into 3.0.  We have mentioned this as an option for several years (actually since the inception of the group almost 4 years ago), but as a working group never did something explicit about this.  There is a reasonable number of apps folks that expect this type of collective communications.

It shouldn't be too difficult to specify/add, and a prototype
implementation would be trivial, though maybe inefficient at first. Is
this something that we should put in the Stabilization proposal or
bring in as a separate ticket directly afterward?

Separating the two keeps the initial proposal simpler, and users can
get this functionality by wrapping existing collectives in
comm_validate calls (not efficient, but functional). Keeping them
together allows us to address a known interface optimization that
applications want in the first pass.
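
To make the wrapping option concrete, here is a rough sketch. It is a
toy single-process model, not real MPI: toy_comm, toy_comm_validate,
toy_allreduce_sum and validated_allreduce_sum are all hypothetical
names, and I am assuming that a collective refuses to run until known
failures have been validated. The real comm_validate signature is
whatever the proposal ends up specifying.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_RANKS 8

enum { TOY_SUCCESS = 0, TOY_ERR_PROC_FAILED = 1 };

/* Toy stand-in for a communicator; NOT an MPI type. */
typedef struct {
    int  size;                  /* number of ranks, at most TOY_MAX_RANKS */
    bool alive[TOY_MAX_RANKS];  /* alive[i] is false once rank i has failed */
    bool acked[TOY_MAX_RANKS];  /* acked[i] is true once that failure was validated */
} toy_comm;

/* Models comm_validate: acknowledge every known failure, report the count. */
int toy_comm_validate(toy_comm *comm, int *failed_count)
{
    assert(comm->size >= 0 && comm->size <= TOY_MAX_RANKS);
    int failed = 0;
    for (int i = 0; i < comm->size; i++) {
        if (!comm->alive[i]) {
            comm->acked[i] = true;
            failed++;
        }
    }
    *failed_count = failed;
    return TOY_SUCCESS;
}

/* Models a fault-aware collective (assumed behavior): it returns an error
 * while any failure is unacknowledged, otherwise it sums the contributions
 * of the surviving ranks. */
int toy_allreduce_sum(const toy_comm *comm, const int *contrib, int *result)
{
    assert(comm->size >= 0 && comm->size <= TOY_MAX_RANKS);
    int sum = 0;
    for (int i = 0; i < comm->size; i++) {
        if (!comm->alive[i]) {
            if (!comm->acked[i])
                return TOY_ERR_PROC_FAILED;
            continue;  /* acknowledged failure: skip this rank */
        }
        sum += contrib[i];
    }
    *result = sum;
    return TOY_SUCCESS;
}

/* The wrapper: validate, run the collective, validate again. */
int validated_allreduce_sum(toy_comm *comm, const int *contrib,
                            int *result, int *nfailed)
{
    int after = 0;

    toy_comm_validate(comm, nfailed);  /* acknowledge known failures */
    int rc = toy_allreduce_sum(comm, contrib, result);
    if (rc != TOY_SUCCESS)
        return rc;

    /* The second validate checks that the failed set did not change while
     * the collective ran. In this single-threaded toy it never does. */
    toy_comm_validate(comm, &after);
    return (after == *nfailed) ? TOY_SUCCESS : TOY_ERR_PROC_FAILED;
}
```

The cost is visible in the wrapper: two validate calls per collective,
which is why this is functional but not efficient.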

>  One other thing that came up yesterday (I have given 2 talks about the FT stuff in Kobe this week) is that it would be good to be able to specify multiple communicators to mpi_comm_validate(), especially, since a common motif is to dup an existing communicator to isolate communication.  This is really the only way that I can think of to avoid un-needed global communication, if more than one communicator is of interest to the app.

I've had a couple applications ask about this as well.

The group has talked about such an interface a few times now, and keeps
getting stuck on specifying the interface and semantics of such an
operation. Did they want a function that would take an array of
communicators to validate, or have the validation of one communicator
be inherited by all of the derived communicators?

I think the array-of-communicators interface seems like the easiest to
use, and makes it easier to protect libraries. But that leads us to
the question: do all processes (the union of processes from all
communicators specified?) have to supply the same set of
communicators? If not, do we run the risk of a circular dependency
causing the call to deadlock? We might be able to pass the
responsibility for avoiding such problems off to the user.

I'm game for trying to specify this again. The group decided to push
this off to a follow-on ticket because it can be achieved (though
inefficiently) by making a call to comm_validate for each of the
communicators, and we had trouble specifying it correctly. So the
question again is: should we keep it as a separate ticket or add it to
the stabilization proposal?
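
For reference, the per-communicator fallback looks roughly like the
sketch below. Again this is a toy single-process model with
hypothetical names (toy_comm, toy_comm_validate, validate_each,
toy_agreement_rounds), not the proposed API. I am assuming each
validate costs one global agreement round, and that a dup'ed
communicator shares the process table of its parent but tracks
acknowledged failures on its own.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_RANKS 8

/* Toy stand-in for a communicator; NOT an MPI type. */
typedef struct {
    int         size;                  /* number of ranks, at most TOY_MAX_RANKS */
    const bool *alive;                 /* shared table: false once rank i has failed */
    bool        acked[TOY_MAX_RANKS];  /* per communicator: failure i was validated */
} toy_comm;

/* Counts simulated global agreement rounds, one per validate call. */
int toy_agreement_rounds = 0;

/* Models comm_validate on a single communicator. */
int toy_comm_validate(toy_comm *comm, int *failed_count)
{
    assert(comm->size >= 0 && comm->size <= TOY_MAX_RANKS);
    int failed = 0;
    for (int i = 0; i < comm->size; i++) {
        if (!comm->alive[i]) {
            comm->acked[i] = true;
            failed++;
        }
    }
    toy_agreement_rounds++;
    *failed_count = failed;
    return 0;
}

/* Fallback: validate every communicator of interest, one call each.
 * Stops at the first error and returns its code. */
int validate_each(toy_comm *comms[], int ncomms, int failed_counts[])
{
    for (int i = 0; i < ncomms; i++) {
        int rc = toy_comm_validate(comms[i], &failed_counts[i]);
        if (rc != 0)
            return rc;
    }
    return 0;
}
```

With N communicators this pays for N agreement rounds even when they
all cover the same processes, which is the un-needed global
communication Rich mentions. A multi-communicator validate would aim to
collapse that into one round, once we can pin down its semantics.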

Thoughts?

-- Josh

>
> Rich
>
> -----Original Message-----
> From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Darius Buntinas
> Sent: Friday, September 09, 2011 4:39 PM
> To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
> Subject: Re: [Mpi3-ft] fault-tolerant collectives
>
>
> OK, that makes sense.  I'll fix up that text.
>
> Thanks,
> -d
>
> On Sep 9, 2011, at 3:36 PM, Josh Hursey wrote:
>
>> I think that we want to say that an implementation may provide uniform
>> return codes from collectives, but is not required to do so. So this
>> makes them fault tolerant-ish - in the sense that they have to work
>> around failure to return error codes consistently, but not that they
>> finish the collective successfully even if new process failures emerge
>> during the collectives (that would undermine the semantic protections
>> we are putting in place).
>>
>> We should probably not say 'fault tolerant collectives' in the current
>> proposal so we don't confuse things. Maybe 'collectives that provide
>> uniform return codes'?
>>
>>
>> If we want truly fault tolerant collectives (like those described
>> below), then I think we should introduce a different set of functions.
>> The functions should probably return a group of processes that either
>> did or did not participate in creating the final result. Something
>> like:
>>  MPI_Reduce_ft(..., &group);
>>
>> I think the true fault tolerant collectives should be left to a
>> follow-on ticket: there is a need, but they can easily be added as a
>> second step.
>>
>> -- Josh
>>
>> On Fri, Sep 9, 2011 at 4:17 PM, Darius Buntinas <buntinas at mcs.anl.gov> wrote:
>>>
>>> We discussed the option of allowing an implementation to provide fault tolerant (not just fault-aware) collectives.  The idea is that even when a process fails, collectives will continue to operate correctly (modulo the failed process).
>>>
>>> Does this imply that the communicator will never become collectively inactive?
>>>
>>> If no, then what's the point of ft collectives?
>>>
>>> If yes, then the application may never get notification that a process has failed and collectives are now running one short.  Is this what we really want?
>>>
>>> -d
>>> _______________________________________________
>>> mpi3-ft mailing list
>>> mpi3-ft at lists.mpi-forum.org
>>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
>>>
>>>
>>
>>
>>
>> --
>> Joshua Hursey
>> Postdoctoral Research Associate
>> Oak Ridge National Laboratory
>> http://users.nccs.gov/~jjhursey
>>

-- 
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey