[Mpi3-ft] radical idea?
Howard Pritchard
howardp at cray.com
Thu Jul 21 12:06:45 CDT 2011
Hi Darius,
If we want to get something about RTS into MPI 3.0, I don't
think we have time to manage it as a set of smaller proposals.
If we can eliminate the state problem that bothered some at
the last forum meeting, that would be a good start. Also,
if we could simplify the proposal somewhat by removing
the PROC_NULL semantics, I would be in favor of that.
If we want to limit the use of RTS to a small number of use
cases (like the NOAA example), then I could see deferring
"repairable" communicators to 3.1.
Howard
Darius Buntinas wrote:
> We could break the RTS proposal into smaller ones:
>
> point-to-point: local up/down checks; errors on sending to failed processes (see the sketch after this list)
> recognition/PROC_NULLification: Add a function to set a rank in a communicator to
> MPI_PROC_NULL
> fault-aware collectives: collectives don't hang, but they're permanently broken once a
> proc in the communicator fails
> "repairable" collectives: validate_all; collectives can be reactivated after failure
>
> I don't think anyone really objected to "point-to-point" or "fault-aware collectives". We'll have to work on the others.
>
> -d
>
>
> On Jul 20, 2011, at 9:13 AM, Joshua Hursey wrote:
>
>> I'll have to think a bit more and come back to this thread. But I wanted to interject something I was thinking about on the plane ride back. What if we removed the notion of recognized failures?
>>
>> This was a point that was mentioned a couple of times in discussion - that we have a bunch of functions and extra state on each communicator because we want to allow the application to recognize failures to get PROC_NULL semantics. If we remove the notion of recognized failures, then the up/down state on the group would be enough to track. So communication with a failed process will always return an error, regardless of whether the failure has been 'seen' by the application before.
>>
>> The state of a process could change as MPI finds out about new failures. But we can provide a 'state snapshot' object (which was mentioned in discussion, and I think is what Darius is getting at below) to allow for more consistent lookups if the application so desires. This removes the local/global list tracking on each handle, and moves it to a separate object that the user is in control of. The user can still reference the best known state if they are not concerned about consistency (e.g., MPI_Comm_validate_get_state(comm, ...) vs MPI_Snapshot_validate_get_state(snapshot_handle, ...)).
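>>
>> A rough sketch of the difference, with every name treated as hypothetical
>> (the '...' arguments above are not pinned down, so the rank/state
>> parameters here are just for illustration):
>>
>>   int state;
>>   MPI_Snapshot snap;
>>
>>   /* Best-known state: the answer may change between calls as MPI
>>      detects new failures */
>>   MPI_Comm_validate_get_state(comm, rank, &state);
>>
>>   /* Consistent state: capture once, then every lookup sees the same view */
>>   MPI_Comm_snapshot(comm, &snap);
>>   MPI_Snapshot_validate_get_state(snap, rank, &state);
>>   MPI_Snapshot_free(&snap);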
>>
>> Some applications would like the PROC_NULL semantics. But if we can convince ourselves that a library on top of MPI could provide those (by adding the proc_null check in the PMPI layer), then we might be able to reduce the complexity of the proposal by pushing some of the state tracking responsibility above MPI.
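>>
>> For what it's worth, a sketch of how such a layer might intercept sends
>> using the standard PMPI interception pattern (is_nullified() is a
>> hypothetical user-side lookup, not proposed API):
>>
>>   #include <mpi.h>
>>
>>   extern int is_nullified(MPI_Comm comm, int rank);
>>
>>   int MPI_Send(void *buf, int count, MPI_Datatype datatype,
>>                int dest, int tag, MPI_Comm comm)
>>   {
>>       if (is_nullified(comm, dest))
>>           return MPI_SUCCESS;  /* behave as if dest were MPI_PROC_NULL */
>>       return PMPI_Send(buf, count, datatype, dest, tag, comm);
>>   }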
>>
>> I still have not figured out the implications for an application using collective operations if we remove the NULL state, but it is something to think about.
>>
>> -- Josh
>>
>> On Jul 19, 2011, at 6:11 PM, Darius Buntinas wrote:
>>
>>> The MPI_COMM_NULLIFY() function would effectively set the process to MPI_PROC_STATE_NULL.
>>>
>>> In the proposal we had MPI_PROC_STATE_NULL, _FAILED and _OK. I'm proposing separating NULL from FAILED and OK. So the MPI_COMM_GET_STATE() function (and friends) would let you query the (locally known) FAILED/OK state of the process, while MPI_COMM_NULLIFY() (and friends) would let you set the process to NULL. There would be essentially two state variables associated with each process: one indicating whether it's failed or not (let's call it LIVENESS), and the other whether it has PROC_NULL semantics (call it NULLIFICATION). The LIVENESS state is controlled by the MPI library, while the NULLIFICATION state is controlled by the user. The table below shows how these states would match up with the current proposal:
>>>
>>> Current proposal state  | LIVENESS | NULLIFICATION
>>> ------------------------+----------+---------------
>>> MPI_PROC_STATE_OK       | OK       | NORMAL
>>> MPI_PROC_STATE_FAILED   | FAILED   | NORMAL
>>> MPI_PROC_STATE_NULL     | FAILED   | NULL
>>> <UNDEFINED>             | OK       | NULL
>>>
>>> Notice that there's a possible combination that's not covered by the current proposal. I'm not sure whether that's a useful state (or whether we should disallow it).
>>>
>>> We could add a function to set the NULLIFICATION state from NULL back to NORMAL, for completeness.
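>>>
>>> To make the separation concrete, the per-process state could be modeled
>>> as two independent fields, roughly like this (type and constant names
>>> are illustrative only, not proposed bindings):
>>>
>>>   typedef enum { LIVENESS_OK, LIVENESS_FAILED } liveness_t;
>>>   typedef enum { NULLIFICATION_NORMAL, NULLIFICATION_NULL } nullification_t;
>>>
>>>   typedef struct {
>>>       liveness_t      liveness;       /* owned by MPI: updated as
>>>                                          failures are detected */
>>>       nullification_t nullification;  /* owned by the user: set via
>>>                                          MPI_COMM_NULLIFY (and reset by
>>>                                          the inverse function, if we
>>>                                          add one) */
>>>   } proc_state_t;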
>>>
>>> -d
>>>
>>>
>>> On Jul 19, 2011, at 4:32 PM, Solt, David George wrote:
>>>
>>>> This works for "reading" state, but has no way to set a process's state. (Not sure how radical you're trying to go here... is part of the change here that there would no longer be an MPI_PROC_STATE_NULL state?)
>>>> Dave
>>>>
>>>> -----Original Message-----
>>>> From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Darius Buntinas
>>>> Sent: Tuesday, July 19, 2011 3:17 PM
>>>> To: Darius Buntinas
>>>> Cc: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
>>>> Subject: Re: [Mpi3-ft] radical idea?
>>>>
>>>>
>>>> Howard pointed out that I forgot to add a FREE operation:
>>>>
>>>> MPI_PROC_STATE_FREE(state_handle)
>>>> INOUT: MPI_PROC_STATE state_handle
>>>>
>>>> -d
>>>>
>>>> On Jul 19, 2011, at 3:07 PM, Darius Buntinas wrote:
>>>>
>>>>> MPI_COMM_GET_STATE(comm, state_handle)
>>>>> IN: MPI_COMM comm
>>>>> OUT: MPI_PROC_STATE state_handle
>>>>> and ditto for GROUP, FILE, WIN as necessary
>>>>>
>>>>> MPI_GET_PROC_STATE_SIZE(state_handle, mask, size)
>>>>> IN: MPI_PROC_STATE state_handle
>>>>> IN: int mask
>>>>> OUT: int size
>>>>>
>>>>> MPI_GET_PROC_STATE_LIST(state_handle, mask, list)
>>>>> IN: MPI_PROC_STATE state_handle
>>>>> IN: int mask
>>>>> OUT: int list[]
>>>>>
>>>>> MPI_GET_PROC_STATE_NEW(state_handle1, state_handle2, state_handle_new)
>>>>> IN: MPI_PROC_STATE state_handle1
>>>>> IN: MPI_PROC_STATE state_handle2
>>>>> OUT: MPI_PROC_STATE state_handle_new
>>>>> This returns the processes that appear as failed in state_handle2 but not in state_handle1, i.e., the processes that have newly failed since state_handle1 was taken.
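>>>>>
>>>>> A possible usage sketch in C (the MASK_FAILED constant is hypothetical;
>>>>> assume some mask value exists to select failed processes):
>>>>>
>>>>>   MPI_PROC_STATE state;
>>>>>   int nfailed;
>>>>>
>>>>>   MPI_Comm_get_state(comm, &state);
>>>>>   MPI_Get_proc_state_size(state, MASK_FAILED, &nfailed);
>>>>>   if (nfailed > 0) {
>>>>>       int *failed = malloc(nfailed * sizeof(int));
>>>>>       MPI_Get_proc_state_list(state, MASK_FAILED, failed);
>>>>>       /* ... react to the newly known failed ranks ... */
>>>>>       free(failed);
>>>>>   }
>>>>>   MPI_Proc_state_free(&state);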
--
Howard Pritchard
Software Engineering
Cray, Inc.