[Mpi3-ft] Ticket #292: MPI_COMM_KILL
George Bosilca
bosilca at icl.utk.edu
Mon May 13 22:09:00 CDT 2013
While I don't support the functionality proposed in this ticket, I don't see how it is a duplicate of MPI_COMM_REVOKE. This function lets any process propagate, based purely on local information, the exclusion of any other process from the known universe. Such behavior is difficult to implement in ULFM because it requires an agreement, and therefore a synchronizing operation. The only similarity I see with MPI_COMM_REVOKE is the way the kill message is propagated (an operation with global scope but without a global call).
George.
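To illustrate the distinction drawn above, here is a toy Python sketch. It is not MPI code and not part of the ticket: it assumes a hypothetical ring of four ranks, and `flood` and `agree` are illustrative names, not ULFM or MPI functions. `flood` models revoke-style propagation, where only the initiator makes a call and every other rank forwards the notice the first time it sees it. `agree` models the extra step that a kill based on local information would seem to need: every live rank has to contribute before anyone can adopt a common exclusion set, which is what makes it synchronizing.

```python
from collections import deque


def flood(initiator, neighbors):
    """Return the order in which ranks learn a notice started by `initiator`.

    Each rank forwards the notice to its neighbors the first time it sees it,
    so no rank other than the initiator has to make a call.
    """
    seen = {initiator}
    order = [initiator]
    queue = deque([initiator])
    while queue:
        rank = queue.popleft()
        for peer in neighbors[rank]:
            if peer not in seen:
                seen.add(peer)
                order.append(peer)
                queue.append(peer)
    return order


def agree(local_suspects):
    """Toy agreement: combine every rank's local suspect set into one result.

    The result cannot be computed until all participating ranks have
    contributed, which is the synchronizing step (assumption: a simple union
    stands in for whatever decision rule a real implementation would use).
    """
    decided = set()
    for suspects in local_suspects.values():
        decided |= suspects
    return decided


# Hypothetical topology: four ranks arranged in a ring.
ring = {r: [(r - 1) % 4, (r + 1) % 4] for r in range(4)}

reached = flood(0, ring)
excluded = agree({0: {2}, 1: set(), 3: {2}})
```

In this model, `flood` reaches every rank from a single local call, while `agree` only yields a result once all ranks' views are collected.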
On May 8, 2013, at 11:27 , Wesley Bland <wbland at mcs.anl.gov> wrote:
> I plan to post the result of any discussions we have onto the ticket, but I know that most of this list isn't on the ticket so we'd miss out on most of the FT people.
>
> On May 8, 2013, at 10:23 AM, Jim Dinan <dinan at mcs.anl.gov> wrote:
>
>> It might be helpful to have this discussion on the ticket itself (or copy/paste it there), so that it will be saved for future reference.
>>
>> ~Jim.
>>
>> On 5/8/13 10:21 AM, Richard Graham wrote:
>>> I believe this received the "cold shoulder", and should be closed.
>>>
>>> -----Original Message-----
>>> From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Aurélien Bouteiller
>>> Sent: Wednesday, May 08, 2013 11:10 AM
>>> To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
>>> Subject: Re: [Mpi3-ft] Ticket #292: MPI_COMM_KILL
>>>
>>> This is a problem child. It changes the state of valid communicators by forcing operations to work on sparse communicators that contain holes (failed processes). This is difficult to implement and will hurt performance, even outside failure cases. The functionality duplicates revoke/shrink, group_create, etc. I'm for killing this one; it has become irrelevant in the new context.
>>>
>>> Aurelien
>>>
>>> On May 7, 2013, at 17:19, Wesley Bland <wbland at mcs.anl.gov> wrote:
>>>
>>>> author: jjhursey
>>>>
>>>> https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/292
>>>>
>>>> This proposal put forth a new function, MPI_COMM_KILL, that would exclude a remote rank from any further communication in all communicators in the MPI universe.
>>>>
>>>> One of the forum's concerns was that the proposal was starting to define semantics for failure scenarios other than fail-stop errors, specifically transient/intermittent failures. It was moved out of the previous RTS proposal because of this distinction. In the context of the current proposal, ULFM, it isn't really required anymore, as ULFM states that once a process starts exhibiting failure, it is treated as fail-stop and should be excluded from further participation.
>>>> _______________________________________________
>>>> mpi3-ft mailing list
>>>> mpi3-ft at lists.mpi-forum.org
>>>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
>>>
>>> --
>>> * Dr. Aurélien Bouteiller
>>> * Researcher at Innovative Computing Laboratory
>>> * University of Tennessee
>>> * 1122 Volunteer Boulevard, suite 309b
>>> * Knoxville, TN 37996
>>> * 865 974 9375