[Mpi3-ft] Ticket #292: MPI_COMM_KILL
Wesley Bland
wbland at mcs.anl.gov
Wed May 8 10:27:28 CDT 2013
I plan to post the result of any discussions we have to the ticket, but most of this list isn't on the ticket, so discussing it only there would miss most of the FT people.
On May 8, 2013, at 10:23 AM, Jim Dinan <dinan at mcs.anl.gov> wrote:
> It might be helpful to have this discussion on the ticket itself (or copy/paste it there), so that it will be saved for future reference.
>
> ~Jim.
>
> On 5/8/13 10:21 AM, Richard Graham wrote:
>> I believe this received the "cold shoulder", and should be closed.
>>
>> -----Original Message-----
>> From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Aurélien Bouteiller
>> Sent: Wednesday, May 08, 2013 11:10 AM
>> To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
>> Subject: Re: [Mpi3-ft] Ticket #292: MPI_COMM_KILL
>>
>> This is a problem child. It changes the state of valid communicators by making operations run on sparse communicators that contain holes (failed processes). This is difficult to implement and will hurt performance, even outside failure cases. The functionality duplicates revoke/shrink, group_create, etc. I'm for killing this one; it has become irrelevant in the new context.
>>
>> Aurelien
>>
>> On May 7, 2013, at 5:19 PM, Wesley Bland <wbland at mcs.anl.gov> wrote:
>>
>>> author: jjhursey
>>>
>>> https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/292
>>>
>>> This proposal put forth a new function, MPI_COMM_KILL, that would exclude a remote rank from any further communication in all communicators in the MPI universe.
>>>
>>> One concern of the forum was that this proposal was starting to define semantics for failure scenarios other than fail-stop errors, specifically transient/intermittent failures. It was moved out of the previous RTS proposal because of this distinction. In the context of the current proposal, ULFM, it isn't really required anymore, as the ULFM proposal states that once a process starts exhibiting failure, it is treated as fail-stop and should be excluded from further participation.
>>> _______________________________________________
>>> mpi3-ft mailing list
>>> mpi3-ft at lists.mpi-forum.org
>>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
>>
>> --
>> * Dr. Aurélien Bouteiller
>> * Researcher at Innovative Computing Laboratory
>> * University of Tennessee
>> * 1122 Volunteer Boulevard, suite 309b
>> * Knoxville, TN 37996
>> * 865 974 9375
>>