[Mpi3-ft] MPI_Comm_validate_all

Solt, David George david.solt at hp.com
Wed Feb 16 15:43:12 CST 2011


In our implementation, we can't guarantee consensus (since it is not possible), but the successful ranks are aware of which ranks may have reached an incompatible conclusion, so they proactively "break" the virtual connections to those processes. When the known-failed ranks then attempt to use their wrong communicator, they get failures rather than hangs. However, I couldn't come up with a way to ensure that all ranks reach perfect consensus in the presence of arbitrary failures.

For example, it is possible that ranks 0, 1, 2, and 3 call regroup. Due to a late failure during the algorithm, ranks 0 and 1 think the group is {0,1}, rank 2 thinks the group is {1,2}, and rank 3 thinks it is not part of the new group. In this case, ranks 0 and 1 will close the virtual connections to ranks 2 and 3, so rank 2 will not hang when it tries to use its invalid group.
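To make the fail-fast behavior concrete, here is a minimal sketch. MPI_Comm_validate_all is only a proposal, so its signature (an outcount plus a list of collectively-detected failed processes, per Darius's description below) is an assumption, as are buf, n, and peer:

    /* Hypothetical sketch -- MPI_Comm_validate_all is a proposed API,
     * not standard MPI; the signature is assumed for illustration. */
    int outcount, rc;
    int *failed = NULL;   /* list of collectively-detected failed ranks */
    rc = MPI_Comm_validate_all(comm, &outcount, &failed);

    /* Ranks 0 and 1 agree on {0,1} and close their virtual connections
     * to ranks 2 and 3.  When rank 2, which concluded {1,2}, later uses
     * its wrong communicator, the closed connection turns a potential
     * hang into an immediate error: */
    if (rc == MPI_SUCCESS)
        rc = MPI_Send(buf, n, MPI_INT, peer, 0, comm);
        /* on rank 2: returns an error instead of blocking forever */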

Our assumption is that once a rank is excluded from the group, it can never be part of the communicator again. (It could use connect/accept/comm_merge to join the other processes through a new communicator, as sketched below, but it cannot attempt to regroup the original communicator.)
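That rejoin path uses only standard MPI-2 dynamic-process calls. A minimal sketch, assuming the survivors hold a communicator survivors_comm and the port name reaches the excluded process out of band:

    /* Survivors' side: open a port and accept the excluded process. */
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm inter, merged;
    MPI_Open_port(MPI_INFO_NULL, port);
    /* ...publish `port` out of band (file, name server, etc.)... */
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, survivors_comm, &inter);
    MPI_Intercomm_merge(inter, 0, &merged);  /* new intracommunicator */

    /* Excluded process's side: */
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
    MPI_Intercomm_merge(inter, 1, &merged);

Note that the result is a new communicator, consistent with the restriction above: the original communicator is never regrouped.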

I agree that a return code of "the regroup had problems, please try again" makes no sense and cannot be useful.

Dave  

-----Original Message-----
From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Thomas Herault
Sent: Wednesday, February 16, 2011 3:35 PM
To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
Subject: Re: [Mpi3-ft] MPI_Comm_validate_all

If we allow the call to return successfully at some nodes and an error at others, we defeat this call's reason for existence.

If some processes detect the failure and others don't, some will re-enter the call (say A detected the failure and entered validate again, to acknowledge it), while others (say B) will enter other communications, e.g. MPI_Recv from A. That receive will never return an error, because communication with A is legitimate; but A never does the send, because it is trying to revalidate the communicator, which it cannot complete while B stays out of the call. The MPI application is erroneous, yet it could not have been written correctly: consensus semantics on at least one collective operation are required to allow for a collective repair.
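A sketch of the interleaving Thomas describes. RANK_A and RANK_B are placeholders, and MPI_Comm_validate_all's signature is assumed from the proposal:

    int outcount, *failed = NULL, buf[1];
    MPI_Status status;

    if (rank == RANK_A) {
        /* A detected the failure and re-enters validation: */
        MPI_Comm_validate_all(comm, &outcount, &failed);
        /* blocks: B never joins the collective */
    } else if (rank == RANK_B) {
        /* B saw no failure and posts ordinary point-to-point traffic: */
        MPI_Recv(buf, 1, MPI_INT, RANK_A, 0, comm, &status);
        /* blocks: A never posts the matching send */
    }
    /* Deadlock with no error raised: A waits for B inside the
     * collective, while B waits for a send that A will never post. */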

Thomas

Le 16 févr. 2011 à 16:24, Bronevetsky, Greg a écrit :

> Actually, I think Darius has a point. The exact guarantee is impossible in the general case because it is reducible to the consensus problem. Unfortunately, the spec has to assume the general case, while databases don't need to and can assume synchronous communication or bounds on message-delivery times. I think it'll be safer to use Darius's suggestion: guarantee to return the same thing on every process where the call does return something.
> 
> Greg Bronevetsky
> Lawrence Livermore National Lab
> (925) 424-5756
> bronevetsky at llnl.gov
> http://greg.bronevetsky.com 
> 
> 
>> -----Original Message-----
>> From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Joshua Hursey
>> Sent: Wednesday, February 16, 2011 1:17 PM
>> To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
>> Subject: Re: [Mpi3-ft] MPI_Comm_validate_all
>> 
>> It is a challenging guarantee to provide, but possible. Databases need to
>> make decisions like this all the time with transactions (commit=success, or
>> abort=failure). Though database transaction protocols are a good place to
>> start, we can likely loosen some of the restrictions since we are applying
>> them to a slightly different environment.
>> 
>> Look at a two-phase commit protocol that includes a termination protocol
>> (Gray), or a three-phase commit protocol (Skeen). The trick is that you
>> really want what the literature calls a 'nonblocking' commit protocol,
>> meaning one that will not block in an undecided state waiting for the
>> recovery of a peer process that might be able to decide from a recovery
>> log. There are a few other, more scalable approaches out there, but they
>> are challenging to implement correctly.
>> 
>> -- Josh
>> 
>> Gray: Notes on Data Base Operating Systems (note this describes a protocol
>> without the termination protocol, but a database textbook should be able to
>> fill in that part) - 1979
>> 
>> Skeen: Nonblocking commit protocols - 1981
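As a reference point, here is a toy single-process simulation of the two-phase commit decision rule Josh cites (all names are hypothetical). It is exactly the blocking variant that the termination protocol and three-phase commit were designed to fix: if the coordinator dies after some participants learn the decision and others don't, the survivors are stuck, the analogue of Darius's "last message" problem below.

    #include <stdio.h>

    enum vote     { VOTE_COMMIT, VOTE_ABORT };
    enum decision { COMMIT, ABORT };

    /* Phase 1: the coordinator collects one vote per participant.
     * Phase 2: it broadcasts the decision.  Commit requires unanimity;
     * any abort vote (or a missing vote, treated as abort) forces a
     * global abort. */
    static enum decision two_phase_decide(const enum vote *votes, int n)
    {
        for (int i = 0; i < n; i++)
            if (votes[i] == VOTE_ABORT)
                return ABORT;
        return COMMIT;
    }

    int main(void)
    {
        enum vote votes[4] = { VOTE_COMMIT, VOTE_COMMIT,
                               VOTE_ABORT,  VOTE_COMMIT };
        printf("decision: %s\n",
               two_phase_decide(votes, 4) == COMMIT ? "commit" : "abort");
        return 0;
    }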
>> 
>> On Feb 16, 2011, at 3:49 PM, Darius Buntinas wrote:
>> 
>>> 
>>> MPI_Comm_validate_all, according to the proposal at [1], must "either
>>> complete successfully everywhere or return some error everywhere."  Is
>>> this possible to guarantee?  What about process failures during the call?
>>> Consider the last message sent in the protocol.  If the process sending
>>> that message dies just before sending it, the receiver will not know
>>> whether to return success or failure.
>>> 
>>> I think that the best we can do is say that the outcount and list of
>>> collectively-detected dead processes will be the same at all processes
>>> where the call completed successfully.
>>> 
>>> Or is there a trick I'm missing?
>>> 
>>> Thanks,
>>> -d
>>> 
>>> [1] https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/ft/run_through_stabilization#CollectiveValidationOperations
>> 
>> ------------------------------------
>> Joshua Hursey
>> Postdoctoral Research Associate
>> Oak Ridge National Laboratory
>> http://users.nccs.gov/~jjhursey
>> 
>> 
> 


_______________________________________________
mpi3-ft mailing list
mpi3-ft at lists.mpi-forum.org
http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft



