herault.thomas at gmail.com
Wed Feb 16 15:55:55 CST 2011
On Feb 16, 2011, at 16:45, Graham, Richard L. wrote:
> You can't guarantee that all will return, but you can guarantee that those who
> do will return the same value. So you will get the status just before
> the call, which is the intent of this call.
To be clear: (all living processes return the same error) XOR (all living processes return SUCCESS and the lists are the same).
-> Is this what the paragraph is intended to say?
-> What about the processes that don't return: are they dead, or blocked in this call forever (and hence as useful as a dead processor)?
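A small checker can make the rule above concrete. This is a sketch under stated assumptions. Each rank's result is modeled as a (status, dead list) pair, and ranks that never return are simply absent from the map. `satisfies_strong` and the `ERR_FAIL` code are hypothetical names and are not part of the MPI proposal.

```python
from typing import Dict, Iterable, Tuple

SUCCESS = "SUCCESS"

def satisfies_strong(outcomes: Dict[int, Tuple[str, Iterable[int]]]) -> bool:
    """Check the all-or-nothing rule for MPI_Comm_validate_all.

    `outcomes` maps each living rank that returned to (status, dead_list).
    Rule: (all returned the same error) XOR (all returned SUCCESS with
    identical dead lists). Ranks that never return are absent.
    """
    if not outcomes:
        return True  # vacuously true: nobody returned
    statuses = {status for status, _ in outcomes.values()}
    if len(statuses) != 1:
        return False  # mixed success/error, or different error codes
    if statuses != {SUCCESS}:
        return True  # everyone reports the same error; lists are irrelevant
    # All succeeded: compare lists as sorted tuples (same count, same ranks).
    lists = {tuple(sorted(dead)) for _, dead in outcomes.values()}
    return len(lists) == 1

# "ERR_FAIL" is a placeholder, not a real MPI error class.
print(satisfies_strong({0: (SUCCESS, [3]), 1: (SUCCESS, [3])}))    # True
print(satisfies_strong({0: (SUCCESS, [3]), 1: ("ERR_FAIL", [])}))  # False
```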
I'm saying that:
- it is possible to implement these semantics, assuming a failure detection mechanism;
- these semantics are necessary to support a collective repair and to allow the application to use blocking calls, such as MPI_Recv, after such a validation.
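The first point can be illustrated with a textbook technique. This is not the protocol in the proposal. In the FloodSet algorithm for synchronous systems with crash failures, each rank repeatedly floods the set of dead ranks it knows about. With at most f crashes, at least one of the f + 1 rounds has no crash, and after that round all surviving ranks hold the same set. The sketch assumes lock-step rounds and a known bound f on failures during the call. `floodset_agree` and the `crash_plan` model are hypothetical names used only for this illustration.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

def floodset_agree(suspected: Dict[int, Set[int]],
                   crash_plan: Dict[int, Tuple[int, Iterable[int]]],
                   f: int) -> Dict[int, FrozenSet[int]]:
    """FloodSet over f + 1 synchronous rounds (illustrative model).

    suspected:  rank -> dead ranks its local failure detector reported
    crash_plan: rank -> (round, receivers); the rank crashes in `round`
                after its message reached only `receivers`
    f:          assumed bound on crashes during the call
    Returns the decided dead set of every rank that survives all rounds.
    """
    ranks = sorted(suspected)
    known = {r: set(suspected[r]) for r in ranks}
    alive = set(ranks)
    for rnd in range(1, f + 2):
        crashing = {r for r in alive
                    if r in crash_plan and crash_plan[r][0] == rnd}
        inbox: Dict[int, list] = {r: [] for r in ranks}
        # Every message carries the sender's start-of-round state.
        for sender in alive:
            targets = crash_plan[sender][1] if sender in crashing else ranks
            for dst in targets:
                inbox[dst].append(frozenset(known[sender]))
        alive -= crashing  # crashed ranks stop participating
        for r in alive:
            for msg in inbox[r]:
                known[r] |= msg
    return {r: frozenset(known[r]) for r in alive}

# Rank 2 crashes in round 1 after its message reached only rank 3.
views = {0: {7}, 1: set(), 2: {9}, 3: set()}
print(floodset_agree(views, {2: (1, [3])}, f=1))  # survivors agree on {7, 9}
print(floodset_agree(views, {2: (1, [3])}, f=0))  # too few rounds: they differ
```

With f = 1, ranks 0, 1, and 3 all decide {7, 9}. With a single round, ranks 0 and 1 never learn rank 2's suspicion, but rank 3 does. That is the partial-send problem the extra round is there to absorb.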
> On 2/16/11 3:49 PM, "Darius Buntinas" <buntinas at mcs.anl.gov> wrote:
>> MPI_Comm_validate_all, according to the proposal at , must "either
>> complete successfully everywhere or return some error everywhere." Is
>> this possible to guarantee? What about process failures during the call?
>> Consider the last message sent in the protocol. If the process sending
>> that message dies just before sending it, the receiver will not know
>> whether to return success or failure.
>> I think that the best we can do is say that the outcount and list of
>> collectively-detected dead processes will be the same at all processes
>> where the call completed successfully.
>> Or is there a trick I'm missing?
>> mpi3-ft mailing list
>> mpi3-ft at lists.mpi-forum.org
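The scenario in the quoted message can be replayed in a small model. The model assumes a hypothetical gather-then-broadcast protocol in which a coordinator crashes partway through sending the final decision. `naive_validate_all`, `all_or_nothing`, `successful_ranks_agree`, and the `ERR_FAIL` code are illustrative names only. The run produces a mixed outcome that breaks the all-or-nothing rule but still meets the weaker guarantee: every rank that returns success sees the same outcount and list.

```python
from typing import Dict, Iterable, List, Tuple

Outcome = Tuple[str, Tuple[int, ...]]  # (status, sorted dead list)

def naive_validate_all(ranks: List[int], coordinator: int, dead: Iterable[int],
                       reached_before_crash: Iterable[int]) -> Dict[int, Outcome]:
    """Hypothetical protocol: the coordinator has already computed the
    agreed dead list, then crashes while sending the decision and reaches
    only `reached_before_crash`. A rank that sees the coordinator die
    without the decision cannot tell what the others returned; in this
    sketch it falls back to a placeholder error code."""
    decision = tuple(sorted(dead))
    reached = set(reached_before_crash)
    outcomes: Dict[int, Outcome] = {}
    for r in ranks:
        if r == coordinator:
            continue  # the coordinator died and never returns
        if r in reached:
            outcomes[r] = ("SUCCESS", decision)
        else:
            outcomes[r] = ("ERR_FAIL", ())  # placeholder, not a real MPI code
    return outcomes

def all_or_nothing(outcomes: Dict[int, Outcome]) -> bool:
    """Strong rule. In this model, error returns always carry (), so one
    distinct (status, list) pair means everyone returned the same thing."""
    return len(set(outcomes.values())) <= 1

def successful_ranks_agree(outcomes: Dict[int, Outcome]) -> bool:
    """Weaker guarantee: every rank that returned SUCCESS reports the same
    list; equal sorted tuples also imply an equal outcount. Error returns
    are unconstrained."""
    lists = {dead for status, dead in outcomes.values() if status == "SUCCESS"}
    return len(lists) <= 1

out = naive_validate_all([0, 1, 2, 3], coordinator=0, dead=[7],
                         reached_before_crash=[1])
print(all_or_nothing(out))          # False
print(successful_ranks_agree(out))  # True
```

Ranks 2 and 3 could instead keep waiting rather than return an error. In that case they never return, which is the blocking case raised earlier in the thread.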