[Mpi3-ft] Point2Point issue scenario with synchronous notification based on calling communicator only.
Thomas Herault
thomas.herault at lri.fr
Tue Feb 10 11:35:44 CST 2009
The failure happened in MPI_COMM_WORLD. I assume that, in this case, the
fault-tolerance strategy of library A is based on collective repair. Do you
propose to forbid collective repairs when the user uses such a
communication scheme?
How do you propose to deal with the following example (sketched in C after
the scenario)?
MPI_COMM_WORLD is the communicator used in library A
MPI_COMM_2 is the communicator used in library B
rank 0: belongs to MPI_COMM_WORLD *and MPI_COMM_2*
-> in library A: does some computation and communications, which succeed
-> in library A: computes a reversible checksum
-> in library A: continues the computation, calling other libraries often
-> then, at some point in time:
-> in library A: MPI_Send(MPI_COMM_WORLD, dst=1);
-> crashes
-> would have entered library B and done no communications at this step
rank 1: belongs to MPI_COMM_WORLD and MPI_COMM_2
-> in library A: does some computation and communications, which succeed
-> in library A: computes a reversible checksum
-> in library A: continues the computation, calling other libraries often
-> then, at some point in time,
-> in library A: MPI_Recv(MPI_COMM_WORLD, src=0)
-> detects the failure
-> calls the error manager: collective repair
-> would have entered library B and called: MPI_Send(MPI_COMM_2,
dst=2);
rank 2: belongs to MPI_COMM_WORLD and MPI_COMM_2
-> in library A: does some computation and communications, which succeed
-> in library A: computes a reversible checksum
-> in library A: continues the computation, calling other libraries often
-> then, at some point in time,
-> in library A: does not have to communicate at this step
-> in library B: MPI_Recv(MPI_COMM_2, src=1);
-> will never succeed
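
For concreteness, here is a minimal C sketch of the pattern above. The
tags, buffer contents, and collective_repair() are illustrative
assumptions; collective_repair() is not a real MPI call.

#include <mpi.h>

/* Sketch of the failing step: comm_world is library A's communicator,
   comm2 is library B's. collective_repair() stands in for the error
   manager's repair routine and is hypothetical. */
void failing_step(int rank, MPI_Comm comm_world, MPI_Comm comm2)
{
    int token = 0;

    if (rank == 0) {
        /* library A: the send succeeds, then the process crashes
           before it would have entered library B */
        MPI_Send(&token, 1, MPI_INT, 1, 0, comm_world);
        /* ... crash ... */
    } else if (rank == 1) {
        /* library A: the receive detects the failure of rank 0,
           and the error manager starts a collective repair */
        MPI_Recv(&token, 1, MPI_INT, 0, 0, comm_world,
                 MPI_STATUS_IGNORE);
        /* on error: collective_repair(comm_world); blocks, waiting
           for rank 2, which never calls it */
    } else if (rank == 2) {
        /* library B: waits for rank 1, which is stuck in the
           repair on comm_world -> deadlock */
        MPI_Recv(&token, 1, MPI_INT, 1, 0, comm2, MPI_STATUS_IGNORE);
    }
}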
To invert the reversible checksum, rank 2 needs to enter the collective
repair, in order to give the new rank 0 the data that the old rank 0 lost
in the crash. The fact that a library uses a collective approach to
tolerate failures does not mean that it must synchronize every time it
calls other libraries: if library A depends heavily on lower-level
libraries, it does not want to synchronize before each such call, but it
is acceptable to synchronize from time to time to compute the reversible
checksum.
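
One common realization of such a reversible checksum is XOR parity
(diskless checkpointing). The sketch below is an assumption for
illustration, not a description of library A's actual scheme: STATE_LEN,
the function names, and the single-parity-rank layout are all invented.
It also makes concrete why rank 2 must enter the repair: the recovery is
itself a reduction over every survivor.

#include <mpi.h>

#define STATE_LEN 1024  /* words of application state per rank (assumed) */

/* Checkpoint: every rank contributes its state; the bitwise XOR of all
   states (the "reversible checksum") lands on parity_rank. */
static void take_parity_checkpoint(const long *state, long *parity,
                                   int parity_rank, MPI_Comm comm)
{
    MPI_Reduce(state, parity, STATE_LEN, MPI_LONG, MPI_BXOR,
               parity_rank, comm);
}

/* Recovery, after the communicator has been repaired: XORing every
   survivor's state with the stored parity cancels the survivors'
   contributions (x ^ x = 0) and leaves exactly the lost rank's state
   on the replacement rank. */
static void recover_lost_state(const long *my_state, const long *parity,
                               long *lost_state, int i_hold_parity,
                               int i_am_replacement, int replacement_rank,
                               MPI_Comm repaired_comm)
{
    long contrib[STATE_LEN];
    for (int i = 0; i < STATE_LEN; i++) {
        long c = i_am_replacement ? 0 : my_state[i]; /* 0: XOR identity */
        if (i_hold_parity)
            c ^= parity[i];
        contrib[i] = c;
    }
    MPI_Reduce(contrib, lost_state, STATE_LEN, MPI_LONG, MPI_BXOR,
               replacement_rank, repaired_comm);
}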
Thomas
2009/2/10 Erez Haba <erezh at microsoft.com>
> Don't do a collective repair in rank 1. Do a non-collective repair (rank 1
> does not require the participation of rank 2 to recover rank 0)
>
>
>
> *From:* mpi3-ft-bounces at lists.mpi-forum.org [mailto:
> mpi3-ft-bounces at lists.mpi-forum.org] *On Behalf Of *Thomas Herault
> *Sent:* Monday, February 09, 2009 4:47 PM
> *To:* MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
> *Subject:* [Mpi3-ft] Point2Point issue scenario with synchronous
> notification based on calling communicator only.
>
>
>
> Hi list,
>
>
>
> with the help of others, here is an adaptation of the "counter-example"
> based on p2p communications only.
>
>
>
> MPI_COMM_WORLD is the communicator used in library A
>
> MPI_COMM_2 is the communicator used in library B
>
>
>
> rank 0: belongs to MPI_COMM_WORLD only
>
> -> in library A: MPI_Send(MPI_COMM_WORLD, dst=1);
>
> -> crashes
>
>
>
> rank 1: belongs to MPI_COMM_WORLD and MPI_COMM_2
>
> -> in library A: MPI_Recv(MPI_COMM_WORLD, src=0)
>
> -> detects the failure
>
> -> calls the error manager: collective repair
>
> -> would have entered library B and called: MPI_Send(MPI_COMM_2, dst=2);
>
>
>
> rank 2: belongs to MPI_COMM_WORLD and MPI_COMM_2
>
> -> does nothing in library A except entering library B.
>
> -> in library B: MPI_Recv(MPI_COMM_2, src=1);
>
> -> will never succeed
>
>
>
> I understand from the discussion we had that a solution would be to
> validate COMM_WORLD for process 2 before entering library B. I agree with
> that, but I would like you to consider that this virtually means asking
> users to call an $n^2$ communication operation before any call to any
> function of any library (and possibly on return from those calls) if they
> want to use collective repairs (see the sketch after this quoted
> message). I would advocate studying a less performance-killing approach,
> where errors on any communicator would be notified in any MPI call.
>
>
>
> Best,
>
> Thomas
>
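
As a rough illustration of the cost described above, here is a
hypothetical sketch of the "validate the communicator before entering
library B" workaround. comm_validate() and the locally_ok flag are
invented for illustration; MPI-2 provides no portable failure-detection
call that yields such a flag.

#include <mpi.h>

/* Hypothetical validation step: every rank agrees on the health of the
   communicator before calling into library B. locally_ok is assumed to
   come from the implementation's error handler. */
int comm_validate(MPI_Comm comm, int locally_ok)
{
    int all_ok = 0;
    /* Logical AND across all ranks: the validation is a full
       synchronization at every library boundary. */
    MPI_Allreduce(&locally_ok, &all_ok, 1, MPI_INT, MPI_LAND, comm);
    return all_ok;
}

/* Usage, before every entry into library B:
   if (comm_validate(MPI_COMM_WORLD, my_ok))
       library_B_step();                      */

A tree-based allreduce needs fewer than the n^2 messages cited above, but
the mandatory synchronization on every library call is the performance
problem either way.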