[Mpi3-ft] Point2Point issue scenario with synchronous notification based on calling communicator only.
Thomas Herault
thomas.herault at lri.fr
Mon Feb 9 18:47:12 CST 2009
Hi list,
with the help of others, here is an adaptation of the "counter-example"
based on p2p communications only.
MPI_COMM_WORLD is the communicator used in library A
MPI_COMM_2 is the communicator used in library B
rank 0: belongs to MPI_COMM_WORLD only
-> in library A: MPI_Send(MPI_COMM_WORLD, dst=1);
-> crashes
rank 1: belongs to MPI_COMM_WORLD and MPI_COMM_2
-> in library A: MPI_Recv(MPI_COMM_WORLD, src=0)
-> detects the failure
-> calls the error manager: collective repair
-> would otherwise have entered library B and called: MPI_Send(MPI_COMM_2, dst=2);
rank 2: belongs to MPI_COMM_WORLD and MPI_COMM_2
-> does nothing in library A except entering library B.
-> in library B: MPI_Recv(MPI_COMM_2, src=1);
-> will never succeed
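For readers who want to replay the scenario, here is a minimal sketch in Python. It is a toy model, not MPI code. The names COMMS, sees_failure, run_scenario and the two policy strings are made up for illustration, and ranks are identified by their MPI_COMM_WORLD rank throughout. It assumes that a failure is reported to a process only when that process makes a call on a communicator the policy says to watch.

```python
# Toy model of the scenario (no real MPI involved; all names are hypothetical).
COMMS = {
    "MPI_COMM_WORLD": {0, 1, 2},  # used in library A
    "MPI_COMM_2": {1, 2},         # used in library B
}


def sees_failure(rank, call_comm, failed, policy):
    """Return True if `rank`'s call on `call_comm` reports a failure."""
    if policy == "calling_comm_only":
        # Only failures inside the communicator passed to the call are reported.
        watched = COMMS[call_comm]
    elif policy == "any_comm":
        # Failures in any communicator the rank belongs to are reported.
        watched = set().union(*(m for m in COMMS.values() if rank in m))
    else:
        raise ValueError(policy)
    return bool(watched & failed)


def run_scenario(policy):
    failed = {0}  # rank 0 crashes in library A

    # rank 1: MPI_Recv(MPI_COMM_WORLD, src=0) in library A
    rank1_in_repair = sees_failure(1, "MPI_COMM_WORLD", failed, policy)
    # rank 1 posts its library B send only if it was not diverted to the repair
    rank1_sends = not rank1_in_repair

    # rank 2: MPI_Recv(MPI_COMM_2, src=1) in library B
    if sees_failure(2, "MPI_COMM_2", failed, policy):
        return "rank 2 joins repair"
    return "rank 2 receives" if rank1_sends else "rank 2 blocks forever"


if __name__ == "__main__":
    for policy in ("calling_comm_only", "any_comm"):
        print(policy, "->", run_scenario(policy))
```

Under the "calling_comm_only" policy, rank 1 is diverted to the repair and rank 2 waits on a send that is never posted. Under "any_comm", rank 2 is notified inside its own MPI_Recv and can join the repair.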
I understand from the discussion we had that a solution would be to validate
MPI_COMM_WORLD on process 2 before it enters library B. I agree with that, but
would like you to consider that it effectively means asking users to call
an operation costing $n^2$ communications before any call to any function of any
library (and possibly on return from those calls) if they want to use collective
repairs. I would advocate studying an approach that hurts performance less, in which
errors on any communicator are reported in any MPI call.
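To give a rough sense of the cost, the following sketch counts messages under a deliberately naive assumption: one validation is an all-to-all exchange in which each of the n processes sends one message to every other process. That assumption, and the names validation_messages and validation_overhead, are illustrative only. An actual validation operation may be implemented differently and cost less.

```python
def validation_messages(n):
    """Messages in one validation, assuming a naive all-to-all among n processes."""
    return n * (n - 1)


def validation_overhead(n, library_calls, validate_on_return=True):
    """Total validation messages if every library call is guarded by a validation
    on entry and, optionally, another one on return."""
    validations_per_call = 2 if validate_on_return else 1
    return library_calls * validations_per_call * validation_messages(n)


if __name__ == "__main__":
    for n in (3, 64, 1024):
        print(n, validation_messages(n), validation_overhead(n, library_calls=100))
```

The count grows quadratically with n and linearly with the number of guarded library calls, which is the overhead the paragraph above is worried about.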
Bests,
Thomas