[Mpi3-ft] Point2Point issue scenario with synchronous notification based on calling communicator only.

Mon Feb 9 18:47:12 CST 2009

Hi list,

with the help of others, here is an adaptation of the "counter-example"
that uses point-to-point (p2p) communication only.

MPI_COMM_WORLD is the communicator used in library A
MPI_COMM_2 is the communicator used in library B

rank 0: belongs to MPI_COMM_WORLD only
  -> in library A: MPI_Send(MPI_COMM_WORLD, dst=1);
   -> crashes

rank 1: belongs to MPI_COMM_WORLD and MPI_COMM_2
  -> in library A: MPI_Recv(MPI_COMM_WORLD, src=0);
   -> detects the failure
   -> calls the error manager: collective repair
  -> would then have entered library B and called: MPI_Send(MPI_COMM_2, dst=2);

rank 2: belongs to MPI_COMM_WORLD and MPI_COMM_2
  -> does nothing in library A; it enters library B directly.
  -> in library B: MPI_Recv(MPI_COMM_2, src=1);
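    -> never completes: rank 1 is waiting in the collective repair of
       MPI_COMM_WORLD, which rank 2 never joins because it is only calling
       on MPI_COMM_2 and is therefore never notified of the failure (deadlock)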

I understand from our discussion that one solution would be to validate
MPI_COMM_WORLD on process 2 before it enters library B. I agree with that,
but I would like you to consider that it effectively asks users to call an
$n^2$ communication operation before every call to any function of any
library (and possibly on return from those calls) if they want to use
collective repairs. I would advocate studying a less costly approach, in
which errors on any communicator would be notified in any MPI call.

Bests,
Thomas