[mpiwg-persistence] progress rule vs. partitioned communication
Holmes, Daniel John
daniel.john.holmes at intel.com
Wed Jan 11 10:12:18 CST 2023
Hi Joachim,
> A naive implementation can simply wait for the entire message buffer
> to be marked ready before any transfer(s) occur and could wait until
> the completion function is called on a request before transferring
> data.
IMHO, the request mentioned here is intended to be the receive request, not the send request. That should be clarified to avoid the misunderstanding you highlight.
While Process 0 is blocked in Recv(1), it must guarantee progress (which includes progress for all enabled operations, which includes the partitioned send operation).
While Process 1 is blocked in Wait(r), it must guarantee progress (which includes progress for all enabled operations, which includes the partitioned receive operation).
Thus, the partitioned receive will complete, so Process 1 must eventually return from Wait(r) and will reach Send(0), which guarantees that Process 0 will also finish eventually.
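To make the two readings concrete, here is a toy model in plain C of the partitioned scenario from the quoted message. It is only a sketch and makes no MPI calls: the policy names and run_scenario are invented for illustration, and each process is reduced to three steps. One policy lets any blocking MPI call on Process 0 move the partitioned data; the other moves it only inside the sender's Wait(r), which is the reading that leads to the deadlock.

```c
#include <stdbool.h>

/* Hypothetical policies for when the modelled library moves partitioned data. */
enum policy {
    PROGRESS_IN_ANY_CALL,        /* any blocking MPI call on the sender progresses it */
    TRANSFER_ONLY_IN_SENDER_WAIT /* data moves only inside the sender's MPI_Wait(r) */
};

/* Toy model of the partitioned scenario; returns true if both processes
 * finish and false if neither can advance (deadlock). */
bool run_scenario(enum policy pol)
{
    int pc0 = 0, pc1 = 0;   /* next step of Process 0 / Process 1 */
    bool ready = false;     /* P0 started the send and marked all partitions ready */
    bool posted = false;    /* P1 started the partitioned receive */
    bool delivered = false; /* partitioned data has reached P1 */
    bool reply = false;     /* P1 has issued MPI_Send(0) */

    while (pc0 < 3 || pc1 < 3) {
        bool moved = false;

        /* Process 0: [0] Start + Pready_range(all), [1] Recv(1), [2] Wait(r) */
        if (pc0 == 0) { ready = true; pc0 = 1; moved = true; }
        if (pc0 == 1 || pc0 == 2) {
            /* P0 is blocked inside MPI; does this call move the data? */
            bool may_move = (pol == PROGRESS_IN_ANY_CALL) || (pc0 == 2);
            if (may_move && ready && posted && !delivered) {
                delivered = true;
                moved = true;
            }
        }
        if (pc0 == 1 && reply) { pc0 = 2; moved = true; }
        if (pc0 == 2 && delivered) { pc0 = 3; moved = true; }

        /* Process 1: [0] Precv_init + Start, [1] Wait(r), [2] Send(0) */
        if (pc1 == 0) { posted = true; pc1 = 1; moved = true; }
        if (pc1 == 1 && delivered) { pc1 = 2; moved = true; }
        if (pc1 == 2) { reply = true; pc1 = 3; moved = true; }

        if (!moved) return false; /* both processes blocked: deadlock */
    }
    return true;
}
```

In this model, run_scenario(PROGRESS_IN_ANY_CALL) returns true, while run_scenario(TRANSFER_ONLY_IN_SENDER_WAIT) returns false, with Process 0 stuck in Recv(1) and Process 1 stuck in Wait(r).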
Best wishes,
Dan.
-----Original Message-----
From: mpiwg-persistence <mpiwg-persistence-bounces at lists.mpi-forum.org> On Behalf Of Joachim Jenke via mpiwg-persistence
Sent: 11 January 2023 14:05
To: mpiwg-persistence at lists.mpi-forum.org
Subject: [mpiwg-persistence] progress rule vs. partitioned communication
Hi all,
While working on deadlock detection for partitioned P2P communication, we could not reach a conclusion on whether a variant of the following non-deadlock scenario becomes a deadlock with partitioned communication:
Process 0                  Process 1
MPI_Isend(1,r)             MPI_Recv(0)
MPI_Recv(1)                MPI_Send(0)
MPI_Wait(r)
Transformed into partitioned communication:
Process 0                  Process 1
MPI_Psend_init(1,r)        MPI_Precv_init(0,r)
MPI_Start(r)               MPI_Start(r)
MPI_Pready_range(all,r)    MPI_Wait(r)
MPI_Recv(1)                MPI_Send(0)
MPI_Wait(r)
The first example should not deadlock according to the progress paragraph in MPI4,p75,l13pp:
> A call to MPI_WAIT that completes a receive will eventually terminate
> and return if a matching send has been started, unless the send is
> satisfied by another receive. In particular, if the matching send is
> nonblocking, then the receive should complete even if no call is
> executed by the sender to complete the send.
The second example might deadlock according to the rationale in MPI4,p106,l3pp:
> A naive implementation can simply wait for the entire message buffer
> to be marked ready before any transfer(s) occur and could wait until
> the completion function is called on a request before transferring
> data.
If calling the completion function is necessary to eventually send the data, the second example will deadlock with Process 0 stalling in the MPI_Recv and Process 1 stalling in the MPI_Wait.
From my perspective, this contradicts the progress definition for MPI_WAIT completing a receive where the matching send has been started.
Another question might be at what point a partitioned send counts as started in the sense of the progress rule for MPI_WAIT.
I checked the current mpi-4.x branch and the referenced wording seems to be still the same.
Did we miss something?
Best
Joachim
--
Dr. rer. nat. Joachim Jenke
IT Center
Group: High Performance Computing
Division: Computational Science and Engineering
RWTH Aachen University
Seffenter Weg 23
D 52074 Aachen (Germany)
Tel: +49 241 80- 24765
Fax: +49 241 80-624765
jenke at itc.rwth-aachen.de
www.itc.rwth-aachen.de