[Mpi-forum] [EXT]: Progress Question
Martin Schulz
schulzm at in.tum.de
Mon Oct 12 04:04:43 CDT 2020
Hi Jim, all,
We had a similar discussion (in a smaller circle) during the terms discussions. At least to my understanding, all bets are off as soon as you add dependencies and wait conditions outside of MPI, like here with the file. A note to this effect is in a rationale (Section 11.7, page 491 in the 2019 draft); based on that, an MPI implementation is allowed to deadlock (or cause a deadlock) in this case. If all dependencies were in MPI calls, then “eventual” progress should be guaranteed, even if it comes after the 100 days in Rajeev’s example. As far as I understand, that would still be correct behavior, since no MPI call is guaranteed to return in a fixed finite time (all calls are at best “weak local”).
Martin
--
Prof. Dr. Martin Schulz, Chair of Computer Architecture and Parallel Systems
Department of Informatics, TU-Munich, Boltzmannstraße 3, D-85748 Garching
Member of the Board of Directors at the Leibniz Supercomputing Centre (LRZ)
Email: schulzm at in.tum.de
From: mpi-forum <mpi-forum-bounces at lists.mpi-forum.org> on behalf of Jim Dinan via mpi-forum <mpi-forum at lists.mpi-forum.org>
Reply-To: Main MPI Forum mailing list <mpi-forum at lists.mpi-forum.org>
Date: Sunday, 11. October 2020 at 23:41
To: "Skjellum, Anthony" <Tony-Skjellum at utc.edu>
Cc: Jim Dinan <james.dinan at gmail.com>, Main MPI Forum mailing list <mpi-forum at lists.mpi-forum.org>
Subject: Re: [Mpi-forum] [EXT]: Progress Question
You can have a situation where the isend/irecv pair completes at process 0 before process 1 has called irecv or waitall. Since process 0 is then busy-waiting on the file, it makes no further MPI calls and so drives no MPI progress, which can result in deadlock.
~Jim.
On Sat, Oct 10, 2020 at 2:17 PM Skjellum, Anthony <Tony-Skjellum at utc.edu> wrote:
Jim, OK, my attempt at answering below.
See if you agree with my annotations.
-Tony
Anthony Skjellum, PhD
Professor of Computer Science and Chair of Excellence
Director, SimCenter
University of Tennessee at Chattanooga (UTC)
tony-skjellum at utc.edu [or skjellum at gmail.com]
cell: 205-807-4968
From: mpi-forum <mpi-forum-bounces at lists.mpi-forum.org> on behalf of Jim Dinan via mpi-forum <mpi-forum at lists.mpi-forum.org>
Sent: Saturday, October 10, 2020 1:31 PM
To: Main MPI Forum mailing list <mpi-forum at lists.mpi-forum.org>
Cc: Jim Dinan <james.dinan at gmail.com>
Subject: [EXT]: [Mpi-forum] Progress Question
Hi All,
A colleague recently asked a question that I wasn't able to answer definitively. Is the following code guaranteed to make progress?
MPI_Barrier();
-- everything is uncertain to within one message, if layered on pt2pt;
--- let's assume a power of 2, and recursive doubling (RD).
--- At each stage, it posts an irecv and isend to its corresponding element in RD
--- All stages must complete to get to the last stage.
--- At the last stage, it appears like your example below for N/2 independent process pairs, which appears always to complete.
if rank == 1
    create_file("test")
if rank == 0
    while not_exists("test")
        sleep(1);
That is, can rank 1 require rank 0 to make MPI calls after its return from the barrier, in order for rank 1 to complete the barrier? If the code were written as follows:
isend(..., other_rank, &req[0])
irecv(..., other_rank, &req[1])
waitall(2, req)
--- Assume both isends buffer on the send-side and return immediately--valid.
--- Both irecvs are posted, but unmatched as yet. Nothing has transferred on network.
--- Waitall would mark the isends done at once, and work to complete the irecvs; in
that process, each would have to progress the isends across the network. On this comm
and all comms, incidentally.
--- When waitall returns, the data has transferred to the receiver, otherwise the irecvs
aren't done.
if rank == 1
    create_file("test")
if rank == 0
    while not_exists("test")
        sleep(1);
I think it would clearly not guarantee progress since the send data can be buffered. Is the same true for barrier?
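To make the buffering concern concrete, here is a toy Python model of the isend/irecv/waitall version. It is only a sketch, not real MPI: the names ToyRank, flush, and simulate are made up for illustration. It assumes an implementation with no asynchronous progress thread, where a send buffered on the send side only moves onto the wire while the sending rank is inside an MPI call, and where waitall may return as soon as both requests are locally complete.

```python
from collections import deque


class ToyRank:
    """Toy stand-in for one MPI process (hypothetical, not an MPI API)."""

    def __init__(self):
        self.outbox = deque()  # sends buffered on the send side, not yet on the wire
        self.inbox = deque()   # messages that have arrived from the peer

    def isend(self, payload):
        # Buffered send: locally complete right away, nothing transferred yet.
        self.outbox.append(payload)

    def recv_complete(self):
        # The posted irecv is complete once a message has arrived.
        return len(self.inbox) > 0

    def flush(self, peer):
        # Assumed progress step: only happens while this rank is inside an MPI call.
        while self.outbox:
            peer.inbox.append(self.outbox.popleft())


def simulate(rank0_calls_mpi_in_loop, max_steps=5):
    r0, r1 = ToyRank(), ToyRank()
    file_exists = False

    # Rank 1's isend happens to reach rank 0 right away.
    r1.isend("from rank 1")
    r1.flush(r0)

    # Rank 0: its isend is buffered and its irecv is already satisfied, so in
    # this model waitall returns at once without pushing the buffered send.
    r0.isend("from rank 0")
    assert r0.recv_complete()

    for _ in range(max_steps):
        # Rank 1 sits in waitall; it creates the file only after its irecv completes.
        if r1.recv_complete():
            file_exists = True
        # Rank 0 spins on the file outside of MPI.
        if file_exists:
            return "completed"
        if rank0_calls_mpi_in_loop:
            r0.flush(r1)  # stand-in for any MPI call that drives progress
    return "stuck"


print(simulate(False))  # stuck
print(simulate(True))   # completed
```

In this model the plain sleep loop never lets rank 0's buffered send leave, so rank 1 never returns from waitall and never creates the file. Putting any progress-driving MPI call inside the loop breaks the cycle. Whether a real implementation behaves like this model is exactly the open question.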
Cheers,
~Jim.