[Mpi-forum] [EXT]: Progress Question

Martin Schulz schulzm at in.tum.de
Mon Oct 12 04:04:43 CDT 2020


Hi Jim, all,

 

We had a similar discussion (in a smaller circle) during the terms discussions. At least to my understanding, all bets are off as soon as you add dependencies and wait conditions outside of MPI, as here with the file. A note to this effect is in a rationale (Section 11.7, page 491 in the 2019 draft); based on that, an MPI implementation is allowed to deadlock (or cause a deadlock). If all dependencies were in MPI calls, then “eventual” progress should be guaranteed, even if it comes after the 100 days in Rajeev’s example. As far as I understand, that would still be correct behavior, since no MPI call is guaranteed to return in a fixed finite time (all calls are at best “weak local”).

 

Martin

 

 

 

-- 
Prof. Dr. Martin Schulz, Chair of Computer Architecture and Parallel Systems
Department of Informatics, TU-Munich, Boltzmannstraße 3, D-85748 Garching
Member of the Board of Directors at the Leibniz Supercomputing Centre (LRZ)
Email: schulzm at in.tum.de

 

 

 

From: mpi-forum <mpi-forum-bounces at lists.mpi-forum.org> on behalf of Jim Dinan via mpi-forum <mpi-forum at lists.mpi-forum.org>
Reply-To: Main MPI Forum mailing list <mpi-forum at lists.mpi-forum.org>
Date: Sunday, 11. October 2020 at 23:41
To: "Skjellum, Anthony" <Tony-Skjellum at utc.edu>
Cc: Jim Dinan <james.dinan at gmail.com>, Main MPI Forum mailing list <mpi-forum at lists.mpi-forum.org>
Subject: Re: [Mpi-forum] [EXT]: Progress Question

 

You can have a situation where the isend/irecv pair completes at process 0 before process 1 has called irecv or waitall. Since process 0 is now busy-waiting on the file, it will not make progress on MPI calls, which can result in deadlock.
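
A toy model of the scenario described above may help make it concrete. This is only a sketch and does not call MPI: it assumes rank 0's isend was buffered on the send side, that the buffered data leaves rank 0 only when something drives progress there, and that rank 1 creates the file only after its waitall returns. The names `simulate`, `async_progress`, and `max_steps` are hypothetical and chosen for illustration.

```python
from collections import deque


def simulate(async_progress, max_steps=10):
    """Toy model, not real MPI. Returns True if rank 0 ever sees the file.

    async_progress=True models an implementation that moves buffered data
    without rank 0 entering MPI (an assumption, e.g. a progress thread).
    async_progress=False models progress only inside MPI calls.
    """
    # Rank 0's isend completed locally by buffering; the data is still here.
    staged_at_rank0 = deque(["payload"])
    arrived_at_rank1 = deque()
    file_exists = False

    for _ in range(max_steps):
        # Rank 0 has returned from waitall and polls for the file outside MPI.
        if file_exists:
            return True
        # Buffered data moves only if progress happens without MPI calls.
        if async_progress and staged_at_rank0:
            arrived_at_rank1.append(staged_at_rank0.popleft())
        # Rank 1 sits in waitall; once its irecv has data it creates the file.
        if arrived_at_rank1:
            file_exists = True
    # Step budget exhausted: in this model the two ranks wait on each other.
    return False
```

Under these assumptions, `simulate(async_progress=False)` never reaches the file creation, which mirrors the deadlock described above, while `simulate(async_progress=True)` terminates.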

 

 ~Jim.

 

On Sat, Oct 10, 2020 at 2:17 PM Skjellum, Anthony <Tony-Skjellum at utc.edu> wrote:

Jim, OK, my attempt at answering below.

 

See if you agree with my annotations.

 

-Tony

 

 

Anthony Skjellum, PhD

Professor of Computer Science and Chair of Excellence

Director, SimCenter

University of Tennessee at Chattanooga (UTC)

tony-skjellum at utc.edu  [or skjellum at gmail.com]

cell: 205-807-4968

 

 

From: mpi-forum <mpi-forum-bounces at lists.mpi-forum.org> on behalf of Jim Dinan via mpi-forum <mpi-forum at lists.mpi-forum.org>
Sent: Saturday, October 10, 2020 1:31 PM
To: Main MPI Forum mailing list <mpi-forum at lists.mpi-forum.org>
Cc: Jim Dinan <james.dinan at gmail.com>
Subject: [EXT]: [Mpi-forum] Progress Question 

 

Hi All, 

 

A colleague recently asked a question that I wasn't able to answer definitively. Is the following code guaranteed to make progress?

 

MPI_Barrier();

-- everything is uncertain to within one message, if layered on pt2pt;

--- let's assume a power of 2, and recursive doubling (RD).

--- At each stage, it posts an irecv and isend to its corresponding element in RD

--- All stages must complete to get to the last stage.

--- At the last stage, it appears like your example below for N/2 independent process pairs, which appears always to complete.

if rank == 1

  create_file("test")

if rank == 0

   while not_exists("test")

       sleep(1);

 

That is, can rank 1 require rank 0 to make MPI calls after its return from the barrier, in order for rank 1 to complete the barrier? If the code were written as follows:

 

isend(..., other_rank, &req[0])

irecv(..., other_rank, &req[1])

waitall(2, req)

--- Assume both isends buffer on the send-side and return immediately--valid.

--- Both irecvs are posted, but unmatched as yet.  Nothing has transferred on network.

--- Waitall would mark the isends done at once, and work to complete the irecvs; in

     that process, each would have to progress the isends across the network. On this comm

     and all comms, incidentally.  

--- When waitall returns, the data has transferred to the receiver, otherwise the irecvs 

      aren't done.

if rank == 1

  create_file("test")

if rank == 0

   while not_exists("test")

       sleep(1);

 

I think it would clearly not guarantee progress since the send data can be buffered. Is the same true for barrier?
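
For the barrier case, the annotations above assume a power-of-two process count and a recursive-doubling barrier layered on point-to-point messages. The sketch below computes the usual recursive-doubling partner schedule (partner = rank XOR 2^k at stage k) and lists the pairs at the last stage. It is an illustration of that assumed layering, not a statement about how any particular MPI implementation builds MPI_Barrier; the function names `rd_partners` and `last_stage_pairs` are hypothetical.

```python
def rd_partners(rank, nprocs):
    """Partner of `rank` at each stage of recursive doubling.

    Assumes nprocs is a power of two, as in the annotations above.
    """
    if nprocs < 1 or nprocs & (nprocs - 1):
        raise ValueError("nprocs must be a power of two")
    partners = []
    distance = 1
    while distance < nprocs:
        # At each stage a rank exchanges with the rank whose id differs
        # in exactly one bit.
        partners.append(rank ^ distance)
        distance <<= 1
    return partners


def last_stage_pairs(nprocs):
    """Disjoint rank pairs that exchange messages in the final stage."""
    if nprocs < 2 or nprocs & (nprocs - 1):
        raise ValueError("nprocs must be a power of two and at least 2")
    distance = nprocs // 2
    return sorted({tuple(sorted((r, r ^ distance))) for r in range(nprocs)})
```

For 8 processes, rank 0 exchanges with ranks 1, 2, and 4 in turn, and the last stage consists of N/2 = 4 independent pairs, each of which looks like the isend/irecv/waitall exchange in the second example.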

 

Cheers,

 ~Jim.


 
