[mpiwg-coll] Fwd: Question re neighbourhood collectives and graph topologies

Torsten Hoefler htor at illinois.edu
Sun Mar 30 15:10:17 CDT 2014


Hi Rolf,

On Tue, Mar 25, 2014 at 12:25:51PM +0100, Rolf Rabenseifner wrote:
> Based on David Henty's question, my analysis seems to imply
> that the definition of the neighborhood collectives for
> MPI_GRAPH_CREATE general graph topologies is fully broken, 
> i.e., undefined.
> 
> I CC'ed the MPI-3.0 topology chapter committee and sent this to the 
> MPI-next collective working group.
> 
> Reason for my analysis:
> 
> MPI-3.0 p295:25-26 allows multiple edges and a non-symmetric 
> adjacency matrix.
> 
> MPI-3.0 p295:26-27 states that an adjacency edge from "process" to
> "neighbor" does not imply a communication direction, i.e., neither
> process=source nor process=destination. 
Yes, this is from the original MPI-1 text and is probably confusing.

> MPI-3.0 p306:46-p307:3 clearly defines that MPI_GRAPH_NEIGHBORS returns
> exactly the same adjacency information as defined in MPI_GRAPH_CREATE.
> 
> MPI-3.0 p307:11-34 (Example 7.5) uses a non-symmetric adjacency 
> matrix between processes 2 and 3:
> 
>   The three edges between nodes 2 and 3 are: 
>    1. process 2 -- neighbor 3
>    2. process 3 -- neighbor 2
>    3. process 3 -- neighbor 2
This is truly horrible! 
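
To make this concrete, here is a quick sketch (my own arrays, only
reproducing the asymmetric 2-3 adjacency described above, not
necessarily the exact arrays of Example 7.5); run it with at least
four processes, any extra processes get MPI_COMM_NULL:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* node 0: {1,3}, node 1: {0}, node 2: {3}, node 3: {0,2,2}
     * -> rank 2 lists one edge to 3, rank 3 lists two edges to 2 */
    int index[4] = {2, 3, 4, 7};
    int edges[7] = {1, 3, 0, 3, 0, 2, 2};
    MPI_Comm graph_comm;
    MPI_Graph_create(MPI_COMM_WORLD, 4, index, edges, 0, &graph_comm);

    if (graph_comm != MPI_COMM_NULL) {
        int rank, nneigh;
        MPI_Comm_rank(graph_comm, &rank);
        MPI_Graph_neighbors_count(graph_comm, rank, &nneigh);
        /* rank 2 reports 1 neighbor, rank 3 reports 3 (0, 2, 2) */
        printf("rank %d has %d neighbor(s)\n", rank, nneigh);
        MPI_Comm_free(&graph_comm);
    }

    MPI_Finalize();
    return 0;
}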

> MPI-3.0 p315:11-13 defines:
> 
>   For a general graph topology, created with MPI_GRAPH_CREATE, 
>   the order of neighbors in the send and receive buffers is 
>   defined as the sequence of neighbors as returned by 
>   MPI_GRAPH_NEIGHBORS. 
> 
> For process rank 2, MPI_GRAPH_NEIGHBORS returns one relevant
> edge, to neighbor 3, and therefore one send buffer and one
> receive buffer will be defined.
> 
> For process rank 3, MPI_GRAPH_NEIGHBORS returns two relevant
> edges, both to neighbor 2, and therefore two send buffers and
> two receive buffers will be defined.
> 
> This does not match and will cause broken communication!
Yes, we did not check the non-scalable MPI-1 interface for additional
brokenness. We wanted to deprecate it but didn't for various reasons.
Instead, we put in strong wording advising never to use it. 

I mean: DO NOT USE MPI_GRAPH_CREATE EVER! :-)
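
To show what I mean by the replacement, here is a minimal sketch with
MPI_DIST_GRAPH_CREATE_ADJACENT; the ring is only an illustration, but
because every process passes the same list as sources and destinations,
the neighborhood is symmetric by construction and the neighborhood
collectives are well defined:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* each process names its own neighbors; sources == destinations,
     * so both ends agree on the number of edges between them */
    int neighbors[2] = { (rank - 1 + size) % size, (rank + 1) % size };
    MPI_Comm ring;
    MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD,
                                   2, neighbors, MPI_UNWEIGHTED,
                                   2, neighbors, MPI_UNWEIGHTED,
                                   MPI_INFO_NULL, 0, &ring);

    /* one int per neighbor in each direction, well defined on both ends */
    int sendbuf[2] = { rank, rank }, recvbuf[2];
    MPI_Neighbor_alltoall(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT, ring);
    printf("rank %d received %d and %d\n", rank, recvbuf[0], recvbuf[1]);

    MPI_Comm_free(&ring);
    MPI_Finalize();
    return 0;
}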

Yet, we should fix this inconsistency with an erratum.

> Please do not think that three buffers will be defined,
> because you have to look at the complete example, i.e.,
> - in process 2 
>     -- one send buffer for communication with rank 3 and
>     -- one receive buffer for communication with rank 3 
>   are used, and
> - in process 3 
>     -- three send buffers for communication with ranks 0, 2, 2 and
>     -- three receive buffers for communication with ranks 0, 2, 2 
>   are used.
>   
> The numbers of buffers between processes 2 and 3 do not match.
Yes
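
A small sketch that makes the mismatch visible without calling the
ill-defined collective: the graph topology is available on every
process, so a single rank can count the edges each side lists toward
the other (same illustrative arrays as in the sketch above):

#include <mpi.h>
#include <stdio.h>

/* Count how many edges 'rank' lists toward 'peer'. */
static int edges_to(MPI_Comm graph_comm, int rank, int peer)
{
    int nneigh, neighbors[8], count = 0;   /* 8 is plenty for this example */
    MPI_Graph_neighbors_count(graph_comm, rank, &nneigh);
    MPI_Graph_neighbors(graph_comm, rank, nneigh, neighbors);
    for (int i = 0; i < nneigh; i++)
        if (neighbors[i] == peer)
            count++;
    return count;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int index[4] = {2, 3, 4, 7};           /* same illustrative graph as above */
    int edges[7] = {1, 3, 0, 3, 0, 2, 2};
    MPI_Comm graph_comm;
    MPI_Graph_create(MPI_COMM_WORLD, 4, index, edges, 0, &graph_comm);

    if (graph_comm != MPI_COMM_NULL) {
        int rank;
        MPI_Comm_rank(graph_comm, &rank);
        if (rank == 0)                      /* topology info is global */
            printf("2 -> 3: %d block(s), 3 -> 2: %d block(s)\n",
                   edges_to(graph_comm, 2, 3),    /* prints 1 */
                   edges_to(graph_comm, 3, 2));   /* prints 2 */
        MPI_Comm_free(&graph_comm);
    }

    MPI_Finalize();
    return 0;
}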

> Proposal:
> We should restrict the current definition of neighborhood collective
> communication on general graph topologies to those topologies
> in which, for each pair of processes, both processes define the
> same number of edges between them.
> (This implies a symmetric adjacency matrix, but the adjacency
> information, which also includes the sequence of the edges,
> need not be symmetric.)
Well, this seems a bit like a hack, but since nobody should use this
interface anyway, I have no strong feelings.

> Proposed solution:
> ------------------
> 
> MPI-3.0 page 315 lines 11-14 read
> 
>    For a general graph topology, created with MPI_GRAPH_CREATE, 
>    the order of neighbors in the send and receive buffers is 
>    defined as the sequence of neighbors as returned by
>    MPI_GRAPH_NEIGHBORS. Note that general graph topologies
>    should generally be replaced by the distributed graph topologies.
> 
> but should read
> 
>    For a general graph topology, created with MPI_GRAPH_CREATE,
> |  the use of neighborhood collective communication is
> |  restricted to adjacency matrices in which the number of edges
> |  between any two processes is defined to be the same for both
> |  processes (i.e., to a symmetric adjacency matrix).
> |  In this case,
>    the order of neighbors in the send and receive buffers is 
>    defined as the sequence of neighbors as returned by
>    MPI_GRAPH_NEIGHBORS. Note that general graph topologies
>    should generally be replaced by the distributed graph topologies.
That is ok with me. Rolf, feel free to push this erratum forward, I'll
support it and we can discuss it in the collectives WG.
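
If we adopt this, a debug build (or a user) could check the restriction
with something like the following; this is only a hypothetical helper to
illustrate the proposed rule, not proposed standard text:

#include <mpi.h>
#include <stdlib.h>

/* Return 1 if, for every pair (i,j), process i lists as many edges to j
 * as j lists to i, i.e. the adjacency matrix (with multiplicities) is
 * symmetric and the proposed restriction is satisfied. */
int graph_adjacency_is_symmetric(MPI_Comm graph_comm)
{
    int nnodes, nedges;
    MPI_Graphdims_get(graph_comm, &nnodes, &nedges);

    /* count[i*nnodes + j] = number of edges process i lists to j */
    int *count = calloc((size_t)nnodes * nnodes, sizeof(int));
    int *neighbors = malloc((size_t)nedges * sizeof(int));

    for (int i = 0; i < nnodes; i++) {
        int nneigh;
        MPI_Graph_neighbors_count(graph_comm, i, &nneigh);
        MPI_Graph_neighbors(graph_comm, i, nneigh, neighbors);
        for (int k = 0; k < nneigh; k++)
            count[i * nnodes + neighbors[k]]++;
    }

    int symmetric = 1;
    for (int i = 0; i < nnodes; i++)
        for (int j = 0; j < nnodes; j++)
            if (count[i * nnodes + j] != count[j * nnodes + i])
                symmetric = 0;

    free(neighbors);
    free(count);
    return symmetric;
}

On the asymmetric example graph above it returns 0, so the neighborhood
collectives would be disallowed there.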

Thanks & All the Best,
  Torsten

-- 
### qreharg rug ebs fv crryF ----------- http://htor.inf.ethz.ch/ -----
Torsten Hoefler           | Assistant Professor
Dept. of Computer Science | ETH Zürich
Universitätsstrasse 6     | Zurich-8092, Switzerland
CAB E 64.1                | Phone: +41 76 309 79 29


