[mpiwg-coll] Fwd: Question re neighbourhood collectives and graph topologies

Rolf Rabenseifner rabenseifner at hlrs.de
Tue Mar 25 06:25:51 CDT 2014


Based on David Henty's question, my analysis seems to imply
that the definition of the neighborhood collectives for
general graph topologies created with MPI_GRAPH_CREATE is
fully broken, i.e., undefined.

I have CC'ed the MPI-3.0 topology chapter committee and am sending
this to the MPI-next collective working group.

Reason for my analysis:

MPI-3.0 p295:25-26 allows multiple edges and a non-symmetric 
adjacency matrix.

MPI-3.0 p295:26-27 states that an adjacency edge from "process" to
"neighbor" does not imply a communication direction, neither
process=source nor process=destination. 

MPI-3.0 p306:46-p307:3 clearly defines that MPI_GRAPH_NEIGHBORS returns
exactly the same adjacency information as defined in MPI_GRAPH_CREATE.

MPI-3.0 p307:11-34 (Example 7.5) uses a non-symmetric adjacency 
matrix between processes 2 and 3:

  The three edges between nodes 2 and 3 are: 
   1. process 2 -- neighbor 3
   2. process 3 -- neighbor 2
   3. process 3 -- neighbor 2

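To make this concrete, here is a minimal C sketch of my own (not from
the standard, error handling omitted) that builds the Example 7.5
topology and queries the adjacency information the neighborhood
collectives will use. The index/edges arrays assume the full adjacency
lists of Example 7.5 (0: 1,3; 1: 0; 2: 3; 3: 0,2,2):

  /* Sketch: build the Example 7.5 graph topology and query it.
   * Run with exactly 4 processes. */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      MPI_Init(&argc, &argv);

      /* Adjacency lists: 0: 1,3   1: 0   2: 3   3: 0,2,2 */
      int nnodes  = 4;
      int index[] = {2, 3, 4, 7};
      int edges[] = {1, 3, 0, 3, 0, 2, 2};

      MPI_Comm graph_comm;
      MPI_Graph_create(MPI_COMM_WORLD, nnodes, index, edges,
                       0 /* no reorder */, &graph_comm);

      int rank, nneighbors, neighbors[8];
      MPI_Comm_rank(graph_comm, &rank);
      MPI_Graph_neighbors_count(graph_comm, rank, &nneighbors);
      MPI_Graph_neighbors(graph_comm, rank, 8, neighbors);

      /* Rank 2 sees one neighbor (3); rank 3 sees three (0, 2, 2). */
      printf("rank %d has %d neighbors\n", rank, nneighbors);

      MPI_Comm_free(&graph_comm);
      MPI_Finalize();
      return 0;
  }
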
MPI-3.0 p315:11-13 defines:

  For a general graph topology, created with MPI_GRAPH_CREATE, 
  the order of neighbors in the send and receive buffers is 
  defined as the sequence of neighbors as returned by 
  MPI_GRAPH_NEIGHBORS. 

For process rank 2, the relevant adjacency information returned by 
MPI_GRAPH_NEIGHBORS is one edge to neighbor 3,
and therefore one send buffer and one receive buffer 
will be defined.

For process rank 3, the relevant adjacency information returned by 
MPI_GRAPH_NEIGHBORS is two edges to neighbor 2,
and therefore two send buffers and two receive buffers 
will be defined.

This does not match and will cause broken communication!


Please do not think that three buffers will be defined,
because you have to look at the whole example, i.e.,
- in process 2 
    -- one send buffer for communication with rank 3 and
    -- one receive buffer for communication with rank 3 
  are used, and
- in process 3 
    -- three send buffers for communication with ranks 0, 2, 2 and
    -- three receive buffers for communication with ranks 0, 2, 2 
  are used.
  
The numbers of buffers between processes 2 and 3 do not match.

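At the API level this looks as follows (a rough sketch only; the helper
name neighbor_exchange is mine, and graph_comm is a communicator created
as in the code above): with MPI_NEIGHBOR_ALLTOALL each process provides
exactly one send element and one receive element per neighbor returned
by MPI_GRAPH_NEIGHBORS, so the mismatch shows up directly in the buffer
sizes.

  /* Sketch: one element per neighbor, as the neighborhood
   * collectives require.  On the Example 7.5 communicator, rank 2
   * provides 1 element toward rank 3 while rank 3 provides 2
   * elements toward rank 2, so the pairwise message counts cannot
   * match. */
  void neighbor_exchange(MPI_Comm graph_comm)
  {
      int rank, nneighbors;
      MPI_Comm_rank(graph_comm, &rank);
      MPI_Graph_neighbors_count(graph_comm, rank, &nneighbors);

      int sendbuf[8], recvbuf[8];    /* large enough for this example */
      for (int i = 0; i < nneighbors; i++)
          sendbuf[i] = rank;

      MPI_Neighbor_alltoall(sendbuf, 1, MPI_INT,
                            recvbuf, 1, MPI_INT, graph_comm);
  }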

Proposal:
We should restrict the current definition of neighborhood collective
communication on general graph topologies to those topologies
in which, for each pair of processes, both processes define the same
number of edges between that pair.
(This implies a symmetric adjacency matrix, but the adjacency
information, which also includes the sequence of the edges,
need not be symmetric.)

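A hypothetical helper (not part of MPI; the function name is mine) that
checks whether an existing general graph topology satisfies this
restriction could look as follows. It counts, for every ordered pair
(i,j), the multiplicity of the edge "process i -- neighbor j" and
compares it with the multiplicity of "process j -- neighbor i":

  /* Hypothetical check for the proposed restriction: returns 1 if,
   * for every pair (i,j), the number of edges i--j equals the
   * number of edges j--i (symmetric adjacency matrix), else 0. */
  #include <mpi.h>
  #include <stdlib.h>

  int graph_is_symmetric(MPI_Comm graph_comm)
  {
      int nnodes, nedges;
      MPI_Graphdims_get(graph_comm, &nnodes, &nedges);

      int *index = malloc(nnodes * sizeof(int));
      int *edges = malloc(nedges * sizeof(int));
      MPI_Graph_get(graph_comm, nnodes, nedges, index, edges);

      /* count[i*nnodes + j] = multiplicity of "process i -- neighbor j" */
      int *count = calloc((size_t)nnodes * nnodes, sizeof(int));
      for (int i = 0, e = 0; i < nnodes; i++)
          for (; e < index[i]; e++)
              count[i * nnodes + edges[e]]++;

      int symmetric = 1;
      for (int i = 0; i < nnodes && symmetric; i++)
          for (int j = 0; j < nnodes; j++)
              if (count[i * nnodes + j] != count[j * nnodes + i]) {
                  symmetric = 0;
                  break;
              }

      free(count); free(edges); free(index);
      return symmetric;
  }

Applied to the Example 7.5 topology this returns 0, i.e., the
neighborhood collectives would not be allowed on that communicator
under the proposal.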

Proposed solution:
------------------

MPI-3.0 page 315 lines 11-14 read

   For a general graph topology, created with MPI_GRAPH_CREATE, 
   the order of neighbors in the send and receive buffers is 
   defined as the sequence of neighbors as returned by
   MPI_GRAPH_NEIGHBORS. Note that general graph topologies
   should generally be replaced by the distributed graph topologies.

but should read

   For a general graph topology, created with MPI_GRAPH_CREATE,
|  the use of neighborhood collective communication is
|  restricted to adjacency matrices in which the number of edges
|  between any two processes is defined to be the same for both
|  processes (i.e., to a symmetric adjacency matrix).
|  In this case,
   the order of neighbors in the send and receive buffers is 
   defined as the sequence of neighbors as returned by
   MPI_GRAPH_NEIGHBORS. Note that general graph topologies
   should generally be replaced by the distributed graph topologies.


Best regards
Rolf  

----- Forwarded Message -----
From: "David Henty" <d.henty at epcc.ed.ac.uk>
To: "Rolf Rabenseifner" <rabenseifner at hlrs.de>
Sent: Monday, March 24, 2014 7:09:00 PM
Subject: Question re neighbourhood collectives and graph topologies

Rolf,

I see you were involved in the definition of the new scalable graph 
creation functions in MPI 3.0, so I wondered if you could answer a 
question re their use in neighbourhood collectives.

Page 295 of the standard says for MPI_Graph_create that "The definition 
of a node-neighbor edge does not imply a direction of the communication."

This made sense to me when graphs could only be used to look up the 
ranks of your neighbours: it was presumably up to the user how these 
were interpreted, e.g. if they implied a send or a receive.

However, the new neighbourhood collective operations infer the 
communications from the graph and do the sends and receives 
automatically. For the new distributed creation routines this seems 
clear, e.g. they talk explicitly about "source and destination" or 
"indegree and outdegree".

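(For concreteness, a minimal sketch of what I mean -- with
MPI_DIST_GRAPH_CREATE_ADJACENT every process states its in- and
out-neighbours explicitly, so the communication directions are
unambiguous; the neighbour lists below are purely illustrative:)

  /* Sketch: "sources" are the ranks this process receives from,
   * "destinations" are the ranks it sends to.  In a real program
   * every edge must of course be declared consistently at both of
   * its endpoints; the lists here are illustrative only. */
  int sources[]      = {0, 2};   /* indegree  = 2 */
  int destinations[] = {2};      /* outdegree = 1 */

  MPI_Comm dist_comm;
  MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD,
                                 2, sources, MPI_UNWEIGHTED,
                                 1, destinations, MPI_UNWEIGHTED,
                                 MPI_INFO_NULL, 0 /* no reorder */,
                                 &dist_comm);
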
However, it isn't clear to me how this works with the old graph creation 
routine. Is it correct to assume that the explicit "neighbours" of each 
rank are the "out" ones, and the "in" ones are the links from all other 
processes that specify my rank as a neighbour?

This seems the obvious definition to me, but it appears to conflict with 
the statement that "The definition of a node-neighbor edge does not 
imply a direction of the communication."

I see you have some pseudocode on how to translate between the different 
representations in the appendix of your 2010 paper, but I am being lazy 
and thinking it is quicker (for me if not for you) to ask you rather 
than understand the pseudocode!

Thanks in advance,

David

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


-- 
Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)


