[mpiwg-coll] Question re neighbourhood collectives and graph topologies

Rolf Rabenseifner rabenseifner at hlrs.de
Mon Mar 31 14:10:13 CDT 2014


Dear all,

after Torsten's positive review, I have put everything (solution + review)
into the new ticket #419:

https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/419

Whether technical or English corrections are needed
or everything is okay, I would be grateful if
you could add your review to the ticket as soon as possible.

Who will be at the next meeting and could read the ticket there?
As a one-reading-one-vote MPI-3.0 errata ticket,
it should still fit into the MPI-3.1 schedule.

Best regards
Rolf


----- Original Message -----
> From: "Torsten Hoefler" <htor at illinois.edu>
> To: "Rolf Rabenseifner" <rabenseifner at hlrs.de>
> Cc: "MPI-3 Collective Subgroup Discussions" <mpiwg-coll at lists.mpi-forum.org>, "David Henty" <d.henty at epcc.ed.ac.uk>,
> "Martin Schulz" <schulzm at llnl.gov>, "Purushotham V. Bangalore" <puri at uab.edu>, "Shinji Sumimoto"
> <s-sumi at labs.fujitsu.com>, "Dave Goodell" <dgoodell at cisco.com>, "Adam Moody" <moody20 at llnl.gov>
> Sent: Sunday, March 30, 2014 10:10:17 PM
> Subject: Re: Fwd: Question re neighbourhood collectives and graph topologies
> 
> Hi Rolf,
> 
> On Tue, Mar 25, 2014 at 12:25:51PM +0100, Rolf Rabenseifner wrote:
> > Based on David Henty's question, my analysis suggests
> > that the definition of the neighborhood collectives on
> > general graph topologies created with MPI_GRAPH_CREATE is
> > fundamentally broken, i.e., undefined.
> > 
> > I have CC'ed the MPI-3.0 topology chapter committee and am sending
> > this to the MPI-next collective working group.
> > 
> > Reasoning behind my analysis:
> > 
> > MPI-3.0 p295:25-26 allows multiple edges and a non-symmetric
> > adjacency matrix.
> > 
> > MPI-3.0 p295:26-27 states that an adjacency edge from "process" to
> > "neighbor" does not imply a communication direction, i.e., neither
> > process=source nor process=destination.
> Yes, this is from the original MPI-1 text and is probably confusing.
> 
> > MPI-3.0 p306:46-p307:3 clearly defines that MPI_GRAPH_NEIGHBORS
> > returns exactly the same adjacency information as defined in
> > MPI_GRAPH_CREATE.
> > 
> > MPI-3.0 p307:11-34 (Example 7.5) uses a non-symmetric adjacency
> > matrix between processes 2 and 3:
> > 
> >   The three edges between nodes 2 and 3 are:
> >    1. process 2 -- neighbor 3
> >    2. process 3 -- neighbor 2
> >    3. process 3 -- neighbor 2
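> >
> > For illustration, a minimal sketch of how such a graph could be
> > set up with MPI_GRAPH_CREATE; the index/edges values assume the
> > full neighbor lists of Example 7.5 are 0: 1,3; 1: 0; 2: 3;
> > 3: 0,2,2 (variable names are mine, not from the standard):
> >
> >   MPI_Comm comm_graph;
> >   int nnodes  = 4;
> >   /* index[i] = total number of neighbors of nodes 0..i */
> >   int index[] = { 2, 3, 4, 7 };
> >   /* concatenated neighbor lists of nodes 0, 1, 2, 3 */
> >   int edges[] = { 1, 3, 0, 3, 0, 2, 2 };
> >   MPI_Graph_create(MPI_COMM_WORLD, nnodes, index, edges,
> >                    0 /* reorder = false */, &comm_graph);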
> This is truly horrible!
> 
> > MPI-3.0 p315:11-13 defines:
> > 
> >   For a general graph topology, created with MPI_GRAPH_CREATE,
> >   the order of neighbors in the send and receive buffers is
> >   defined as the sequence of neighbors as returned by
> >   MPI_GRAPH_NEIGHBORS.
> > 
> > For process rank 2, MPI_GRAPH_NEIGHBORS returns only one
> > relevant edge, to neighbor 3, and therefore one send buffer
> > and one receive buffer for rank 3 will be defined.
> > 
> > For process rank 3, MPI_GRAPH_NEIGHBORS returns two relevant
> > edges, both to neighbor 2, and therefore two send buffers
> > and two receive buffers for rank 2 will be defined.
> > 
> > This does not match and will cause broken communication!
> Yes, we did not check the non-scalable MPI-1 interface for additional
> brokenness. We wanted to deprecate it but didn't for various reasons.
> Instead, we put strong wording advising to never use it.
> 
> I mean: DO NOT USE MPI_GRAPH_CREATE EVER! :-)
> 
> Yet, we should fix this inconsistency with an erratum.
> 
> > Please do not think that three buffers will be defined;
> > you have to look at the complete example, i.e.,
> > - in process 2
> >     -- one send buffer for communication with rank 3 and
> >     -- one receive buffer for communication with rank 3
> >   are used, and
> > - in process 3
> >     -- three send buffers for communication with ranks 0, 2, 2 and
> >     -- three receive buffers for communication with ranks 0, 2, 2
> >   are used.
> >   
> > The numbers of buffers between processes 2 and 3 do not match.
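> >
> > A minimal sketch of a neighborhood collective on this topology
> > (assuming the comm_graph from the sketch above; includes and
> > error checks omitted, variable names are mine). The buffer
> > sizes simply follow the neighbor count of the own rank:
> >
> >   int me, nneigh;
> >   MPI_Comm_rank(comm_graph, &me);
> >   MPI_Graph_neighbors_count(comm_graph, me, &nneigh);
> >   /* rank 2: nneigh = 1 (neighbor 3)
> >      rank 3: nneigh = 3 (neighbors 0, 2, 2) */
> >   int *sendbuf = malloc(nneigh * sizeof(int));
> >   int *recvbuf = malloc(nneigh * sizeof(int));
> >   MPI_Neighbor_alltoall(sendbuf, 1, MPI_INT,
> >                         recvbuf, 1, MPI_INT, comm_graph);
> >   /* rank 3 expects two exchanges with rank 2, but rank 2
> >      provides only one send and one receive slot for rank 3 */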
> Yes
> 
> > Proposal:
> > We should restrict the current definition of neighborhood
> > collective communication on general graph topologies to those
> > topologies in which, for each pair of processes, both processes
> > define the same number of edges between them.
> > (This implies a symmetric adjacency matrix, but the adjacency
> > information, which also includes the order of the edges,
> > need not be symmetric.)
> Well, this seems a bit like a hack, but since nobody should use this
> interface anyway, I have no strong feelings.
> 
> > Proposed solution:
> > ------------------
> > 
> > MPI-3.0 page 315 lines 11-14 read
> > 
> >    For a general graph topology, created with MPI_GRAPH_CREATE,
> >    the order of neighbors in the send and receive buffers is
> >    defined as the sequence of neighbors as returned by
> >    MPI_GRAPH_NEIGHBORS. Note that general graph topologies
> >    should generally be replaced by the distributed graph
> >    topologies.
> > 
> > but should read
> > 
> >    For a general graph topology, created with MPI_GRAPH_CREATE,
> > |  the use of neighborhood collective communication is
> > |  restricted to adjacency matrices for which the number of
> > |  edges between any two processes is the same on both
> > |  processes (i.e., to symmetric adjacency matrices).
> > |  In this case,
> >    the order of neighbors in the send and receive buffers is
> >    defined as the sequence of neighbors as returned by
> >    MPI_GRAPH_NEIGHBORS. Note that general graph topologies
> >    should generally be replaced by the distributed graph
> >    topologies.
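> >
> > To illustrate the proposed restriction: the Example 7.5 graph
> > would become legal if, e.g., process 2 also listed neighbor 3
> > twice, so that every pair of processes has the same number of
> > edges on both sides (a sketch of my own variant of the example,
> > not text for the erratum):
> >
> >   /* neighbor lists: 0: 1,3   1: 0   2: 3,3   3: 0,2,2 */
> >   int nnodes  = 4;
> >   int index[] = { 2, 3, 5, 8 };
> >   int edges[] = { 1, 3, 0, 3, 3, 0, 2, 2 };
> >   MPI_Graph_create(MPI_COMM_WORLD, nnodes, index, edges,
> >                    0, &comm_graph);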
> That is ok with me. Rolf, feel free to push this erratum forward,
> I'll support it and we can discuss it in the collectives WG.
> 
> Thanks & All the Best,
>   Torsten
> 
> --
> ### qreharg rug ebs fv crryF ----------- http://htor.inf.ethz.ch/ -----
> Torsten Hoefler           | Assistant Professor
> Dept. of Computer Science | ETH Zürich
> Universitätsstrasse 6     | Zurich-8092, Switzerland
> CAB E 64.1                | Phone: +41 76 309 79 29
> 

-- 
Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)


