[mpi3-coll] Neighborhood collectives round 2: reductions

Torsten Hoefler htor at illinois.edu
Tue Dec 18 14:36:46 CST 2012

On Sun, Dec 16, 2012 at 09:53:55PM -0800, Jed Brown wrote:
>    On Sat, Dec 15, 2012 at 8:08 AM, Torsten Hoefler <htor at illinois.edu>
>    wrote:
>      >    Those use cases
>      >    (http://lists.mpi-forum.org/mpi3-coll/2011/11/0239.php)
>      >    were all dependent on being able to reduce to overlapping
>      >    targets.
>      Depends on your definition of target.  If you mean processes by
>      "targets", then the current interface proposal provides this; if you
>      mean memory locations at one process by "targets", then this will not be
>      possible within current MPI semantics.
>    I mean that the memory overlaps on the processor accumulating the result
>    of the reduction. Think of a bunch of subdomains of a regular grid with
>    one or two cells of overlap. An example of a "reduction" is to add up the
>    contribution from all copies of each given cell. Cells near the middle of
>    a "face" are only shared by two processes, but corner cells are shared by
>    several processes.
Yes, that would certainly work with the current proposal (though not if
we want to support MPI_IN_PLACE, but that wasn't planned anyway).

>      >    As for defining "identity", the operation I would like is to
>      >    reduce by combining with a local buffer (usually in-place
>      >    destination buffer). That is, if I have the local buffer
>      >    mine = [1.1, 2.1, 3.1, 4.1, 5.1, 6.1]
>      This can be expressed as a self-edge (we can discuss about in-place
>      arguments, but then you would need to guarantee that the local buffer is
>      larger than the largest neighbor buffer).
>    Useful application semantics would require the same.
Well, then it's not an issue.

>      >    and vector types for my two neighbors (defined by me)
>      >    incoming_type[0] = [0, 3, 4]
>      >    incoming_type[1] = [1, 4]
>      >    with incoming data (actually getting sent)
>      >    incoming_data[0] = [10.2, 20.2, 30.2]
>      >    incoming_data[1] = [100.3, 200.3]
>      >    the result would be
>      >    [op(1.1, 10.2), op(2.1, 100.3), 3.1, op(4.1, 20.2),
>      >    op(5.1, 30.2, 200.3), 6.1]
>      >    This would be a natural expression of the operation I call
>      >    "SFReduce" in http://59A2.org/files/StarForest.pdf
>      I see, this may be doable with the vector interface (if we remove the
>      restriction of equal vector sizes -- this would remove some optimization
>      opportunities). Can you confirm that the current proposed
>      neighbor_reducev() interface can cover this case?
>    If you remove the restriction of equal vector sizes, are you going to add
>    an MPI_Datatype describing where to put the result? (I'd expect that to be
>    a neighbor_reducew.) Note that in general, there would be some points
>    shared by neighbors {1,2} and other points shared by neighbors {1,3} (and
>    {2,3}, ...) thus we can't just sort such that the reduction is always
>    applied to the "first N" elements.
Yes, I understand. We did not plan on a *w interface yet; however, it's
straightforward and I'd be in favor of it. The current *v interface
would only support contiguous arrangements, but I'll carry the request
for the obviously useful *w interface to the Forum.

>      One remaining question is if you can always guarantee "packed" data,
>      i.e., that the "empty" elements are always at the tail. Well, I guess
>      you could always add identity elements in the middle to create gaps.
>    I could pack, but I thought the point of the W interfaces was to enable
>    the user to avoid packing (with possible performance advantages relative
>    to fully-portable user code).
Yes, definitely! Unfortunately, derived datatypes (DDTs) are not always
fast :-/.

>      Also, would the static communication topology work for your use-cases
>      (neighborhoods don't change for a while).
>    Yes, comm topology is typically static, and we wouldn't use the
>    neighborhood routines if it was changing frequently.


------------------------------------------ http://www.unixer.de/ -----
Torsten Hoefler           | Assistant Professor
Dept. of Computer Science | ETH Zürich
Universitätsstrasse 6     | Zurich-8092, Switzerland
CAB E 64.1                | Phone: +41 76 309 79 29
