[mpi3-coll] Neighborhood collectives round 2: reductions

Sat Dec 15 10:08:57 CST 2012

On Thu, Dec 13, 2012 at 11:27:16AM -0800, Jed Brown wrote:
>    On Tue, Dec 11, 2012 at 12:43 AM, Torsten Hoefler <htor at illinois.edu>
>    wrote:
> 
>      On Sun, Dec 09, 2012 at 04:55:51PM -0800, Jed Brown wrote:
>      >    Understood. May I ask what is the application motivating this
>      >    routine?
>      For the neighbor_reduce, the use-case follows from your earlier emails
>      (you asked explicitly for this on the collectives mailing list in
>      CAM9tzSmY38bq3gmYqwB5TS5jOv8uwTU9e526tANHDTYgX4XOBg at mail.gmail.com).
>      The neighbor_reducev is a simple generalization and gets closer to what
>      you were asking for. I think it's non-trivial to provide identity
>      elements in the interface while it's not too hard to just put the data
>      into the function (and hopefully not too much overhead).
> 
>    Those use cases (http://lists.mpi-forum.org/mpi3-coll/2011/11/0239.php)
>    were all dependent on being able to reduce to overlapping targets.
That depends on your definition of "target". If you mean processes,
then the current interface proposal provides this; if you mean memory
locations at one process, then this is not possible within current MPI
semantics.

>    As for defining "identity", the operation I would like is to reduce by
>    combining with a local buffer (usually in-place destination buffer). That
>    is, if I have the local buffer
>    mine = [1.1, 2.1, 3.1, 4.1, 5.1, 6.1]
This can be expressed as a self-edge (we can discuss in-place
arguments, but then you would need to guarantee that the local buffer is
at least as large as the largest neighbor buffer).

>    and vector types for my two neighbors (defined by me)
>    incoming_type[0] = [0, 3, 4]
>    incoming_type[1] = [1, 4]
>    with incoming data (actually getting sent)
>    incoming_data[0] = [10.2, 20.2, 30.2]
>    incoming_data[1] = [100.3, 200.3]
>    the result would be
>    [op(1.1, 10.2), op(2.1, 100.3), 3.1, op(4.1, 20.2), op(5.1, 30.2, 200.3),
>    6.1]
>    This would be a natural expression of the operation I call "SFReduce" in
>    http://59A2.org/files/StarForest.pdf
I see, this may be doable with the vector interface (if we remove the
restriction of equal vector sizes -- this would remove some optimization
opportunities). Can you confirm that the currently proposed
neighbor_reducev() interface can cover this case?
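
To pin down the semantics of the quoted example, here is a minimal serial
sketch in Python with no MPI involved. The name sf_reduce and its argument
layout are hypothetical and chosen only for illustration; this is not the
proposed neighbor_reducev() interface. It assumes each neighbor supplies one
value per target index and that op is a binary, associative operation.

```python
import operator


def sf_reduce(local, incoming_type, incoming_data, op):
    """Combine each incoming value into the local slot named by its index.

    local         -- the local buffer (left unmodified; a copy is returned)
    incoming_type -- per neighbor, the list of target indices into local
    incoming_data -- per neighbor, the values that were actually sent
    op            -- binary reduction operation, e.g. max or operator.add
    """
    result = list(local)
    for indices, values in zip(incoming_type, incoming_data):
        if len(indices) != len(values):
            raise ValueError("each neighbor needs one value per target index")
        for i, v in zip(indices, values):
            # Overlapping targets simply accumulate: index 4 is hit twice.
            result[i] = op(result[i], v)
    return result


# Data taken from the quoted example.
mine = [1.1, 2.1, 3.1, 4.1, 5.1, 6.1]
incoming_type = [[0, 3, 4], [1, 4]]
incoming_data = [[10.2, 20.2, 30.2], [100.3, 200.3]]

reduced_max = sf_reduce(mine, incoming_type, incoming_data, max)
reduced_sum = sf_reduce(mine, incoming_type, incoming_data, operator.add)

print(reduced_max)  # [10.2, 100.3, 3.1, 20.2, 200.3, 6.1]
```

Slots 2 and 5 receive no contribution and keep their local values, and slot 4
combines the local value with one value from each neighbor, which is the
overlapping-target case under discussion.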

One remaining question is whether you can always guarantee "packed" data,
i.e., that the "empty" elements are always at the tail. Well, I guess
you could always add identity elements in the middle to create gaps.
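
The identity-padding idea can be sketched serially in Python as follows (no
MPI involved). The helper names pad_with_identity and reduce_dense are
hypothetical, and the sketch assumes op is addition with identity 0.0 and
that each neighbor's indices are unique. The local buffer enters as one more
full-length contribution, in the spirit of a self-edge.

```python
import operator


def pad_with_identity(indices, values, length, identity):
    """Expand a sparse contribution into a dense vector of the given length.

    Slots that receive no value hold the identity element of the reduction,
    so they leave the result unchanged.
    """
    dense = [identity] * length
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


def reduce_dense(vectors, op):
    """Element-wise reduction over equally sized vectors."""
    result = list(vectors[0])
    for vec in vectors[1:]:
        result = [op(a, b) for a, b in zip(result, vec)]
    return result


# Data taken from the quoted example.
mine = [1.1, 2.1, 3.1, 4.1, 5.1, 6.1]
incoming_type = [[0, 3, 4], [1, 4]]
incoming_data = [[10.2, 20.2, 30.2], [100.3, 200.3]]

padded = [
    pad_with_identity(idx, val, len(mine), 0.0)
    for idx, val in zip(incoming_type, incoming_data)
]
total = reduce_dense([mine] + padded, operator.add)

print(padded[0])  # [10.2, 0.0, 0.0, 20.2, 30.2, 0.0]
```

The cost of this approach is that every neighbor contributes a full-length
vector, so padded slots travel over the wire even though they carry no
information.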

Also, would the static communication topology work for your use-cases
(neighborhoods don't change for a while)?

>      If you now think there is no use-case for any of those (or both)
>      functions then we should discontinue the discussion of them (I have
>      absolutely no problem with this :-).
> 
>    I'm not optimistic about their utility without support for reduction into
>    different/overlapping targets.
Please define "target" (process vs. memory). 

Thanks & Best,
  Torsten

-- 
### qreharg rug ebs fv crryF ------------- http://www.unixer.de/ -----
Torsten Hoefler           | Assistant Professor
Dept. of Computer Science | ETH Zürich
Universitätsstrasse 6     | Zurich-8092, Switzerland
CAB E 64.1                | Phone: +41 76 309 79 29