[mpi3-coll] Neighborhood collectives round 2: reductions
Torsten Hoefler
htor at illinois.edu
Tue Dec 18 14:36:46 CST 2012
On Sun, Dec 16, 2012 at 09:53:55PM -0800, Jed Brown wrote:
> On Sat, Dec 15, 2012 at 8:08 AM, Torsten Hoefler <htor at illinois.edu> wrote:
>
> > Those use cases (http://lists.mpi-forum.org/mpi3-coll/2011/11/0239.php)
> > were all dependent on being able to reduce to overlapping targets.
> Depends on your definition of target. If you mean processes by
> "targets", then the current interface proposal provides this; if you
> mean memory locations at one process by "targets", then this will not be
> possible within current MPI semantics.
>
> I mean that the memory overlaps on the processor accumulating the result
> of the reduction. Think of a bunch of subdomains of a regular grid with
> one or two cells of overlap. An example of a "reduction" is to add up the
> contribution from all copies of each given cell. Cells near the middle of
> a "face" are only shared by two processes, but corner cells are shared by
> several processes.
Yes, that would certainly work with the current proposal (though not if
we want to support MPI_IN_PLACE, but that wasn't planned anyway).
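For illustration, here is a minimal sketch (not part of the proposal) of how
this ghost-cell accumulation can already be written today: exchange the
overlap contributions with MPI_Neighbor_alltoallv on a dist-graph
communicator, then fold them in locally. graph_comm, the count/displacement
arrays, and recv_map (which local cell each received element belongs to) are
assumed application data.

  /* Sketch: exchange overlap contributions with all graph neighbors and
   * accumulate them into the owned cells.  Cells shared by several
   * neighbors simply receive several adds. */
  #include <mpi.h>

  void accumulate_overlap(MPI_Comm graph_comm,
                          const double *sendbuf, const int *sendcounts,
                          const int *sdispls,
                          double *recvbuf, const int *recvcounts,
                          const int *rdispls,
                          int total_recv, const int *recv_map, double *local)
  {
      /* 1. Exchange contributions along the graph edges. */
      MPI_Neighbor_alltoallv(sendbuf, sendcounts, sdispls, MPI_DOUBLE,
                             recvbuf, recvcounts, rdispls, MPI_DOUBLE,
                             graph_comm);

      /* 2. Fold each received element into the local cell it overlaps. */
      for (int i = 0; i < total_recv; ++i)
          local[recv_map[i]] += recvbuf[i];
  }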
> > As for defining "identity", the operation I would like is to reduce by
> > combining with a local buffer (usually in-place destination buffer). That
> > is, if I have the local buffer
> > mine = [1.1, 2.1, 3.1, 4.1, 5.1, 6.1]
> This can be expressed as a self-edge (we can discuss in-place
> arguments, but then you would need to guarantee that the local buffer
> is larger than the largest neighbor buffer).
>
> Useful application semantics would require the same.
Well, then it's not an issue.
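To make the self-edge concrete, one way to set it up (a sketch only; left and
right are placeholder neighbor ranks) is to list one's own rank among both
sources and destinations when creating the graph communicator, so the local
buffer is combined exactly like a neighbor contribution:

  /* Sketch: build a neighborhood that contains a self-edge. */
  #include <mpi.h>

  MPI_Comm make_selfedge_comm(int left, int right)
  {
      int rank;
      MPI_Comm graph_comm;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      int sources[3]      = { left, rank, right };  /* who sends to me (incl. self) */
      int destinations[3] = { left, rank, right };  /* whom I send to (incl. self)  */

      MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD,
                                     3, sources, MPI_UNWEIGHTED,
                                     3, destinations, MPI_UNWEIGHTED,
                                     MPI_INFO_NULL, 0 /* no reorder */,
                                     &graph_comm);
      return graph_comm;
  }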
> > and vector types for my two neighbors (defined by me)
> > incoming_type[0] = [0, 3, 4]
> > incoming_type[1] = [1, 4]
> > with incoming data (actually getting sent)
> > incoming_data[0] = [10.2, 20.2, 30.2]
> > incoming_data[1] = [100.3, 200.3]
> > the result would be
> > [op(1.1, 10.2), op(2.1, 100.3), 3.1, op(4.1, 20.2), op(5.1, 30.2, 200.3),
> > 6.1]
> > This would be a natural expression of the operation I call "SFReduce" in
> > http://59A2.org/files/StarForest.pdf
> I see, this may be doable with the vector interface (if we remove the
> restriction of equal vector sizes -- this would remove some optimization
> opportunities). Can you confirm that the current proposed
> neighbor_reducev() interface can cover this case?
>
> If you remove the restriction of equal vector sizes, are you going to add
> an MPI_Datatype describing where to put the result? (I'd expect that to be
> a neighbor_reducew.) Note that in general, there would be some points
> shared by neighbors {1,2} and other points shared by neighbors {1,3} (and
> {2,3}, ...) thus we can't just sort such that the reduction is always
> applied to the "first N" elements.
Yes, I understand. We did not plan on a *w interface yet; however, it's
straightforward and I'd be in favor of it. The current *v interface
would only support contiguous arrangements, but I'll carry the request
for the obviously useful w interface to the Forum.
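For concreteness, a purely hypothetical prototype of such a w-style routine,
modeled on MPI_Neighbor_alltoallw and given an MPIX_ prefix to stress that it
is not part of the standard, might look like this:

  /* Hypothetical interface sketch -- not in the MPI standard.  Per-neighbor
   * counts plus per-neighbor receive datatypes describe where in recvbuf
   * each neighbor's contribution is reduced, so no user packing is needed
   * and scattered/overlapping target locations are expressed by the types. */
  #include <mpi.h>

  int MPIX_Neighbor_reducew(const void *sendbuf, const int sendcounts[],
                            const MPI_Aint sdispls[],
                            const MPI_Datatype sendtypes[],
                            void *recvbuf, const int recvcounts[],
                            const MPI_Aint rdispls[],
                            const MPI_Datatype recvtypes[],
                            MPI_Op op, MPI_Comm graph_comm);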
> One remaining question is if you can always guarantee "packed" data,
> i.e., that the "empty" elements are always at the tail. Well, I guess
> you could always add identity elements in the middle to create gaps.
>
> I could pack, but I thought the point of the W interfaces was to enable
> the user to avoid packing (with possible performance advantages relative
> to fully-portable user code).
Yes, definitely! Unfortunately, DDTs are not always fast :-/.
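To illustrate what a w interface would consume instead of packed buffers:
the scattered target locations from the example above ([0, 3, 4] and [1, 4])
can be described with indexed-block datatypes. This is only a sketch; the
library may of course still pack internally, which is exactly where DDT
performance matters.

  /* Sketch: describe the scattered target entries of the local buffer as
   * derived datatypes, one per neighbor. */
  #include <mpi.h>

  void build_recv_types(MPI_Datatype *type0, MPI_Datatype *type1)
  {
      int idx0[3] = { 0, 3, 4 };   /* neighbor 0 touches local entries 0, 3, 4 */
      int idx1[2] = { 1, 4 };      /* neighbor 1 touches local entries 1, 4    */

      MPI_Type_create_indexed_block(3, 1, idx0, MPI_DOUBLE, type0);
      MPI_Type_create_indexed_block(2, 1, idx1, MPI_DOUBLE, type1);
      MPI_Type_commit(type0);
      MPI_Type_commit(type1);
  }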
> Also, would the static communication topology work for your use-cases
> (neighborhoods don't change for a while).
>
> Yes, comm topology is typically static, and we wouldn't use the
> neighborhood routines if it was changing frequently.
Excellent!
Thanks,
Torsten
--
### qreharg rug ebs fv crryF ------------- http://www.unixer.de/ -----
Torsten Hoefler | Assistant Professor
Dept. of Computer Science | ETH Zürich
Universitätsstrasse 6 | Zurich-8092, Switzerland
CAB E 64.1 | Phone: +41 76 309 79 29