[mpi3-coll] Neighborhood collectives round 2: reductions
Torsten Hoefler
htor at illinois.edu
Wed Dec 19 03:42:34 CST 2012
On Tue, Dec 18, 2012 at 02:34:06PM -0700, Jed Brown wrote:
> On Tue, Dec 18, 2012 at 1:36 PM, Torsten Hoefler <htor at illinois.edu>
> wrote:
>
> > On Sun, Dec 16, 2012 at 09:53:55PM -0800, Jed Brown wrote:
> > > On Sat, Dec 15, 2012 at 8:08 AM, Torsten Hoefler <htor at illinois.edu>
> > > wrote:
> > >
> > > > > Those use cases
> > > > > (http://lists.mpi-forum.org/mpi3-coll/2011/11/0239.php) were all
> > > > > dependent on being able to reduce to overlapping targets.
> > > > Depends on your definition of target. If you mean processes by
> > > > "targets", then the current interface proposal provides this; if
> > > > you mean memory locations at one process by "targets", then this
> > > > will not be possible within current MPI semantics.
> > >
> > > I mean that the memory overlaps on the processor accumulating the
> > > result of the reduction. Think of a bunch of subdomains of a regular
> > > grid with one or two cells of overlap. An example of a "reduction" is
> > > to add up the contribution from all copies of each given cell. Cells
> > > near the middle of a "face" are only shared by two processes, but
> > > corner cells are shared by several processes.
> > Yes, that would certainly work with the current proposal (not if we
> > want to support MPI_IN_PLACE, but that wasn't planned anyway).
>
> I don't see how this works with the current proposal. I see how to send
> different-shaped data (though the caller needs to duplicate entries that
> are sent to more than one neighbor, e.g. the "corners"),
Yes, datatype (*w) support would make this much more efficient.
> but not how to receive/reduce
> different-shaped data. To do that, we'd either need (a) to build an
> incoming buffer with replicated "corners" and set up recvcounts[] and
> recvdispls[] accordingly, or (b) to add a receiving datatype for each
> neighbor.
I think I understand; however, to make sure, can you create the smallest
possible send/recv/local-reduction example that exhibits this problem?
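
In the meantime, here is my own attempt at such a minimal sketch, using
only existing MPI-3 calls (the 2x2 decomposition, the neighbor ordering,
and the dummy values are assumptions of the sketch, not part of the
proposal): the owner's interior corner cell receives a contribution from
all three neighbors, so with *v semantics the contributions have to land
in a replicated contiguous buffer and be summed by hand.

/* overlap_sketch.c: a 4x4 grid on a 2x2 process grid, each rank owning
 * a 2x2 block.  The "reduction" returns ghost contributions to the
 * owners; the owner's interior corner cell receives a value from all
 * three neighbors, so three incoming elements target the same memory
 * location.  With MPI-3 only, the contributions land in a replicated
 * buffer via MPI_Neighbor_alltoallv and are summed locally. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  if (size != 4) MPI_Abort(MPI_COMM_WORLD, 1);   /* 2x2 process grid only */

  int prow = rank / 2, pcol = rank % 2;
  int nbrs[3] = { 2*prow + (1-pcol),             /* horizontal neighbor */
                  2*(1-prow) + pcol,             /* vertical neighbor   */
                  2*(1-prow) + (1-pcol) };       /* diagonal neighbor   */

  MPI_Comm nc;
  MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD, 3, nbrs, MPI_UNWEIGHTED,
                                 3, nbrs, MPI_UNWEIGHTED,
                                 MPI_INFO_NULL, 0, &nc);

  double owned[4] = { 1, 1, 1, 1 };              /* 2x2 block, row-major */

  /* ghost values owed to the neighbors: 2 cells each to the horizontal
   * and vertical neighbors, 1 cell to the diagonal one (dummy values) */
  double sendbuf[5] = { 10, 10, 20, 20, 30 };
  int counts[3] = { 2, 2, 1 }, displs[3] = { 0, 2, 4 };

  /* replicated receive buffer: 5 incoming elements for only 4 owned
   * cells; this replication is what receive datatypes would avoid */
  double recvbuf[5];
  MPI_Neighbor_alltoallv(sendbuf, counts, displs, MPI_DOUBLE,
                         recvbuf, counts, displs, MPI_DOUBLE, nc);

  /* local reduction: per-neighbor target indices into owned[];
   * the interior corner index appears in all three lists */
  int corner = 2*(1-prow) + (1-pcol);
  int tgt_hor[2] = { 0 + (1-pcol), 2 + (1-pcol) };
  int tgt_ver[2] = { 2*(1-prow) + 0, 2*(1-prow) + 1 };
  for (int k = 0; k < 2; k++) owned[tgt_hor[k]] += recvbuf[0 + k];
  for (int k = 0; k < 2; k++) owned[tgt_ver[k]] += recvbuf[2 + k];
  owned[corner] += recvbuf[4];

  printf("rank %d: corner cell = %g (three contributions)\n",
         rank, owned[corner]);

  MPI_Comm_free(&nc);
  MPI_Finalize();
  return 0;
}

The replicated recvbuf and the tgt_hor/tgt_ver index lists are exactly the
bookkeeping a *w interface with receive datatypes would take over.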
> Another doc bug: sendcounts is a single integer in the C interface for
> MPI_Neighbor_reducev.
> Also, why are those arrays not const int[]?
Yes, both bugs. Will be fixed if we agree on the interface.
> > > If you remove the restriction of equal vector sizes, are you going to
> > > add an MPI_Datatype describing where to put the result? (I'd expect
> > > that to be a neighbor_reducew.) Note that in general, there would be
> > > some points shared by neighbors {1,2} and other points shared by
> > > neighbors {1,3} (and {2,3}, ...), thus we can't just sort such that
> > > the reduction is always applied to the "first N" elements.
> > Yes, I understand. We did not plan on a *w interface yet; however, it's
> > straightforward and I'd be in favor of it. The current *v interface
> > would only support contiguous arrangements, but I'll carry the request
> > for the obviously useful *w interface to the Forum.
>
> int MPI_Neighbor_reducew(void *sendbuf, int sendcounts[], int senddispls[],
>                          MPI_Datatype sendtypes[], void *recvbuf,
>                          int recvcounts[], int recvdispls[],
>                          MPI_Datatype recvtypes[], MPI_Op op, MPI_Comm comm);
> With the requirement that all entries in recvtypes[] are built from the
> same basic type (at least anywhere they may overlap). The counts and
> displs arguments are not needed for semantics (they can be absorbed into
> the derived types, and I would expect to do this in most applications),
> but I leave them in here for consistency with the other *w routines.
Yes, sure, the interface is straightforward. The question is whether the
Forum will accept overlapping entries in the datatypes being reduced into.
They are somewhat tricky to implement and are also forbidden for receive
datatypes in current MPI.
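
To make that concrete, a sketch of the recvtypes[] for the 2x2-block
example above, assuming the MPI_Neighbor_reducew signature Jed proposes
(the helper name and the fixed 2x2 block are my own assumptions): one
indexed datatype per neighbor, pointing straight into the owned array,
with the shared corner's displacement occurring in all three types, which
is precisely the overlap in question.

/* recvtypes_sketch.c (fragment): per-neighbor receive datatypes that
 * point straight into the owned 2x2 block of doubles; the corner
 * displacement occurs in all three types. */
#include <mpi.h>

/* prow, pcol: position of this rank in the 2x2 process grid */
static void build_recvtypes(int prow, int pcol, MPI_Datatype recvtypes[3])
{
  int hor[2] = { 0 + (1-pcol), 2 + (1-pcol) };     /* column facing the horizontal neighbor  */
  int ver[2] = { 2*(1-prow) + 0, 2*(1-prow) + 1 }; /* row facing the vertical neighbor       */
  int dia[1] = { 2*(1-prow) + (1-pcol) };          /* shared corner, also in hor[] and ver[] */

  MPI_Type_create_indexed_block(2, 1, hor, MPI_DOUBLE, &recvtypes[0]);
  MPI_Type_create_indexed_block(2, 1, ver, MPI_DOUBLE, &recvtypes[1]);
  MPI_Type_create_indexed_block(1, 1, dia, MPI_DOUBLE, &recvtypes[2]);
  for (int i = 0; i < 3; i++) MPI_Type_commit(&recvtypes[i]);
}

With recvbuf pointing at the owned block, recvcounts[i] = 1,
recvdispls[i] = 0, and op = MPI_SUM, the proposed call would deliver every
contribution directly into place; the sticking point is exactly the
overlapping corner displacement that current MPI forbids on the receive
side.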
> > > > One remaining question is whether you can always guarantee "packed"
> > > > data, i.e., that the "empty" elements are always at the tail. Well,
> > > > I guess you could always add identity elements in the middle to
> > > > create gaps.
> > >
> > > I could pack, but I thought the point of the W interfaces was to enable
> > > the user to avoid packing (with possible performance advantages relative
> > > to fully-portable user code).
> > Yes, definitely! Unfortunately, DDTs are not always fast :-/.
>
> That's what JITs are for, right? ;-)
Sure, well, the fact is that they're slow in almost all existing
implementations. So go and prod your favorite MPI vendor :-).
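
For reference, the fully portable fallback Jed alludes to is nothing more
than a copy loop like the sketch below (my illustration): gather the
scattered edge and corner cells into a contiguous staging buffer by hand
instead of describing them with a derived datatype; a fast datatype engine
is what makes that loop, and its extra memory traffic, unnecessary.

/* What "packing" means here: copy the scattered cells a neighbor needs
 * into a contiguous staging buffer before sending (and do the reverse,
 * plus the summation, on the receiving side), instead of handing MPI an
 * indexed datatype. */
static void pack_cells(const double *grid, const int *idx, int n,
                       double *staging)
{
  for (int i = 0; i < n; i++)
    staging[i] = grid[idx[i]];
}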
Torsten
--
### qreharg rug ebs fv crryF ------------- http://www.unixer.de/ -----
Torsten Hoefler | Assistant Professor
Dept. of Computer Science | ETH Zürich
Universitätsstrasse 6 | Zurich-8092, Switzerland
CAB E 64.1 | Phone: +41 76 309 79 29