[mpi3-coll] Neighborhood collectives round 2: reductions

Wed Dec 19 03:42:34 CST 2012

On Tue, Dec 18, 2012 at 02:34:06PM -0700, Jed Brown wrote:
>    On Tue, Dec 18, 2012 at 1:36 PM, Torsten Hoefler <htor at illinois.edu>
>    wrote:
> 
>      On Sun, Dec 16, 2012 at 09:53:55PM -0800, Jed Brown wrote:
>      >    On Sat, Dec 15, 2012 at 8:08 AM, Torsten Hoefler <htor at illinois.edu>
>      >    wrote:
>      >
>      >      >    Those use cases
>      >      >    (http://lists.mpi-forum.org/mpi3-coll/2011/11/0239.php)
>      >      >    were all dependent on being able to reduce to overlapping
>      >      >    targets.
>      >      Depends on your definition of target.  If you mean processes by
>      >      "targets", then the current interface proposal provides this; if you
>      >      mean memory locations at one process by "targets", then this will not
>      >      be possible within current MPI semantics.
>      >
>      >    I mean that the memory overlaps on the processor accumulating the result
>      >    of the reduction. Think of a bunch of subdomains of a regular grid with
>      >    one or two cells of overlap. An example of a "reduction" is to add up the
>      >    contributions from all copies of each given cell. Cells near the middle
>      >    of a "face" are only shared by two processes, but corner cells are
>      >    shared by several processes.
>      Yes, that would certainly work with the current proposal (not if we want
>      to support MPI_IN_PLACE, but that wasn't planned anyway).
> 
>    I don't see how this works with the current proposal. I see how to send
>    different-shaped data (though the caller needs to duplicate entries that
>    are sent to more than one neighbor, e.g. "corners"),
Yes, datatype (*w) support would make this much more efficient.

>    but not how to receive/reduce
>    different-shaped data. To do that, we'd need either (a) to build an
>    incoming buffer with replicated "corners" along with matching recvcounts[]
>    and recvdispls[], or (b) to add a receive datatype for each neighbor.
I think I understand; however, to make sure, can you create the smallest
possible send/recv/local reduction example that exhibits this problem?

>    Another doc bug: sendcounts is a single integer in the C interface for
>    MPI_Neighbor_reducev.
>    Also, why are those arrays not const int[]?
Yes, both are bugs. They will be fixed if we agree on the interface.

>      >    If you remove the restriction of equal vector sizes, are you going to
>      >    add an MPI_Datatype describing where to put the result? (I'd expect that
>      >    to be a neighbor_reducew.) Note that in general, there would be some
>      >    points shared by neighbors {1,2} and other points shared by neighbors
>      >    {1,3} (and {2,3}, ...), so we can't just sort such that the reduction
>      >    is always applied to the "first N" elements.
>      Yes, I understand. We did not plan on a *w interface yet; however, it's
>      straightforward and I'd be in favor of it. The current *v interface
>      would only support contiguous arrangements, but I'll carry the request
>      for the obviously useful *w interface to the Forum.
> 
>    int MPI_Neighbor_reducew(void *sendbuf, int sendcounts[], int senddispls[],
>                             MPI_Datatype sendtypes[], void *recvbuf,
>                             int recvcounts[], int recvdispls[],
>                             MPI_Datatype recvtypes[], MPI_Op op, MPI_Comm comm);
>    With the requirement that all entries in recvtypes[] are built from the
>    same basic type (at least anywhere they may overlap). The counts and
>    displs arguments are not needed for semantics (they can be absorbed into
>    the derived types, and I would expect to do this in most applications),
>    but I leave them in here for consistency with the other *w routines.
Yes, sure, the interface is straightforward. The question is whether the
Forum will accept reducing into overlapping entries of datatypes. These are
somewhat tricky to implement and are also forbidden in current receive datatypes.

>      >      One remaining question is if you can always guarantee "packed" data,
>      >      i.e., that the "empty" elements are always at the tail. Well, I guess
>      >      you could always add identity elements in the middle to create gaps.
>      >
>      >    I could pack, but I thought the point of the W interfaces was to enable
>      >    the user to avoid packing (with possible performance advantages
>      >    relative to fully-portable user code).
>      Yes, definitely! Unfortunately, DDTs are not always fast :-/.
> 
>    That's what JITs are for, right? ;-)
Sure, well, the fact is that they're slow in almost all existing
implementations. So go and prod your favorite MPI vendor :-).

Torsten

-- 
### qreharg rug ebs fv crryF ------------- http://www.unixer.de/ -----
Torsten Hoefler           | Assistant Professor
Dept. of Computer Science | ETH Zürich
Universitätsstrasse 6     | Zurich-8092, Switzerland
CAB E 64.1                | Phone: +41 76 309 79 29