[mpi3-coll] Telecon to discuss DV-collectives (Alltoalldv)
Torsten Hoefler
htor at illinois.edu
Thu Oct 13 19:11:00 CDT 2011
On Thu, Oct 13, 2011 at 10:55:47AM -0700, Adam T. Moody wrote:
> A couple more things...
:-) -- more comments below!
> In Santorini, Rich brought up a couple of concerns that should be
> considered. For one, he suggested that a slightly more general
> interface might be better in which you specify a base count for all
> processes, and then provide a list for processes whose counts differ
> from that base. This could subsume the interface I listed below if you
> set the base count to 0 and then list each non-zero item. The nice thing
> about the base count approach is that it nicely handles the "mostly
> regular" case, in which nearly all procs have the same amount of data
> while only a few have a little more or a little less. For example, this
> interface might look something like the following (I added basecount and
> removed displacements, which need some thought here):
>
> MPI_Alltoalldv(
>   sendbuf, sbasecount, nsends, sendranks[], sendcounts[], sendtype, /* O(sbasecount*P + k) list */
>   recvbuf, rbasecount, nrecvs, recvranks[], recvcounts[], recvtype, /* O(rbasecount*P + k) list */
>   comm
> );
>
> In the above, you would send/receive basecount data items to/from every
> proc, except for the few ranks whose counts are listed explicitly in the
> O(k) lists. Setting sbasecount/rbasecount=0 essentially reduces this
> interface to the one below (ignoring displacements).
Yes, that may be interesting for the use case I have in mind (a 3d FFT
on a circular (cutoff) region); however, it may be too limited for other
use cases (which I don't know).
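For concreteness, a call in the mostly-regular case could look like the
sketch below. Note that MPI_Alltoalldv is only the signature proposed
above, not an existing MPI routine, and all ranks and counts here are
made-up values (sendbuf, recvbuf, and comm are assumed to be declared
elsewhere):

    /* Hypothetical sketch: every rank exchanges `base` doubles with
     * every peer, except for a handful of ranks listed explicitly in
     * the O(k) exception lists. */
    int base = 1024;
    int sendranks[]  = { 0, 7 };      /* ranks whose send counts differ */
    int sendcounts[] = { 1500, 0 };   /* explicit counts for those ranks */
    int recvranks[]  = { 3 };
    int recvcounts[] = { 2048 };

    MPI_Alltoalldv(sendbuf, base, 2, sendranks, sendcounts, MPI_DOUBLE,
                   recvbuf, base, 1, recvranks, recvcounts, MPI_DOUBLE,
                   comm);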
> The other thing Rich was concerned about was whether an interface could
> be specified in such a way to reduce communication of the distributed
> count values. One extension that might help here is to allow the
> application to provide minimum and maximum count values that would apply
> globally across all processes. For example, with a maximum count value
> in gatherdv, you could set up a tree expecting maxcount elements from all
> children but actually receive fewer during each step. On the other hand, if
> the min and max values are far apart, the implementation might fall back
> to something more dynamic so it doesn't allocate a bunch of temporary
> memory that it'll never use. If an application can't specify minimum or
> maximum values, it could always pass MPI_UNDEFINED for the min/max
> values.
Yes, such tricks may simplify some implementation options. I'm not 100%
sure how well they integrate with the current MPI syntax and semantics
:-). But I guess a recv is also only specifying a maximum. Keep in mind
that this discussion may then bring us back to the displs[] array.
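Point-to-point MPI already has this receive-as-an-upper-bound semantic;
the sketch below, using only standard calls, shows how a parent in a
gatherdv-style tree could post a receive for maxcount elements and then
query how many actually arrived. maxcount here stands for the
hypothetical per-process bound from Adam's proposal:

    #include <mpi.h>
    #include <stdlib.h>

    /* Receive up to maxcount doubles from child_rank. The posted count
     * is only a maximum; MPI_Get_count reports the actual number of
     * elements that arrived. */
    int recv_up_to(int maxcount, int child_rank, int tag, MPI_Comm comm,
                   double **out)
    {
        MPI_Status status;
        int nreceived;
        double *tmp = malloc((size_t)maxcount * sizeof *tmp);

        MPI_Recv(tmp, maxcount, MPI_DOUBLE, child_rank, tag, comm, &status);
        MPI_Get_count(&status, MPI_DOUBLE, &nreceived);

        *out = tmp;   /* only the first nreceived entries are valid */
        return nreceived;
    }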
Thanks & Best,
Torsten
--
bash$ :(){ :|:&};: --------------------- http://www.unixer.de/ -----
Torsten Hoefler | Performance Modeling and Simulation Lead
Blue Waters Directorate | University of Illinois (UIUC)
1205 W Clark Street | Urbana, IL, 61801
NCSA Building | +01 (217) 244-7736