[mpi3-coll] Telecon to discuss DV-collectives (Alltoalldv)

Torsten Hoefler htor at illinois.edu
Thu Oct 13 19:11:00 CDT 2011

On Thu, Oct 13, 2011 at 10:55:47AM -0700, Adam T. Moody wrote:
> A couple more things...
:-) -- more comments below!

> In Santorini, Rich brought up a couple of concerns that should be
> considered. First, he suggested that a slightly more general
> interface might be better: you specify a base count that applies to all
> processes, and then provide a list of the processes whose counts differ
> from that base count. This could subsume the interface I listed below if you set
> the base count to 0 and then list each non-zero item. The nice thing
> about the base count approach is that it handles the "mostly
> regular" case well, in which nearly all procs have the same amount of data
> and only a few have a little more or a little less. For example, this
> interface might look something like the following (it adds a base count and
> drops the displacements, which still need some thought):
> MPI_Alltoalldv(
>   sendbuf, sbasecount, nsends, sendranks[], sendcounts[], sendtype, /* O(sbasecount*P + k) list */
>   recvbuf, rbasecount, nrecvs, recvranks[], recvcounts[], recvtype, /* O(rbasecount*P + k) list */
>   comm
> );
> In the above, you would send basecount data items to, and receive
> basecount items from, every proc except for a few ranks whose counts
> are listed explicitly in the O(k) lists. Setting sbasecount/rbasecount=0
> essentially reduces this interface to the one below (ignoring displacements).
Yes, that may be interesting for the use case I have in mind (a 3D FFT
on a circular (cutoff) region); however, it may be too limited for other
use cases (which I don't know).

> The other thing Rich was concerned about was whether the interface could
> be specified in a way that reduces the communication needed to distribute
> the count values. One extension that might help here is to allow the
> application to provide minimum and maximum count values that apply
> globally across all processes. For example, with a maximum count value
> in gatherdv, you could set up a tree that expects maxcount items from every
> child but simply receives less during each step. On the other hand, if
> the min and max values are far apart, the implementation might fall back
> to something more dynamic so it doesn't allocate a lot of temporary
> memory that it will never use. If an application can't specify minimum or
> maximum values, it could always pass MPI_UNDEFINED for the min/max
> values.
Yes, such tricks may simplify some implementation options. I'm not 100%
sure how well they integrate with the current MPI syntax and semantics
:-). But then again, a receive also only specifies a maximum count. Keep in
mind that this discussion may then bring us back to the displs[] array.

Thanks & Best,

 bash$ :(){ :|:&};: --------------------- http://www.unixer.de/ -----
Torsten Hoefler         | Performance Modeling and Simulation Lead
Blue Waters Directorate | University of Illinois (UIUC)
1205 W Clark Street     | Urbana, IL, 61801
NCSA Building           | +01 (217) 244-7736
