[mpi3-coll] Telecon to discuss DV-collectives (Alltoalldv)

Adam T. Moody moody20 at llnl.gov
Thu Oct 13 12:28:05 CDT 2011


Hi Torsten,
Soon after we decided to request that alltoallv be added to the dv
ticket, I realized there is one important difference between it and the
dynamic sparse data exchange (DSDE) case.  With alltoallv, the receiver
knows which ranks it will receive data from, but with DSDE it doesn't.

I think for alltoalldv, you just need each process to provide two lists:
a send list and a receive list.  The current API looks like this:

MPI_Alltoallv(
  sendbuf, sendcounts[], sdispls[], sendtype,  /* O(P) lists */
  recvbuf, recvcounts[], rdispls[], recvtype,  /* O(P) lists */
  comm
);
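
For contrast, here is what a sparse exchange costs under the current
call (a fragment only; k, blocksize, neighbors[], sendbuf/recvbuf, and
the usual includes are assumed context): every process builds and the
library scans O(P) arrays even when only k entries are nonzero.

  int P;
  MPI_Comm_size(comm, &P);
  int *sendcounts = calloc(P, sizeof(int));  /* O(P) memory, mostly zeros */
  int *sdispls    = calloc(P, sizeof(int));
  for (int i = 0; i < k; i++) {
    sendcounts[neighbors[i]] = blocksize;    /* only k entries are nonzero */
    sdispls[neighbors[i]]    = i * blocksize;
  }
  /* recvcounts[] and rdispls[] get built the same way */
  MPI_Alltoallv(sendbuf, sendcounts, sdispls, MPI_DOUBLE,
                recvbuf, recvcounts, rdispls, MPI_DOUBLE, comm);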

Provide a new O(k) interface like this (we have to add a count to each
list to give its length, plus a list of ranks):

MPI_Alltoalldv(
  nsends, sendbuf, sendranks[], sendcounts[], sdispls[], sendtype,  /* O(k) lists */
  nrecvs, recvbuf, recvranks[], recvcounts[], rdispls[], recvtype,  /* O(k) lists */
  comm
);
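
With arrays of length k instead of P, the sparse exchange above shrinks
to the following (just a usage sketch, since the call doesn't exist
yet):

  MPI_Alltoalldv(nsends, sendbuf, sendranks, sendcounts, sdispls, MPI_DOUBLE,
                 nrecvs, recvbuf, recvranks, recvcounts, rdispls, MPI_DOUBLE,
                 comm);

where sendranks[]/recvranks[] list just the k peers involved, and all
the count/displacement arrays have length k rather than P.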

I think the interface you're pondering would solve the tougher problem 
of DSDE.
-Adam


Torsten Hoefler wrote:

>Hello Coll-WG,
>
>At the last meeting, we decided to push the scalable (dv) collective
>proposal further towards a reading. The present forum members were
>rather clearly supporting the proposal by straw-vote.
>
>We also decided to include alltoalldv in the ticket, a call where every
>sender specifies the destinations it sends to as a list. We did not
>discuss the specification of the receive buffer, though. If we force it
>to be of size P blocks (for P processes in the comm, and a block being
>count * extent(datatype)), then we're back to non-scalable again. I
>see the following alternatives:
>
>1) MPI allocates memory for the received blocks and returns a list of
>   the nodes it received from, along with the allocated buffer holding
>   the received data
>2) the user allocates a buffer of size N (<= P) blocks and provides it
>   to the MPI library; the library fills the buffer and returns a list
>   of source nodes. If a process receives from more than N nodes, the
>   call fails (message truncated).
>3) the user specifies a callback function for each received block :-)
>
>I prefer 3; however, it has the same issues as active messages and
>other callbacks and will most likely be discussed to death. Option 2
>thus seems the most reasonable. Does anybody have another proposal?
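>
>For concreteness, the receive side of (2) might look like this (the
>parameter names are purely illustrative):
>
>  maxrecvs, recvbuf, recvtype,       /* user provides maxrecvs blocks */
>  nrecvs, recvranks[], recvcounts[]  /* filled in by the library */
>
>where the call fails with a truncation error if data arrives from more
>than maxrecvs sources.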
>
>We may want to split the ticket into two parts (separating out
>alltoalldv).
>
>I think we should have a quick (~30 mins) telecon to discuss this
>matter. Please indicate your availability in the following doodle before
>Friday 10/7 if you're interested in participating in the discussion.
>
>http://www.doodle.com/4wkqnsgi8nfhfdw3
>
>The ticket is https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/264 .
>
>Thanks & Best,
>  Torsten Hoefler



