[Mpi-forum] MPI "Allocate receive" proposal

Underwood, Keith D keith.d.underwood at intel.com
Mon Aug 26 15:39:48 CDT 2013


> Two example scenarios:
> 
> 1. After graph partitioning, we know who will receive our vertices, but
>    we don't know how many we will receive or from whom.  In incremental
>    load balancing, we might know that we only receive from our
>    neighbors, and we have a bound on the total amount of data that we'll
>    receive, but may not have enough memory to post maximal receives from
>    all neighbors.  (Only the incremental case is relevant for
>    performance because non-incremental partitioning is way expensive,
>    thus workaround 2 is fine.)
> 
> 2. In particle simulations, the physics may provide an upper bound on
>    total data received, but we don't know in advance from whom.
> 
> 
> I think that in both of these cases, the user ultimately wants to receive into a
> single buffer in some way.  They might in fact have allocated the buffer in
> advance and they'd be happy if they could decide on a starting point and
> increment a counter each time a message appears.
> Neither MPI_Mprobe with ANY_SOURCE nor looping over MPI_Iprobe is
> attractive compared to MPI_Waitsome, but the latter currently cannot be
> used in the scenario above.
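
For concreteness, the pattern being described above looks roughly like this with today's MPI (the tag, the datatype, and the assumption that the receiver somehow knows the total element count it will get are mine, purely for illustration):

#include <mpi.h>

#define TAG_DATA 2   /* illustrative tag */

/* Receive everything carrying TAG_DATA into one contiguous buffer,
 * advancing an offset each time a message shows up.  Assumes, for the
 * sake of the sketch, that total_elems is known up front. */
void recv_into_one_buffer(MPI_Comm comm, double *dest, long total_elems)
{
    long off = 0;                      /* next free element in dest */
    while (off < total_elems) {
        MPI_Message msg;
        MPI_Status  st;
        int n;

        /* Find out about the next message from anyone, whatever its size, */
        MPI_Mprobe(MPI_ANY_SOURCE, TAG_DATA, comm, &msg, &st);
        MPI_Get_count(&st, MPI_DOUBLE, &n);

        /* then land it at the current offset and bump the counter. */
        MPI_Mrecv(dest + off, n, MPI_DOUBLE, &msg, MPI_STATUS_IGNORE);
        off += n;
    }
}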

That's just it.  I find it extremely unlikely that the user really has a contiguous buffer that they need to receive this stuff into (particularly in these examples).  Both of these examples might actually be done *better* with pipelined transfers from the peers: first you send the count, then you send the data in chunks.  Because you cap the size of a chunk, you can prepost several receives (ANY_SOURCE) without major fragmentation, and pipelining lets you overlap the data transfer with the memory copy that you are inevitably doing on both ends.  In fact, this could be an example of a place where providing a different abstraction causes the user to code it in a way that is ultimately worse for them (though maybe easier to code).
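
A rough sketch of what I mean, in case it helps (the chunk size, the tags, the slot count, and the convention that every peer first announces its element count, zero or not, are all illustrative choices, not part of any proposal):

#include <mpi.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_ELEMS 4096   /* cap on any single data message */
#define NSLOTS      8      /* number of preposted ANY_SOURCE receives */
#define TAG_COUNT   1
#define TAG_DATA    2

/* Sender side: announce the element count, then stream fixed-size chunks. */
void send_pipelined(MPI_Comm comm, int dst, const double *buf, long n)
{
    MPI_Send(&n, 1, MPI_LONG, dst, TAG_COUNT, comm);
    for (long off = 0; off < n; off += CHUNK_ELEMS) {
        int chunk = (int)(n - off < CHUNK_ELEMS ? n - off : CHUNK_ELEMS);
        MPI_Send(buf + off, chunk, MPI_DOUBLE, dst, TAG_DATA, comm);
    }
}

/* Receiver side: every other rank announces a count (possibly zero); the
 * arrival order of the counts fixes where each sender's data lands in dest.
 * A handful of bounded ANY_SOURCE receives is then kept posted and drained
 * with MPI_Waitsome while the chunks are copied into place. */
void recv_pipelined(MPI_Comm comm, double *dest)
{
    int nranks;
    MPI_Comm_size(comm, &nranks);

    long *left   = calloc(nranks, sizeof *left);    /* elements still owed */
    long *offset = calloc(nranks, sizeof *offset);  /* write position      */
    long  next_free = 0;
    int   active = 0;                               /* senders with data   */

    for (int k = 0; k < nranks - 1; k++) {
        long cnt;
        MPI_Status st;
        MPI_Recv(&cnt, 1, MPI_LONG, MPI_ANY_SOURCE, TAG_COUNT, comm, &st);
        left[st.MPI_SOURCE]   = cnt;
        offset[st.MPI_SOURCE] = next_free;
        next_free += cnt;
        if (cnt > 0) active++;
    }

    double (*slab)[CHUNK_ELEMS] = malloc(NSLOTS * sizeof *slab);
    MPI_Request req[NSLOTS];
    for (int s = 0; s < NSLOTS; s++)
        MPI_Irecv(slab[s], CHUNK_ELEMS, MPI_DOUBLE, MPI_ANY_SOURCE, TAG_DATA,
                  comm, &req[s]);

    int idx[NSLOTS];
    MPI_Status sts[NSLOTS];
    while (active > 0) {
        int ndone;
        MPI_Waitsome(NSLOTS, req, &ndone, idx, sts);
        for (int i = 0; i < ndone; i++) {
            int s = idx[i], src = sts[i].MPI_SOURCE, n;
            MPI_Get_count(&sts[i], MPI_DOUBLE, &n);

            /* Copy the chunk into its place in the contiguous result... */
            memcpy(dest + offset[src], slab[s], (size_t)n * sizeof(double));
            offset[src] += n;
            left[src]   -= n;
            if (left[src] == 0) active--;

            /* ...and re-arm the slot so transfers keep overlapping with
             * the copies. */
            MPI_Irecv(slab[s], CHUNK_ELEMS, MPI_DOUBLE, MPI_ANY_SOURCE,
                      TAG_DATA, comm, &req[s]);
        }
    }

    /* Tear down the receives that are still outstanding. */
    for (int s = 0; s < NSLOTS; s++) {
        MPI_Cancel(&req[s]);
        MPI_Wait(&req[s], MPI_STATUS_IGNORE);
    }
    free(slab); free(left); free(offset);
}

In a real code the sends would be nonblocking so a rank can feed its peers while it drains its own receives, but the point stands: capping the chunk size is what makes it cheap to keep a few ANY_SOURCE receives posted, and the copy out of the chunk buffers is the same copy the user was going to do anyway.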


