[Mpi-forum] MPI "Allocate receive" proposal

Mon Aug 26 15:53:28 CDT 2013

* Jed Brown <jedbrown at mcs.anl.gov> [2013-08-26 15:11:11]:

> "Underwood, Keith D" <keith.d.underwood at intel.com> writes:
> > But, do they know *approximately* the size of the message that is
> > expected?  Because, if they do, then most of the advantage isn't
> > there.  I struggle a little bit with the idea of well coded apps that
> > have little enough idea about the size of the message that they need
> > this.

> Two example scenarios:

> 1. After graph partitioning, we know who will receive our vertices, but
>    we don't know how many we will receive or from whom.  In incremental
>    load balancing, we might know that we only receive from our
>    neighbors, and we have a bound on the total amount of data that we'll
>    receive, but may not have enough memory to post maximal receives from
>    all neighbors.  (Only the incremental case is relevant for
>    performance because non-incremental partitioning is way expensive,
>    thus workaround 2 is fine.)

> 2. In particle simulations, the physics may provide an upper bound on
>    total data received, but we don't know in advance from whom.

> I think that in both of these cases, the user ultimately wants to
> receive into a single buffer in some way.  They might in fact have
> allocated the buffer in advance and they'd be happy if they could decide
> on a starting point and increment a counter each time a message appears.
> Neither MPI_Mprobe with ANY_SOURCE nor looping over MPI_Iprobe is
> attractive compared to MPI_Waitsome, but the latter currently cannot be
> used in the scenario above.

I'm not disagreeing with the use case (in fact, I've dealt with this exact
scenario in the past and confronted the same issue).
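
For reference, the single-buffer bookkeeping described in the quoted text
(a preallocated buffer, a starting point, and a counter that advances with
each message) could be sketched roughly as follows. This is plain C with no
MPI calls. The names recv_pool, pool_init, pool_reserve and pool_free are
hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical receive pool: one preallocated buffer plus a cursor. */
typedef struct {
    char  *base;      /* start of the preallocated buffer */
    size_t capacity;  /* upper bound on the total bytes expected */
    size_t used;      /* bytes already handed out to messages */
} recv_pool;

/* Allocate the pool. Returns 0 on success, -1 if malloc fails. */
int pool_init(recv_pool *p, size_t capacity)
{
    p->base = malloc(capacity ? capacity : 1);
    p->capacity = capacity;
    p->used = 0;
    return p->base ? 0 : -1;
}

/* Reserve nbytes for the next incoming message and advance the cursor.
 * Returns a pointer into the pool, or NULL if the bound would be exceeded. */
void *pool_reserve(recv_pool *p, size_t nbytes)
{
    assert(p->used <= p->capacity);
    if (nbytes > p->capacity - p->used)
        return NULL;
    void *slot = p->base + p->used;
    p->used += nbytes;
    return slot;
}

/* Release the pool and reset its fields. */
void pool_free(recv_pool *p)
{
    free(p->base);
    p->base = NULL;
    p->capacity = 0;
    p->used = 0;
}
```

In an MPI program, the pointer returned by pool_reserve would be the
receive buffer for a message whose size is already known, for example from
MPI_Get_count after a probe. If all messages carry the same element type
and nbytes is always a multiple of the element size, the slots stay
suitably aligned; mixed types would need explicit padding.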

However, think about the cost of the straightforward implementation
without MPI_Arecv.

Sending side:

MPI_Send (count)
MPI_Send (vector)

Receiving side:

MPI_Recv (count)
malloc (count)
MPI_Recv (...)

Or even use MPI_Mprobe (which didn't exist back when I was working on load
balancing) to learn the message size before allocating.

In the overall picture, the cost of the extra send/recv pair or the
MPI_Mprobe call is going to be minimal.

Also, some applications might not use this anyway. Because they receive
data from multiple ranks, they may first run a collective (neighbourhood
or regular) to gather extra information such as message sizes, and then
build an optimized data movement schedule or allocate space in their
internal data structures accordingly.
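
As a rough sketch of that collective-first approach: once every rank knows
how many elements each peer will send (for instance after exchanging counts
with MPI_Alltoall or a neighbourhood variant), it can size a single buffer
and compute where each peer's data lands. MPI_Alltoallv takes such
per-rank receive displacements, counted in elements. The helper name
counts_to_displs is hypothetical, and the code below contains no MPI calls.

```c
#include <limits.h>

/* Exclusive prefix sum over per-rank receive counts.
 * On return, displs[i] = counts[0] + ... + counts[i-1].
 * Returns the total element count, or -1 if a count is negative
 * or the running total would overflow an int. */
int counts_to_displs(const int *counts, int *displs, int nranks)
{
    int total = 0;
    for (int i = 0; i < nranks; i++) {
        if (counts[i] < 0 || counts[i] > INT_MAX - total)
            return -1;
        displs[i] = total;   /* rank i's data starts here */
        total += counts[i];
    }
    return total;
}
```

The returned total is the number of elements to allocate for the receive
buffer, and displs can then be passed as the receive displacements of the
data exchange.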

  Dries