[Mpi3-rma] GetAccumulate restriction needed?

Sun Apr 3 21:32:17 CDT 2011

> On 04/03/2011 06:14 PM, Underwood, Keith D wrote:
> > Could you perhaps provide a use case?  Otherwise, it looks like an
> > implementation nightmare with potentially unexpected performance
> > characteristics associated with any working implementation choice.
> 
> This would be required if the application does not want to create two
> buffers, one for the input and one for the output. The implementation
> can internally use pipelining to use a smaller buffer space.

That is not a use case :-)  My understanding of fetch-and-add style operations is that you want both your increment and the old value at the target.  So, I am back to:  what is the use case?
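To make the point about two values concrete, here is a minimal sketch using C11 atomics rather than MPI. The helper name `fetch_and_add` is hypothetical, and the local `atomic_long` only stands in for a location at the target. The increment goes in and the old value comes back out, and the caller typically still holds both afterwards.

```c
#include <stdatomic.h>

/* Hypothetical helper, not an MPI call: atomically adds `increment` to
 * `*counter` and returns the value the counter held before the add. */
long fetch_and_add(atomic_long *counter, long increment)
{
    /* atomic_fetch_add returns the previous value of the counter. */
    return atomic_fetch_add(counter, increment);
}
```

After a call such as `old = fetch_and_add(&counter, increment);` the caller has `increment` (the input) and `old` (the output) in two separate variables.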

> Also, we need this for symmetry. Two-sided calls support it.

MPI_Sendrecv requires disjoint buffers, and that is the closest thing in two-sided.

> I don't follow the implementation nightmare with this? Why is this a
> problem?

Do you think the user would expect that using the same source and target buffer would cause the implementation to:

- allocate a buffer
- perform a get into that buffer
- wait for get completion
- perform an atomic
- wait for atomic completion
- copy the data from the internal buffer to the user buffer
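
For concreteness, the sequence above could look roughly like the following C sketch. It is only an illustration under several assumptions: the target is simulated with local memory, the names `sim_get`, `sim_accumulate_sum`, and `get_accumulate_inplace` are hypothetical and not MPI calls, both simulated operations complete immediately so the two wait steps appear only as comments, and nothing here makes the get and the update atomic with respect to other origins.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a one-sided get: copies n ints out of the target. */
static void sim_get(const int *target, int *dst, size_t n)
{
    memcpy(dst, target, n * sizeof *dst);
}

/* Hypothetical stand-in for a one-sided sum-accumulate into the target. */
static void sim_accumulate_sum(int *target, const int *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        target[i] += src[i];
}

/* On entry, buf holds the operand; on return, buf holds the old target
 * values and the target holds old + operand.
 * Returns 0 on success, -1 if the internal buffer cannot be allocated. */
int get_accumulate_inplace(int *target, int *buf, size_t n)
{
    assert(target != NULL && buf != NULL);
    if (n == 0)
        return 0;

    int *tmp = malloc(n * sizeof *tmp);   /* allocate a buffer */
    if (tmp == NULL)
        return -1;

    sim_get(target, tmp, n);              /* perform a get into that buffer */
    /* wait for get completion (immediate in this simulation) */
    sim_accumulate_sum(target, buf, n);   /* perform the update */
    /* wait for update completion (immediate in this simulation) */
    memcpy(buf, tmp, n * sizeof *buf);    /* copy internal buffer to user buffer */

    free(tmp);
    return 0;
}
```

With separate operand and result buffers, the temporary allocation and the final copy disappear, which is where the performance difference comes from.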

Now, the part that sucks is that I believe the IB (InfiniBand) implementation looks like this, and I believe that even a network that provided native fetch-and-add semantics will wind up looking like this if it provides an end-to-end reliability protocol.  Heck, even if it doesn't, you may do it anyway rather than guesstimating the amount of network buffering between the source and destination.  That means that using the same source and destination buffer will have radically different performance than using different buffers.  So, I count that as an implementation that sucks ;-)

Keith