<div>I was looking through the collectives document from <a href="http://svn.mpi-forum.org/trac2/mpi-forum-web/ticket/258">http://svn.mpi-forum.org/trac2/mpi-forum-web/ticket/258</a>. There appear to be some missing functions, such as MPI_Neighbor_allreduce(), MPI_Neighbor_allgatherw(), and others. There is a comment on that ticket about dropping reduce due to a lack of use cases. Well, off the top of my head, MPI_Neighbor_allreducew() is ideal for:</div>

<div><br></div><div>1. sparse matrix-vector multiplication using column-oriented distribution (or transpose multiplication with row-oriented distribution, or symmetric formats)</div><div>2. the update in symmetric additive Schwarz</div>

<div>3. basic linear algebraic operations involving partially assembled matrix formats such as those found in non-overlapping domain decomposition methods</div><div>4. basic linear algebraic operations involving nested or bordered matrix formats such as show up in multiphysics applications or when solving optimization problems</div>

<div>5. finite element residual evaluation using a non-overlapping element partition (the most common way to implement it)</div><div>6. finite volume or discontinuous Galerkin flux accumulation using a non-overlapping face partition (a cell/element partition with redundant flux computation is more common, but both are used)</div>

<div><br></div><div>If provided, we would use it immediately in PETSc for the first four (through our VecScatter object which is also used in examples for scenarios 5 and 6). Is this a sufficient number/importance of use cases to justify neighbor reduce? (Did anyone actually try to come up with use cases?)</div>
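<div><br></div><div>To make the request concrete, here is a minimal serial sketch of the accumulation step that callers currently do by hand in the use cases above: each neighbor sends a packed block of partial sums, and the receiver adds them into the entries it owns. The function name, the index array, and the flat counts/displacements layout are assumptions for illustration only; this is neither PETSc's VecScatter code nor any MPI API, and it makes no MPI calls.</div>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of MPI or PETSc): add the partial sums
 * received from each neighbor into the locally owned entries.
 *
 * recvbuf holds one packed block per neighbor; block p starts at rdispls[p]
 * and has recvcounts[p] entries. recvidx[j] is the owned index that
 * recvbuf[j] contributes to. */
void accumulate_from_neighbors(double *owned, size_t n_owned,
                               int n_neighbors,
                               const int *recvcounts, const int *rdispls,
                               const int *recvidx, const double *recvbuf)
{
    for (int p = 0; p < n_neighbors; ++p) {
        for (int k = 0; k < recvcounts[p]; ++k) {
            int j = rdispls[p] + k;
            assert(recvidx[j] >= 0 && (size_t)recvidx[j] < n_owned);
            owned[recvidx[j]] += recvbuf[j]; /* the "reduce" part */
        }
    }
}
```

<div>A neighbor reduce in the library would fold this loop, and the receive that precedes it, into a single call, so the implementation could choose where and how the summation happens.</div>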

<div><br></div><div>MPI_Neighbor_allgatherw() is much less essential, but it's still convenient to avoid packing. (Packing by the caller isn't hard, but that argument makes all "w" versions optional. It looks inconsistent to skip just this one case. Also, the performance of packing can be delicate in a NUMA environment, so it would be good to do it all through a common mechanism.)</div>

<div><br></div><div>Of course all of these should have non-blocking variants (we never use the blocking versions in PETSc).</div><div><br></div><div>I hope it's not too late to get these in.</div><div><br></div><div><br>

</div><div>Also, what happened to persistent collectives (including these neighborhood versions)? This page looks pretty bare: <a href="https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/PersColl">https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/PersColl</a></div>

<div><br></div><div>We would be especially interested in persistent versions of the neighborhood "v" and "w" variants, because a persistent handle lets the MPI implementation determine all message sizes once at setup instead of rebuilding that information in every call (of which there may be thousands or millions). In my opinion, not offering them is an implicit declaration that you think there is no setup that can be amortized across calls. This seems unlikely to me in the case of neighbor collectives.</div>
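<div><br></div><div>As a rough illustration of the setup that could be amortized, the sketch below computes the per-neighbor displacements and total size once and then reuses them on every call. NeighborPlan and its functions are hypothetical stand-ins for a persistent handle, not MPI types or calls, and a real implementation would presumably cache much more than this (buffers, schedules, and so on).</div>

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a persistent handle; not an MPI type. */
typedef struct {
    int  n_neighbors;
    int *displs; /* offset of each neighbor's block in the packed buffer */
    int  total;  /* total number of entries across all neighbors */
} NeighborPlan;

/* Setup, done once: derive displacements and total size from the counts.
 * Returns 0 on success, -1 if allocation fails. */
int neighbor_plan_init(NeighborPlan *plan, int n_neighbors, const int *counts)
{
    size_t n = (size_t)(n_neighbors > 0 ? n_neighbors : 1);
    plan->displs = malloc(n * sizeof *plan->displs);
    if (!plan->displs) return -1;
    plan->n_neighbors = n_neighbors;
    plan->total = 0;
    for (int p = 0; p < n_neighbors; ++p) {
        assert(counts[p] >= 0);
        plan->displs[p] = plan->total; /* exclusive prefix sum */
        plan->total += counts[p];
    }
    return 0;
}

/* Per call: gather x[idx[j]] into the packed send buffer. No message sizes
 * are recomputed here; everything comes from the plan. */
void neighbor_plan_pack(const NeighborPlan *plan, const int *idx,
                        const double *x, double *sendbuf)
{
    for (int j = 0; j < plan->total; ++j)
        sendbuf[j] = x[idx[j]];
}

/* Teardown: release what setup allocated. */
void neighbor_plan_free(NeighborPlan *plan)
{
    free(plan->displs);
    plan->displs = NULL;
}
```

<div>With a non-persistent "v" or "w" call, the library sees the counts and displacements afresh each time and cannot assume they are unchanged; a persistent variant would let it keep the equivalent of this plan between calls.</div>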