[mpi3-coll] Reduction in neighborhood collectives, persistence?

Torsten Hoefler htor at illinois.edu
Sun Nov 27 18:59:05 CST 2011

Dear Jed,

Thanks for your interest :-). Comments inline.

>  I was looking through the collectives document from
>  [1]http://svn.mpi-forum.org/trac2/mpi-forum-web/ticket/258 There appear to
>  be some missing functions such as MPI_Neighbor_allreduce(),
>  MPI_Neighbor_allgatherw(), and others. 
One rule for including new functions into MPI-3.0 was to present a
strong use-case to the MPI Forum. I am the author of the neighborhood
collectives and found use-cases for the proposed functions, but failed to
find good use-cases for MPI_Neighbor_allreduce() and
MPI_Neighbor_allgatherw().

All use-cases I have are in Hoefler, Lorenzen, Lumsdaine: "Sparse
Non-Blocking Collectives in Quantum Mechanical Calculations";
http://www.unixer.de/publications/index.php?pub=70 .

In particular, the Forum decided against MPI_Neighbor_allgatherw()
because there is no MPI_(All)Gatherw() in MPI-2.2. Once somebody finds
it necessary to add those functions to the dense collectives, they will
be added to the sparse collectives as well. If you have a strong
use-case where the datatypes are performance-critical, then please let
us know.

The MPI_Neighbor_allreducew() text actually still exists in the draft
proposal (commented out :-). But as mentioned, I (as a computer
scientist) was lacking application use-cases.
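In the meantime, the proposed semantics can be approximated with the
neighborhood collectives that are in the draft: gather one contribution
from every in-neighbor and combine locally. A minimal sketch under stated
assumptions (the function name is mine, the communicator carries a
distributed graph topology, and a process's own contribution is included
only if it lists itself as a neighbor):

```c
#include <mpi.h>
#include <stdlib.h>

/* Sketch: emulate a (hypothetical) neighbor allreduce with MPI_SUM.
 * Each process contributes `count` doubles; recvbuf gets the
 * element-wise sum over all in-neighbors of the graph topology
 * attached to comm (e.g., from MPI_Dist_graph_create_adjacent). */
static int neighbor_allreduce_sum(const double *sendbuf, double *recvbuf,
                                  int count, MPI_Comm comm)
{
    int indeg, outdeg, weighted;
    MPI_Dist_graph_neighbors_count(comm, &indeg, &outdeg, &weighted);

    /* Collect one contribution from every in-neighbor. */
    double *tmp = malloc((size_t)indeg * count * sizeof *tmp);
    if (!tmp) return MPI_ERR_NO_MEM;
    int err = MPI_Neighbor_allgather(sendbuf, count, MPI_DOUBLE,
                                     tmp, count, MPI_DOUBLE, comm);

    /* Reduce locally; a native implementation could pipeline the
     * combine with message arrival instead of staging everything. */
    for (int i = 0; i < count; i++) recvbuf[i] = 0.0;
    for (int n = 0; n < indeg; n++)
        for (int i = 0; i < count; i++)
            recvbuf[i] += tmp[(size_t)n * count + i];
    free(tmp);
    return err;
}
```

The staging buffer of size indegree * count is exactly the memory and
bandwidth overhead a native reduction could avoid.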

>  There is a comment on that ticket about dropping reduce due to lack
>  of use cases. Well, off the top of my head, MPI_Neighbor_allreducew()
>  is ideal for:
>  1. sparse matrix-vector multiplication using column-oriented distribution
>  (or transpose multiplication with row-oriented distribution, or symmetric
>  formats)
>  2. the update in symmetric additive Schwarz
>  3. basic linear algebraic operations involving partially assembled matrix
>  formats such as those found in non-overlapping domain decomposition
>  methods
>  4. basic linear algebraic operations involving nested or bordered matrix
>  formats such as show up in multiphysics applications or when solving
>  optimization problems
>  5. finite element residual evaluation using a non-overlapping element
>  partition (the most common way to implement)
>  6. finite volume or discontinuous Galerkin flux accumulation using
>  non-overlapping face partition (cell/element partition with redundant flux
>  computation is more common, but both are used)
>  If provided, we would use it immediately in PETSc for the first four
>  (through our VecScatter object which is also used in examples for
>  scenarios 5 and 6). Is this a sufficient number/importance of use cases to
>  justify neighbor reduce? (Did anyone actually try to come up with use
>  cases?)
Well, I was working with applications that couldn't benefit from those
use-cases, and I don't know the details of PETSc. We need to make a
strong point for the MPI Forum to support the new functions. I could
provide a reference implementation of those functions if you would be
willing to plug them into PETSc to show that the proposed semantics are
useful in practice.

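To make use-case 1 concrete: with a column distribution of A in y = A*x,
each process computes partial sums for rows owned by other processes, and
the row owners must add the incoming partials. A hedged sketch of that
pattern with today's draft functions (the setup arrays and `rowidx`
mapping are illustrative assumptions derived from the sparsity pattern,
not a PETSc API):

```c
#include <mpi.h>

/* Column-distributed sparse matrix-vector product, exchange phase.
 * send_partials holds this rank's partial row sums destined for its
 * out-neighbors; in-neighbors send partials for rows we own. rowidx
 * maps each received entry to a local row of y. */
void spmv_exchange_and_add(MPI_Comm graph_comm,
                           const double *send_partials,
                           const int *sendcounts, const int *sdispls,
                           double *recv_partials,
                           const int *recvcounts, const int *rdispls,
                           const int *rowidx, int nrecv_total,
                           double *y_local)
{
    /* Move partial sums along the sparsity-induced neighbor graph. */
    MPI_Neighbor_alltoallv(send_partials, sendcounts, sdispls, MPI_DOUBLE,
                           recv_partials, recvcounts, rdispls, MPI_DOUBLE,
                           graph_comm);

    /* The accumulation below is the combine that a neighbor reduce
     * could fold into the communication itself. */
    for (int i = 0; i < nrecv_total; i++)
        y_local[rowidx[i]] += recv_partials[i];
}
```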
>  MPI_Neighbor_allgatherw() is much less essential, but it's still
>  convenient to avoid packing. (Packing by the caller isn't hard, but that
>  argument makes all "w" versions optional. It looks inconsistent to skip just
>  this one case. Also, the performance of packing can be delicate in a NUMA
>  environment, so it would be good to do it all through a common mechanism.)
Yes, I couldn't agree more. But the Forum requires use-cases (in this
case, an example that achieves higher performance by using those new
functions).

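For reference, this is the caller-side packing that a "w" variant would
make unnecessary: serialize a typed (e.g., strided) region into a staging
buffer, then call the existing "v" variant on raw bytes. A sketch under
assumptions (buffer sizes and the datatype argument are illustrative):

```c
#include <mpi.h>

/* Flatten typed data into a contiguous staging buffer, then use
 * MPI_Neighbor_allgatherv on MPI_PACKED bytes. recvcounts/rdispls
 * are in bytes here. */
void pack_then_allgatherv(MPI_Comm graph_comm,
                          const void *sendbuf, MPI_Datatype sendtype,
                          void *staging, int staging_bytes,
                          void *recvbuf, const int *recvcounts,
                          const int *rdispls)
{
    int pos = 0;
    /* Serialize the typed send data into raw bytes. On NUMA machines,
     * which memory these bytes land in (and which thread first touches
     * them) is exactly the delicate part noted above -- an MPI-internal
     * pack could place them next to its communication buffers. */
    MPI_Pack(sendbuf, 1, sendtype, staging, staging_bytes, &pos, graph_comm);
    MPI_Neighbor_allgatherv(staging, pos, MPI_PACKED,
                            recvbuf, recvcounts, rdispls, MPI_PACKED,
                            graph_comm);
}
```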
>  Of course all of these should have non-blocking variants (we never use the
>  blocking versions in PETSc).

>  I hope it's not too late to get these in.
For MPI-3.0 it may be too late, but MPI-3.1 should follow soon.

>  Also, what happened to persistent collectives (including these
>  neighborhood versions)? This page looks pretty
>  bare: [2]https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/PersColl
>  We would be especially interested in persistent versions of the
>  neighborhood "v" and "w" variants because it lets us build a persistent
>  handle so that the MPI implementation can determine all message sizes once
>  in setup instead of needing to rebuild it in each call (of which there may
>  be thousands or millions). In my opinion, not offering it is an implicit
>  declaration that you think there is no setup that can be amortized across
>  calls. This seems unlikely to me in the case of neighbor collectives.
I agree but all proposals regarding persistence are moved to the
persistence working group headed by Anthony Skjellum. I am not sure
about the status.
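To illustrate the amortization argument: a persistent interface would
split setup from execution, in the style of MPI's persistent
point-to-point operations. The MPIX_ name and signature below are purely
hypothetical (no such call exists in the MPI-3.0 draft); the sketch only
shows what the argument asks for:

```c
#include <mpi.h>

/* Hypothetical persistent neighborhood collective (MPIX_ marks it as
 * non-standard). Neighbor lists, message sizes, and any internal
 * communication schedule are computed once and reused every step,
 * instead of being rebuilt in each of possibly millions of calls. */
void iterate_with_persistent_exchange(MPI_Comm graph_comm,
                                      const double *sendbuf,
                                      const int *sendcounts, const int *sdispls,
                                      double *recvbuf,
                                      const int *recvcounts, const int *rdispls,
                                      int nsteps)
{
    MPI_Request req;
    MPIX_Neighbor_alltoallv_init(sendbuf, sendcounts, sdispls, MPI_DOUBLE,
                                 recvbuf, recvcounts, rdispls, MPI_DOUBLE,
                                 graph_comm, &req);
    for (int step = 0; step < nsteps; step++) {
        MPI_Start(&req);          /* no per-call size negotiation */
        /* ... overlap local computation here ... */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }
    MPI_Request_free(&req);
}
```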

All the Best,

 bash$ :(){ :|:&};: --------------------- http://www.unixer.de/ -----
Torsten Hoefler         | Performance Modeling and Simulation Lead
Blue Waters Directorate | University of Illinois (UIUC)
1205 W Clark Street     | Urbana, IL, 61801
NCSA Building           | +01 (217) 244-7736
