[Mpi-comments] Collective operations and synchronization
Jeremiah Willcock
jewillco at osl.iu.edu
Sun Nov 25 15:38:54 CST 2012
These questions/comments relate to the final MPI 3.0 specification at
<URL:http://www.mpi-forum.org/docs/mpi-3.0/mpi30-report.pdf>. All of
these comments relate to collectives on intracommunicators; collective
semantics on intercommunicators are very different, but similar issues are
likely to occur in that case as well.
It seems to be only weakly specified which synchronization behavior can be
inferred from the various collective operations. For example, is MPI_Reduce
guaranteed not to complete on the root until it has been entered on every
other process in the communicator? Although it would be nearly impossible to do
otherwise in a general-purpose implementation, I could imagine
compiler-based optimizations in which the compiler determines that certain
processes will contribute fixed values and thus does not send messages
from those processes. Line 8 of page 40 appears to prevent this type of
optimization (removing messages completely without coalescing them into
other messages) for point-to-point communication. Also, seemingly related
collectives have different synchronization behavior stated:
1. MPI_Gather is required to have the synchronization described above by
lines 11-19 of page 150, while MPI_Reduce is not required to have it.
2. MPI_Scatter is required to wait on every non-root process until the
root enters it (line 45 of page 159-line 3 of page 160), while the
specification of MPI_Bcast does not require this. Note that line 10 of
page 218 does not seem to apply to this case, since that text appears to
be about whether MPI_Bcast waits to complete on the root until the other
processes have reached it (the converse of what MPI_Scatter requires).
3. MPI_Allgather (lines 40-45 of page 165) and MPI_Alltoall (lines 42-48
of page 168) are required to act as barriers, while MPI_Allreduce is not.
MPI_Reduce_scatter_block has non-normative text (lines 11-15 of page 191)
stating that it is "equivalent" to MPI_Reduce + MPI_Scatter, which means
that the root must reach it before other processes complete it but does
not require a full barrier unless MPI_Reduce has stronger synchronization
behavior.
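To make the MPI_Reduce_scatter_block point concrete, here is a minimal
sketch (my own, not from the standard) of the "MPI_Reduce + MPI_Scatter"
expansion that the non-normative text suggests; the buffer names, the use
of root 0, MPI_DOUBLE, and MPI_SUM are illustrative assumptions. The
question is whether the real collective must match the synchronization
implied by this expansion:

```c
/* Sketch: the "MPI_Reduce + MPI_Scatter" expansion suggested by the
 * non-normative text for MPI_Reduce_scatter_block.  The parameter choices
 * (root 0, MPI_DOUBLE, MPI_SUM) are illustrative assumptions. */
#include <stdlib.h>
#include <mpi.h>

void reduce_scatter_block_as_if(const double *sendbuf, double *recvbuf,
                                int count, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    /* Step 1: reduce all size*count contributed elements onto root 0. */
    double *tmp = (rank == 0)
                      ? malloc((size_t)size * count * sizeof *tmp)
                      : NULL;
    MPI_Reduce(sendbuf, tmp, size * count, MPI_DOUBLE, MPI_SUM, 0, comm);

    /* Step 2: scatter one block of `count` elements back to each process.
     * Non-root processes cannot complete this until root 0 has entered it,
     * but nothing here forces a full barrier among the non-root
     * processes themselves. */
    MPI_Scatter(tmp, count, MPI_DOUBLE, recvbuf, count, MPI_DOUBLE, 0, comm);

    free(tmp);  /* free(NULL) is a no-op on non-root ranks */
}
```

Under this expansion, the synchronization of MPI_Reduce_scatter_block
would be bounded by whatever MPI_Reduce and MPI_Scatter individually
guarantee, which is exactly what the list above shows to be uneven.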
Thus, seemingly similar collectives appear to have different constraints
on synchronization. This matters for certain distributed algorithms that
use an operation such as MPI_Allreduce (or its non-blocking equivalent)
as both a reduction and a full barrier, a use that seems "obviously"
correct but does not appear to be guaranteed by the strict wording of the
standard.
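For concreteness, here is a hedged sketch of the kind of algorithm I mean:
an iterative loop that uses MPI_Allreduce both to combine a local
convergence flag and, implicitly, as a barrier separating iterations. The
function compute_local_step and the convergence threshold are hypothetical
placeholders:

```c
/* Sketch: using MPI_Allreduce as both a reduction and an implicit full
 * barrier between iterations.  compute_local_step() and the 1e-9
 * threshold are hypothetical placeholders, not part of any real API. */
#include <mpi.h>

extern double compute_local_step(int iter);  /* hypothetical local work */

void iterate_until_converged(MPI_Comm comm)
{
    int global_done = 0;
    for (int iter = 0; !global_done; ++iter) {
        double residual = compute_local_step(iter);
        int local_done = (residual < 1e-9);

        /* All processes agree on termination.  The algorithm also assumes
         * that no process starts iteration iter+1 before every process has
         * finished iteration iter -- i.e. that this allreduce acts as a
         * full barrier, which the standard's wording for MPI_Allreduce
         * does not clearly require. */
        MPI_Allreduce(&local_done, &global_done, 1, MPI_INT, MPI_LAND, comm);
    }
}
```

If MPI_Allreduce is not required to act as a barrier, the comment in the
loop body is an unstated assumption rather than a guarantee.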
Am I understanding the wording correctly? Are the descriptions above the
intended behavior for those collectives that do have stated
synchronization requirements? Should the others be strengthened, perhaps
by specifying "as if" versions of their "obvious" implementations?
-- Jeremiah Willcock