[Mpi-comments] Collective operations and synchronization

Jeremiah Willcock jewillco at osl.iu.edu
Sun Nov 25 18:57:47 CST 2012


On Sun, 25 Nov 2012, William Gropp wrote:

> You are reading more into the "as if" text than was intended by the MPI 
> Forum.  For example, the "as if" text for MPI_Scatter was meant only to 
> describe the data that was moved, not the synchronization behavior. 
> You are correct that this is not made clear in the text.  The MPI Forum 
> did not intend to constrain the algorithm choice, and in fact, we have 
> made a point of saying that users should not expect any synchronization 
> beyond what is *explicitly* described (the as if text doesn't count as 
> an explicit description, as that was, as I noted above, intended only to 
> describe what data was moved and to where, not the algorithm or details 
> of synchronization.)

That was something I wasn't sure of -- whether the synchronizations that 
the wording appears to require are actually intended.  I would like it if 
at least the All* and similar collectives were defined to include full 
barriers, since they would most likely (but need not) be implemented that 
way.
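The pattern I have in mind can be sketched with plain Python threads standing in for ranks (a toy model, not MPI; the Barrier-backed allreduce_sum helper is illustrative and deliberately has exactly the full-barrier semantics I am asking for):

```python
# Toy model: ranks (threads) use an allreduce both to combine values
# and as a full barrier before reading data their peers published.
# The allreduce below is built on threading.Barrier, so no rank
# returns from it until every rank has entered it.

import threading

N = 4
published = [None] * N            # data each rank publishes for its peers
results = [None] * N
contribs = [None] * N
gate = threading.Barrier(N)       # stands in for the collective's synchronization

def allreduce_sum(rank, value):
    contribs[rank] = value
    gate.wait()                   # full barrier: all ranks have contributed
    return sum(contribs)

def worker(rank):
    published[rank] = rank * 10            # publish data for peers
    total = allreduce_sum(rank, rank)      # reduction doubling as a barrier
    # Because the allreduce acts as a barrier, every peer's published
    # data is guaranteed to be visible here.
    assert all(p is not None for p in published)
    results[rank] = total

threads = [threading.Thread(target=worker, args=(r,)) for r in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert results == [6, 6, 6, 6]    # 0 + 1 + 2 + 3 on every rank
```

If the standard does not guarantee the barrier behavior, the assertion inside worker() is exactly the kind of "obviously correct" step that a weaker MPI_Allreduce would invalidate.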

-- Jeremiah Willcock

>
>
> On Nov 25, 2012, at 3:38 PM, Jeremiah Willcock wrote:
>
>> These questions/comments relate to the final MPI 3.0 specification at <URL:http://www.mpi-forum.org/docs/mpi-3.0/mpi30-report.pdf>.  All of these comments relate to collectives on intracommunicators; collective semantics on intercommunicators are very different, but similar issues are likely to occur in that case as well.
>>
>> It seems to be weakly specified which synchronization behavior can be inferred from various collective operations.  For example, does MPI_Reduce not complete on the root unless it has been entered on each other process in the communicator?  Although it would be nearly impossible to do otherwise in a general-purpose implementation, I could imagine compiler-based optimizations in which the compiler determines that certain processes will contribute fixed values and thus does not send messages from those processes.  Line 8 of page 40 appears to prevent this type of optimization (removing messages completely without coalescing them into other messages) for point-to-point communication.  Also, seemingly related collectives have different synchronization behavior stated:
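The elided-message scenario above can be sketched in plain Python (a toy model, not MPI; reduce_to_root and its "optimized" variant are hypothetical): dropping a message whose payload is the operation's identity leaves the reduced value unchanged, but the root no longer observes that rank's participation at all.

```python
# Toy model of the hypothetical optimization: in a sum reduction, a
# process whose contribution is statically known to be the identity (0)
# could skip sending entirely.  The reduced value is unchanged, but the
# set of ranks the root synchronizes with shrinks.

from functools import reduce
from operator import add

def reduce_to_root(contributions, op, identity):
    """Naive model: the root receives a message from every rank."""
    messages = list(contributions.items())
    value = reduce(op, (v for _, v in messages), identity)
    return value, {rank for rank, _ in messages}   # ranks the root heard from

def reduce_to_root_optimized(contributions, op, identity):
    """Hypothetical compiler-optimized model: identity contributions elided."""
    messages = [(r, v) for r, v in contributions.items() if v != identity]
    value = reduce(op, (v for _, v in messages), identity)
    return value, {rank for rank, _ in messages}

contributions = {0: 5, 1: 0, 2: 7}        # rank 1 contributes the identity
full = reduce_to_root(contributions, add, 0)
opt = reduce_to_root_optimized(contributions, add, 0)
assert full[0] == opt[0] == 12            # same reduced value ...
assert full[1] == {0, 1, 2}
assert opt[1] == {0, 2}                   # ... but rank 1 never synchronized
```

Whether the standard's wording forbids this transformation for collectives, as page 40 does for point-to-point, is precisely the question.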
>>
>> 1. MPI_Gather is required to have the synchronization described above by lines 11-19 of page 150, while MPI_Reduce is not required to have it.
>>
>> 2. MPI_Scatter is required to wait on every non-root process until the root enters it (line 45 of page 159-line 3 of page 160), while the specification of MPI_Bcast does not require this.  Note that line 10 of page 218 does not seem to apply to this case, since that text appears to be about whether MPI_Bcast waits to complete on the root until the other processes have reached it (the converse of what MPI_Scatter requires).
>>
>> 3. MPI_Allgather (lines 40-45 of page 165) and MPI_Alltoall (lines 42-48 of page 168) are required to act as barriers, while MPI_Allreduce is not. MPI_Reduce_scatter_block has non-normative text (lines 11-15 of page 191) stating that it is "equivalent" to MPI_Reduce + MPI_Scatter, which means that the root must reach it before other processes complete it but does not require a full barrier unless MPI_Reduce has stronger synchronization behavior.
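The data movement implied by that non-normative "equivalent" text can be sketched in plain Python (a toy model with element-wise sum and block size 1; reduce_scatter_block here is an illustrative stand-in, not the MPI call), which shows the equivalence for data but says nothing about synchronization:

```python
# Data-movement reading of the "equivalent" text: for n ranks each
# holding n blocks, MPI_Reduce_scatter_block moves the same data as a
# reduce of all buffers to a root followed by a scatter in which rank i
# receives block i.

def reduce_scatter_block(per_rank_data):
    n = len(per_rank_data)
    # Reduce step: a root forms the element-wise sum of all ranks' buffers.
    reduced = [sum(per_rank_data[r][i] for r in range(n)) for i in range(n)]
    # Scatter step: rank i receives block i of the reduced buffer.
    return [reduced[i] for i in range(n)]

data = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]          # data[r] is rank r's send buffer
assert reduce_scatter_block(data) == [12, 15, 18]
```

Under this reading, the root must enter the operation before any other rank can complete it, but nothing forces a full barrier unless MPI_Reduce itself has stronger synchronization behavior.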
>>
>> Thus, seemingly similar collectives appear to have different constraints on synchronization.  The place in which this matters is in certain algorithms for distributed computing that use an operation such as MPI_Allreduce or its non-blocking equivalent to act as both a reduction operation and as a full barrier, which seems "obviously" correct but does not seem to be by the strict wording of the standard.
>>
>> Am I understanding the wording correctly?  Are the descriptions given above what is desired for those collectives that do have synchronization requirements given?  Should the others be strengthened, perhaps using "as if" versions of their "obvious" implementations?
>>
>> -- Jeremiah Willcock
>> _______________________________________________
>> mpi-comments mailing list
>> mpi-comments at lists.mpi-forum.org
>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-comments
>
>


