[mpi3-coll] New nonblocking collective intro text

Torsten Hoefler htor at cs.indiana.edu
Sat Jan 31 13:02:39 CST 2009

Here are my comments. Please note that, in the future, I will refuse to
apply structural changes that are not absolutely necessary. Over the
last weeks we have discussed small details such as missing words and
minor clarifications. General wording changes invalidate many of those
discussions and make it hard for the reviewers to check whether earlier
comments are preserved. That said, I think the proposed restructuring
improves readability without changing the content. But I ask all
reviewers to read the introduction (everything between 5.12 and 5.12.1)
again carefully.

The draft for revision 3 is at:
pdf:  http://www.unixer.de/sec/nbc-proposal-rev-3.pdf
diff: http://www.unixer.de/sec/nbc-proposal-rev-3.diff

I did my best to merge everything as well as possible. Comments on some
missing things that I added, and some corrections, are below. Please
send me all comments before Monday noon EST. I will finish revision 3 on
Monday and send it to the whole Forum for review.

> As described in Section ?? (Section 3.7), performance on many systems  
> can be improved by overlapping communication and computation.
This was changed in rev. 2 and now reads (based on Jesper's comments):
"As described in Section ?? (Section 3.7), the performance of many
applications can be improved by overlapping communication and
computation, and many systems enable this."

> Nonblocking collectives combine the potential benefits of nonblocking  
I think the word "collectives" is too colloquial and not appropriate for
the standard (it is not used anywhere in MPI-2.1). I left "collective
operations" and changed all other occurrences of "collectives" (see

> point-to-point operations to exploit overlap and to avoid  
> synchronization with the optimized implementation and message scheduling  
added comma before "to" and after "synchronization"

> provided by collective operations [1,4]. One way of doing this would be  
> to perform a blocking collective operation in a separate thread. An  
> alternative mechanism that often leads to better performance (e.g.,  
> avoids context switching, scheduler overheads, and thread management) is  
> to use nonblocking collective communication [2].
all other changes are applied

> The nonblocking collective communication model is similar to the model  
> used in nonblocking point-to-point communication. A nonblocking start  
changed "in nb p2p comm" to "for nb p2p comm"

> call initiates the collective operation, but does not complete it. A  
"the" -> "a"
> separate completion call is needed to complete the operation. 
Actually, I like the existing wording much better: "A nonblocking start
call is used to initiate a collective communication, which is eventually
completed by a separate call." because it avoids the repetition in "not
complete it. [...] completion call is needed to complete". Anyway, that
might be my Germanness (we don't like to repeat words) and I'll trust
you with this.

> Once  
> initiated, the operation may progress independently of any computation  
> or other communication at participating processes. In this manner,  
> nonblocking collectives can mitigate synchronizing effects of collective  
"collectives" -> "collective operations"
"synchronizing" -> "possible synchronizing"

> operations by running them in the "background." In addition to enabling  
> communication-computation overlap, nonblocking collectives can perform  
"collectives" -> "collective operations"

> collective operations on overlapping communicators that would lead to  
"that" -> "which"

> deadlock with blocking operations. The semantic advantages of  
I think "Their semantics" flows better than repeating "nonblocking
collective operations".
> nonblocking collectives can also be useful in combination with  
> point-to-point communication.

> As in the nonblocking point-to-point case, all start calls are local and  
> return immediately irrespective of the status of other processes. The  
> start call initiates the operation which indicates that the system may  
> start to copy data out of the send buffer and into the receive buffer.  
> Once intiated, all associated send buffers should not be modified and  
> all associated receive buffers should not be accessed until the  
> collective operation completes. The start call returns a request handle,  
> which must be passed to a completion call to complete the operation.
change applied

> All completion calls (e.g., MPI_WAIT) described in Section ?? (Section  
> 3.7.3) are supported for nonblocking collective operations. Similarly to  
> the blocking case, collective operations are considered to be complete  
> when the local part of the operation is finished, i.e., the semantics of  
> the operation are guaranteed and all buffers can be safely accessed and  
> modified. Completion does not imply that other processes have completed  
> or even started the operation unless otherwise specified in or implied  
> by the description of the operation. Completion of a particular  
> nonblocking collective operation also does not imply completion of any  
> other posted nonblocking collective (or send-receive) operations,  
> whether they are posted before or after the completed operation.
change applied

> Advice to users. Some implementations may have the effect of  
> synchronizing processes during the completion of a nonblocking  
> collective. A correct, portable program cannot rely on such  
> synchronization side-effects, however, one must program so as to allow  
> them. (End of advice to users.)
I rephrased this to: "Users should be aware that implementations are
allowed, but not required (with exception of \mpifunc{MPI\_BARRIER}), to
synchronize processes during the completion of a nonblocking collective

> Upon returning from a completion call in which a nonblocking colletive  
"colletive" -> "collective operation"
> completes, the MPI_ERROR field in the associated status object is set  
> appropriately to indicate any errors. The values of the MPI_SOURCE and  
> MPI_TAG fields are undefined. It is valid to mix different request types  
> (i.e., any combination of collective requests, I/O requests, generalized  
> requests, or point-to-point requests) in functions that enable multiple  
> completions (e.g., MPI_WAITALL). It is erroneous to call  
> MPI_REQUEST_FREE or MPI_CANCEL with a request for a nonblocking  
> collective operation.
"with" -> "for"
"request for" -> "request associated with"

Added "Nonblocking collective requests are not persistent." as per
Jesper's comment.

> Rationale. Freeing an active nonblocking collective request could cause  
> similar problems as discussed for point-to-point requests (see Section  
> ?? (3.7.3)). Cancelling a request is not supported because the semantics  
> of this operation are not well-defined. (End of rationale.)
change applied

> Multiple nonblocking collective operations can be outstanding on a  
> single communicator. If the nonblocking collective causes some system  
> resource to be exhausted, then it will fail and generate an MPI  
> exception. Quality implementations of MPI should ensure that this  
> happens only in pathological cases. That is, an MPI implementation  
> should be able to support a large number of pending nonblocking  
> collective operations.
kept as is

> Unlike point-to-point operations, nonblocking collective operations do  
> not match with blocking collectives, and collective operations do not  
> have a tag argument. 
"collectives" -> "collective operations"
"collective operations" -> "they"

> All processes must call collective operations  
> (blocking and nonblocking) in the same order per communicator. In  
> particular, once a process calls a collective operation, all other  
> processes in the communicator must eventually call the same collective  
> operation, and no other collective operation in between. This is  
> consistent with the ordering rules for blocking collective operations in  
> threaded environments.
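
To make the matching and ordering rules above concrete, here is a small
sketch of a checker. It is not part of the proposal and does not use
MPI; the names Call and first_mismatch, the communicator labels, and the
trace format are all hypothetical. It assumes that every process is a
member of every communicator that appears in the traces, and it only
checks the matching rule (same operation, same blocking or nonblocking
flavour, same order per communicator), not deadlock freedom.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Call(NamedTuple):
    """One collective call as issued by a single process (hypothetical record)."""
    comm: str          # communicator label, e.g. "world"
    op: str            # operation name, e.g. "bcast"
    nonblocking: bool  # True for the nonblocking variant


def first_mismatch(traces: List[List[Call]]) -> Optional[Tuple[str, int]]:
    """Return (comm, index) of the first call that does not match across
    all processes, or None if every per-communicator sequence agrees."""
    # Split each process's trace into one ordered sequence per communicator.
    per_proc: List[Dict[str, List[Call]]] = []
    for trace in traces:
        by_comm: Dict[str, List[Call]] = {}
        for call in trace:
            by_comm.setdefault(call.comm, []).append(call)
        per_proc.append(by_comm)

    comms = sorted({c for by_comm in per_proc for c in by_comm})
    for comm in comms:
        seqs = [by_comm.get(comm, []) for by_comm in per_proc]
        longest = max(len(s) for s in seqs)
        for i in range(longest):
            # At position i, every process must issue the same operation
            # with the same flavour; a missing call counts as a mismatch.
            entries = {s[i] if i < len(s) else None for s in seqs}
            if len(entries) != 1:
                return (comm, i)
    return None


if __name__ == "__main__":
    p0 = [Call("world", "bcast", True)]
    p1 = [Call("world", "bcast", False)]  # blocking does not match nonblocking
    print(first_mismatch([p0, p1]))
```

Note that the order is compared per communicator only, so two processes
may interleave calls on different communicators differently without
violating the rule checked here.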

> Rationale. Matching blocking and nonblocking collectives is not allowed  
"collectives" -> "collective operations"

> because an implementation might use different communication algorithms  
> for the two cases. Blocking collectives may be optimized for minimal  
> time to completion, while nonblocking collectives may balance time to  
> completion with CPU overhead and asynchronous progression.

> The use of tags for collective operations can prevent certain hardware  
> optimizations. (End of rationale.)

> Advice to users. If program semantics require matching blocking and  
> nonblocking collectives, then a nonblocking collective operation can be  
"collectives" -> ...
> initiated and immediately completed with a blocking wait to emulate  
> blocking behavior. (End of advice to users.)

> In terms of data movements, each nonblocking collective operation has  
> the same effect as its blocking counterpart for intracommunicators and  
> intercommunicators after completion. The use of the “in place” option is  
> allowed exactly as described for the corresponding blocking collective  
> operations. Likewise, upon completion, nonblocking collective reduction  
> operations have the same effect as their blocking counterparts, and the  
> same restrictions and recommendations on reduction orders apply.

> Progression rules for nonblocking collectives are similar to progression  
"collectives" -> ...
> of nonblocking point-to-point operations, refer to Section ?? (Section  
> 3.7.4).

> Advice to implementors. Nonblocking collective operations can be  
> implemented with local execution schedules [3] using nonblocking  
> point-to-point communication and a reserved tag-space. (End of advice to  
> implementors.)
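
As an illustration of what a local execution schedule could look like,
here is a minimal sketch for a broadcast. It is my own example under
stated assumptions, not the scheme described in [3]: it assumes a
binomial tree rooted at rank 0, and the names bcast_schedule and
simulate are hypothetical. Each rank computes its own list of rounds
locally; a real implementation would post the operations of a round as
nonblocking point-to-point calls with a reserved tag and advance to the
next round once they complete. The sketch only computes and checks the
schedules and does not call MPI.

```python
from typing import List, Set, Tuple

Op = Tuple[str, int]  # ("send" or "recv", peer rank)


def bcast_schedule(rank: int, size: int) -> List[List[Op]]:
    """Per-rank schedule for a binomial-tree broadcast rooted at rank 0.

    Round k uses distance 2**k: ranks below the distance already hold the
    data and send it on; ranks in [distance, 2*distance) receive it.
    """
    schedule: List[List[Op]] = []
    dist = 1
    while dist < size:
        ops: List[Op] = []
        if rank < dist and rank + dist < size:
            ops.append(("send", rank + dist))
        elif dist <= rank < 2 * dist:
            ops.append(("recv", rank - dist))
        schedule.append(ops)  # may be empty: this rank idles in the round
        dist *= 2
    return schedule


def simulate(size: int) -> Set[int]:
    """Run all schedules round by round; return the ranks that hold the data."""
    schedules = [bcast_schedule(r, size) for r in range(size)]
    has_data = {0}  # only the root holds the data initially
    for k in range(len(schedules[0])):
        delivered = set()
        for r in range(size):
            for kind, peer in schedules[r][k]:
                if kind == "send":
                    # A sender must already hold the data, and the peer
                    # must have the matching receive in the same round.
                    assert r in has_data
                    assert ("recv", r) in schedules[peer][k]
                    delivered.add(peer)
        has_data |= delivered
    return has_data


if __name__ == "__main__":
    for r in range(4):
        print(r, bcast_schedule(r, 4))
```

Because each schedule depends only on the rank and the communicator
size, it can be built at the start call without any communication.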


 bash$ :(){ :|:&};: --------------------- http://www.unixer.de/ -----
Torsten Hoefler       | Postdoctoral Researcher
Open Systems Lab      | Indiana University    
150 S. Woodlawn Ave.  | Bloomington, IN, 474045, USA
Lindley Hall Room 135 | +01 (812) 855-3608
