[mpi3-coll] Updates after first reading

Torsten Hoefler htor at cs.indiana.edu
Fri Jan 2 11:02:36 CST 2009


Hello Bronis,
thank you very much for your detailed review!

> I do have several other changes to suggest. I will put
> them in terms of the 12-31 version you just announced.
> Oops you just announced another. I'll try to catch any
> changes to pages or line numbers...
sorry for the frequent updates, I try to keep the document as up-to-date
as possible. I don't mind receiving reviews for older versions. I will
merge the editing changes appropriately.

> I have two other things that I don't report here. One is
> related to ticket #86. 
please add your comments to the ticket!

> I will open a ticket for the other.
Oh yes, that's a good question. I wanted to bring it up at the forum but
ran out of time. I will discuss the ticket in trac.

> I think it looks like more than it really is. Let me know
> if you have any questions. Thanks,
I will comment your changes below.

> On page 1, we need to include the cross references for
> the immediate operations as well as the blocking ones.
> This leads to several small changes:
correct, thanks for catching this one (it seems to be incomplete in
MPI-2.1 too (exscan and allreduce is missing - see below).

> Page 1, line 20: "(Section 5.3)" => "(Sections 5.3 and 5.12.1)"
> Page 1, line 22: "(Section 5.4)" => "(Sections 5.4 and 5.12.2)"
> Page 1, line 25: "(Section 5.5)" => "(Sections 5.5 and 5.12.3)"
> Page 1, line 29: "(Section 5.6)" => "(Sections 5.6 and 5.12.4)"
> Page 1, line 33: "(Section 5.7)" => "(Sections 5.7 and 5.12.5)"
> Page 1, line 37: "(Section 5.8)" => "(Sections 5.8 and 5.12.6)"
all fixed as proposed

> Page 1, line 43: "(Section 5.9)" => "(Sections 5.9, 5.12.7, and 5.12.8)"
I chose to add a reference to allreduce too (to be consistent)

> Page 1, line 45: "(Section 5.10)" => "(Sections 5.10 and 5.12.9)"
fixed as proposed

> Page 1, line 48: "(Section 5.11)" => "(Sections 5.11, 5.12.10, and 5.12.11)"
I also added exscan because it looks like it was forgotten in MPI-2.0

> I have a few slightly more complicated changes to suggest for
> page 3. The sentence you added on line 16 ("The collective operations
> do not have a message tag argument.") does not have anything to do
> with type matching so it is misplaced in that location. It fits
> much better with the paragraph on lines 27-30. I think it should
> appear between lines 29 and 30, which would make that paragraph:
> 
>   Collective communication calls may use the same communicators as
>   point-to-point communication; MPI guarantees that messages generated
>   on behalf of collective communication calls will not be confused with
>   messages generated by point-to-point communication. The collective
>   operations do not have a message tag argument. A more detailed
>   discussion of correct use of collective routines is found in
>   Section 5.13.
excellent! I changed it.

> I think the intervening paragraph on page 3 (lines 17-26) also
> needs a little rewording and reordering. With the deletions omitted
> for clarity, it currently reads:
> 
>   Collective routine operations can (but are not required to)
>   complete locally as soon as the caller's participation in
>   the collective communication is finished. The local completion
>   of a collective operation indicates that the caller is now
>   free to access locations in the communication buffer. It does
>   not indicate that other processes in the group have completed
>   or even started the operation (unless otherwise implied by in
>   the description of the operation). A blocking operation is
>   complete as soon as the call returns. A nonblocking (immediate)
>   call requires a separate completion operation, cf. ?? (Section
>   3.7). Thus, a collective communication operation may, or may
>   not, have the effect of synchronizing all calling processes.
>   This statement excludes, of course, the barrier operation.
> 
> The sentences about when operations complete need to appear earlier
> in the paragraph. Where they are currently interferes with the goal
> of stating that completion does not imply synchronization. I also
> suggest a little rewording here:
> 
>   Collective routine operations can (but are not required to)
>   complete locally as soon as the caller's participation in
>   the collective communication is finished. A blocking operation
>   completes locally as soon as the call returns. A nonblocking
>   (immediate) call requires a separate completion operation,
>   similarly to nonblocking point-to-point operations as discussed
>   in Section 3.7. The local completion of a collective operation
>   indicates that the caller is now free to access locations in the
>   communication buffer. It does not indicate that other processes
>   in the group have completed or even started the operation (unless
>   otherwise implied by in the description of the operation). Thus,
>   a collective communication operation may, or may not, have the
>   effect of synchronizing all calling processes. This statement
>   excludes, of course, the barrier operation.
ok, I agree. I thought about this arrangement when I applied the change
originally but decided not to split the first and second sentence (since
they both talk about "local completion". However, after you mentioned it
and I read it again, I agree to your proposal and changed it!

> I have a few changes for pages 49-50, most of which may just be
> related to how you are setting up for eventual cross-references:
> 
> Page 49, line 26: "As described in Chapter ?? (Section 3.7)," =>
> "As described in Section 3.7," (use "Section", not "Chapter")
fixed

> Page 49, line 34: "?? (Section 3)" => "(Section 3.7)" (include
> the word "Section" and the parentheses as well as the subnumber)
I added "described in" to make it a proper sentence - please check!

> Page 49, line 40: "communicators which would" => "communicators
> that would" (the phrase is required so "that" is correct)
fixed

> Page 49, line 44: "?? (Section 3.7.1)," => "(Section 3.7.1),"
> (include the parentheses and the word "Section")
added "(cf. Section ...)"

> Page 50, line 16: "?? (Section 3.7.5)," => "(Section 3.7.5),"
> (include the parentheses and the word "Section")
added "(cf. Section ...)"

> Page 50, lines 39-40: "refer to ?? (Section 3.7.4)." => "refer to Section
> 3.7.4." (include the word "Section" but not the parentheses)
fixed
 
> A general observation is that the descriptions of the operations
> does not discuss intercommunicators. I think something needs to
> be said about how the use of intracommunicators and intercommunicators
> affect the results of the operation for each one. Most of the time,
> this can be handled by adding this phrase:
> 
> ",whether on an intracommunicator or an intercommunicator"
ok, I thought we refer to both types when we don't mention this
explicitly. I understand that this proposal makes it clear (explicit)
and I add it.

> at the end of the sentences that state "data placements are identical
> after the operation completes" or "memory movement after completion is
> identical" or "equivalent to" the blocking operation. These sentences
> specifically occur on: 
> page 52, lines 7-8; 
fixed
> page 53, lines 36-37; 
fixed
> page 54, lines 32-33; 
fixed
> page 55, lines 35-36; 
fixed
> page 56, lines 33-34; 
fixed
> page 57, lines 32-33; 
fixed
> page 58, lines 26-27; 
fixed
> page 59, lines 32-33; 
fixed
> page 61, lines 46-48;
fixed
> page 62, lines 31-32; 
fixed
> page 63, lines 25-27; 
fixed
> page 64, lines 9-11; 
fixed
> page 64, lines 40-41; 
fixed
> page 65, lines 24-25.
fixed

> The statement for MPI_IALLTOALLV is terms of MPI_ALLTOALL so it is
> probably OK as it stands, although perhaps it would be better to
> reword the statement to be in terms of MPI_ALLTOALLV. In fact, I
> suggest adding the following on page 60 at the end of line 30:
>
> The data movement after an MPI_IALLTOALLALLV completes is identical
> to MPI_ALLTOALLV, whether on intracommunicator or an intercommunicator.
I agree, consistency is important, I replaced it by:

 "The data movement after an MPI_IALLTOALLV operation completes is
 identical to MPI_ALLTOALLV, whether on intracommunicator or an
 intercommunicator."

> The change for intercommunicators is more complicated for
> MPI_IBARRIER and relates to the outcome of ticket #30 so
> I think we should just make a note of the need for it here;
> maybe add something in parentheses on page 51
What about defining it in terms of blocking collectives? I replaced the
current description with:

 "On intracommunicators, the operation finishes locally after every
 process in the communicator called MPI_IBARRIER. On intracommunicators,
 the operation finishes when the MPI_BARRIER call would return."

> The sentence on page 52, lines 26-27 ("As in many of our example code
> fragments, we assume that some of the variables (such as comm in the
> above) have been assigned appropriate values.") already appears
> earlier in the chapter. There is no need to repeat it here so it
> should be deleted.
ok, I deleted it
 
> Page 63, line 26: "operation which delivers" => "operation that delivers"
fixed
> Page 64, line 10: "operation which delivers" => "operation that delivers"
fixed
> Page 64, line 41: "operation which delivers" => "operation that delivers"
fixed
> Page 65, line 25: "operation which delivers" => "operation that delivers"
fixed

I also edited the text for Alltoallw and Reduce (it also said "which").

> On page 66, line 19, you use "cyclic dependency" and on line 23
> you use "cyclic dependences". It is never clear to me which term
> is correct. I think either is in this case. However, you should
> be consistent. I have a slight preference for "cyclic dependency"
> so I suggest that you change the one on line 23 to "cyclic dependencies"
> although changing the one on line 19 to "cyclic dependence" would
> also be OK with me.
changed as you suggested

Thank you very much! The new version with your comments added is at [1],
the wiki [2] and the usual diff at [3]! I will follow all relevant trac
tickets.

Best,
  Torsten

[1]: http://www.unixer.de/sec/nbc-proposal-01-02-2009.pdf
[2]: https://svn.mpi-forum.org/trac/mpi-forum-web/attachment/wiki/NBColl/nbc-proposal-01-02-2009.pdf
[3]: http://www.unixer.de/sec/nbc-proposal-01-02-2009.diff

-- 
 bash$ :(){ :|:&};: --------------------- http://www.unixer.de/ -----
Torsten Hoefler       | Postdoctoral Researcher
Open Systems Lab      | Indiana University    
150 S. Woodlawn Ave.  | Bloomington, IN, 474045, USA
Lindley Hall Room 135 | +01 (812) 855-3608
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 197 bytes
Desc: Digital signature
URL: <http://lists.mpi-forum.org/pipermail/mpiwg-coll/attachments/20090102/0d6683d1/attachment.pgp>


More information about the mpiwg-coll mailing list