[mpi3-coll] Updates after first reading

Bronis R. de Supinski bronis at llnl.gov
Wed Dec 31 12:46:24 CST 2008


Torsten:

Re:
> A small textual update as described in Ticket #93 (applied to
> nonblocking collectives).

As you might have guessed, I have been reading through
the NBC proposal in detail. I created tickets for things
that weren't (strictly) related to it -- thanks for catching
that #93 also applied to a portion of it so I don't
need to report that here.

I do have several other changes to suggest. I will put
them in terms of the 12-31 version you just announced.
Oops, you just announced another. I'll try to catch any
changes to pages or line numbers...

I have two other things that I don't report here. One is
related to ticket #86. I will open a ticket for the other.

I think it looks like more than it really is. Let me know
if you have any questions. Thanks,

Bronis



On page 1, we need to include the cross references for
the immediate operations as well as the blocking ones.
This leads to several small changes:

Page 1, line 20: "(Section 5.3)" => "(Sections 5.3 and 5.12.1)"

Page 1, line 22: "(Section 5.4)" => "(Sections 5.4 and 5.12.2)"

Page 1, line 25: "(Section 5.5)" => "(Sections 5.5 and 5.12.3)"

Page 1, line 29: "(Section 5.6)" => "(Sections 5.6 and 5.12.4)"

Page 1, line 33: "(Section 5.7)" => "(Sections 5.7 and 5.12.5)"

Page 1, line 37: "(Section 5.8)" => "(Sections 5.8 and 5.12.6)"

Page 1, line 43: "(Section 5.9)" => "(Sections 5.9, 5.12.7, and 5.12.8)"

Page 1, line 45: "(Section 5.10)" => "(Sections 5.10 and 5.12.9)"

Page 1, line 48: "(Section 5.11)" => "(Sections 5.11, 5.12.10, and 5.12.11)"


I have a few slightly more complicated changes to suggest for
page 3. The sentence you added on line 16 ("The collective operations
do not have a message tag argument.") does not have anything to do
with type matching so it is misplaced in that location. It fits
much better with the paragraph on lines 27-30. I think it should
appear between lines 29 and 30, which would make that paragraph:

  Collective communication calls may use the same communicators as
  point-to-point communication; MPI guarantees that messages generated
  on behalf of collective communication calls will not be confused with
  messages generated by point-to-point communication. The collective
  operations do not have a message tag argument. A more detailed
  discussion of correct use of collective routines is found in
  Section 5.13.

I think the intervening paragraph on page 3 (lines 17-26) also
needs a little rewording and reordering. With the deletions omitted
for clarity, it currently reads:

  Collective routine operations can (but are not required to)
  complete locally as soon as the caller's participation in
  the collective communication is finished. The local completion
  of a collective operation indicates that the caller is now
  free to access locations in the communication buffer. It does
  not indicate that other processes in the group have completed
  or even started the operation (unless otherwise implied by in
  the description of the operation). A blocking operation is
  complete as soon as the call returns. A nonblocking (immediate)
  call requires a separate completion operation, cf. ?? (Section
  3.7). Thus, a collective communication operation may, or may
  not, have the effect of synchronizing all calling processes.
  This statement excludes, of course, the barrier operation.

The sentences about when operations complete need to appear earlier
in the paragraph. Where they are currently interferes with the goal
of stating that completion does not imply synchronization. I also
suggest a little rewording here:

  Collective routine operations can (but are not required to)
  complete locally as soon as the caller's participation in
  the collective communication is finished. A blocking operation
  completes locally as soon as the call returns. A nonblocking
  (immediate) call requires a separate completion operation,
  similarly to nonblocking point-to-point operations as discussed
  in Section 3.7. The local completion of a collective operation
  indicates that the caller is now free to access locations in the
  communication buffer. It does not indicate that other processes
  in the group have completed or even started the operation (unless
  otherwise implied by the description of the operation). Thus,
  a collective communication operation may, or may not, have the
  effect of synchronizing all calling processes. This statement
  excludes, of course, the barrier operation.


I have a few changes for pages 49-50, most of which may just be
related to how you are setting up for eventual cross-references:

Page 49, line 26: "As described in Chapter ?? (Section 3.7)," =>
"As described in Section 3.7," (use "Section", not "Chapter")

Page 49, line 34: "?? (Section 3)" => "(Section 3.7)" (include
the word "Section" and the parentheses as well as the subnumber)

Page 49, line 40: "communicators which would" => "communicators
that would" (the clause is restrictive, so "that" is correct)

Page 49, line 44: "?? (Section 3.7.1)," => "(Section 3.7.1),"
(include the parentheses and the word "Section")

Page 50, line 16: "?? (Section 3.7.5)," => "(Section 3.7.5),"
(include the parentheses and the word "Section")

Page 50, lines 39-40: "refer to ?? (Section 3.7.4)." => "refer to Section
3.7.4." (include the word "Section" but not the parentheses)


A general observation is that the descriptions of the operations
do not discuss intercommunicators. I think something needs to
be said about how the use of intracommunicators and intercommunicators
affects the results of the operation for each one. Most of the time,
this can be handled by adding this phrase:

", whether on an intracommunicator or an intercommunicator"

at the end of the sentences that state "data placements are identical
after the operation completes" or "memory movement after completion is
identical" or "equivalent to" the blocking operation. These sentences
specifically occur on: page 52, lines 7-8; page 53, lines 36-37; page 54,
lines 32-33; page 55, lines 35-36; page 56, lines 33-34; page 57, lines
32-33; page 58, lines 26-27; page 59, lines 32-33; page 61, lines 46-48;
page 62, lines 31-32; page 63, lines 25-27; page 64, lines 9-11; page 64,
lines 40-41; and page 65, lines 24-25.

The statement for MPI_IALLTOALLV is in terms of MPI_ALLTOALL so it is
probably OK as it stands, although perhaps it would be better to
reword the statement to be in terms of MPI_ALLTOALLV. In fact, I
suggest adding the following on page 60 at the end of line 30:

The data movement after an MPI_IALLTOALLV completes is identical
to MPI_ALLTOALLV, whether on an intracommunicator or an intercommunicator.

The change for intercommunicators is more complicated for
MPI_IBARRIER and relates to the outcome of ticket #30 so
I think we should just make a note of the need for it here;
maybe add something in parentheses on page 51.

The sentence on page 52, lines 26-27 ("As in many of our example code
fragments, we assume that some of the variables (such as comm in the
above) have been assigned appropriate values.") already appears
earlier in the chapter. There is no need to repeat it here so it
should be deleted.


Page 63, line 26: "operation which delivers" => "operation that delivers"

Page 64, line 10: "operation which delivers" => "operation that delivers"

Page 64, line 41: "operation which delivers" => "operation that delivers"

Page 65, line 25: "operation which delivers" => "operation that delivers"


On page 66, line 19, you use "cyclic dependency" and on line 23
you use "cyclic dependences". It is never clear to me which term
is correct. I think either is acceptable in this case. However, you should
be consistent. I have a slight preference for "cyclic dependency"
so I suggest that you change the one on line 23 to "cyclic dependencies"
although changing the one on line 19 to "cyclic dependence" would
also be OK with me.




