[mpi3-coll] Non-blocking Collectives Proposal Draft

Christian Siebert siebert at it.neclab.eu
Fri Oct 17 06:22:59 CDT 2008


Hi Torsten,

> oh yeah - we should not send too big attachments to the list (your mail
> was 2.3 MiB). I fixed all remarks in the scans. See attachment at:
> https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/NBColl
Sorry about that; I'll give my comments in plain text in this mail. 
Putting the proposal into the Wiki is an excellent idea!

>> 2) Clarification of MPI_Request_free() for requests from non-blocking 
>> collective operations.
> what do you mean?
The current description is not clear enough and might lead to confusion. 
Is the following code correct, or not?

MPI_Request req;
MPI_Ibcast(buf, c, dt, root, comm, &req);
if (myrank == root) {
    MPI_Request_free(&req);
}
else {
    MPI_Wait(&req, MPI_STATUS_IGNORE);
}

More background: A user might assume that the root process acts solely 
as a sender (an assumption that may already be dangerous, e.g. for other 
collectives). The sentence "Freeing a request is only useful at the 
sender side" suggests that the code above is valid and could even be 
beneficial for performance (no blocking at the root)...

Personally, I'd like to forbid the use of MPI_Request_free() for 
requests from nonblocking collectives.

>> 3) Better definition/description for "matching" (there is nothing like 
>> "at the same time" -> logical order?).
> yes, do you have a suggestion?
Unfortunately not, sorry (at least nothing proper yet).
> I replaced it with simultaneously for now
> - which is suboptimal too.
I totally agree. Maybe someone else has some suggestions?

>> 4) Define "levels of progression"? To be queried (e.g., for "Synchronous 
>> Progress" MPI_Tests are needed for performance, but for "Asynchronous 
>> Progress" they would only add unnecessary overhead)? UP >= AP >= SP?
> yes, I actually erased those definitions because they don't belong in a
> standard (imho).
Good; they might be mentioned in some "Advice to..." section.

>> 6) NBC gives several possible ways for optimizations. With this "General 
>> advice to implementers" we stick to only one, and might prevent others. 
>> Can we already fix a decision for optimization strategies at this stage? 
>> Should we fix it at all?
> this is only advice. Maybe you're right, but I don't think that it'll
> be much better without the advice at all. Advice seems to be rather weak
> anyway, so I have no strong opinion on that.
OK, maybe just weaken the statement, e.g. "Nonblocking operations can 
be used ..." instead of "Most nonblocking operations will be used...".
(Apart from some "good" interconnects, MPI history has shown that this 
"overlapping" issue, i.e. concurrent communication and computation, was 
often problematic even for point-to-point (pt2pt) operations. I totally 
agree that this is a very good goal to aim for. However, I see the main 
advantage of nonblocking operations somewhat more in reduced waiting 
times: a blocking receiver has to wait for the sender, whereas a 
nonblocking receiver can potentially do something worthwhile during 
that time. An implementation that optimizes in this direction might 
not necessarily achieve "as low CPU overhead as possible". Currently, I 
don't know which direction (there are surely many more) is the way to 
go...)
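The waiting-time argument can be made concrete with a deliberately idealized model. The model is an assumption made for illustration only: it assumes zero software overhead and assumes the transfer progresses while the receiver computes. A receiver posts at t_post, the data arrives at t_arrive, and the receiver has `work` units of independent computation. All function names are hypothetical.

```c
/* Idealized model (illustrative assumption): no CPU overhead for
 * communication, and the transfer progresses during computation. */
static double max_d(double a, double b) { return a > b ? a : b; }

/* Blocking receive: wait for the data first, then do the work. */
double finish_blocking(double t_post, double t_arrive, double work)
{
    return max_d(t_post, t_arrive) + work;
}

/* Nonblocking receive: do the independent work, then complete. */
double finish_nonblocking(double t_post, double t_arrive, double work)
{
    return max_d(t_post + work, t_arrive);
}

/* Time spent idle in the receive/completion call. */
double idle_blocking(double t_post, double t_arrive)
{
    return max_d(0.0, t_arrive - t_post);
}

double idle_nonblocking(double t_post, double t_arrive, double work)
{
    return max_d(0.0, t_arrive - t_post - work);
}
```

Take t_post = 0, t_arrive = 5 and work = 3. The blocking receiver idles for 5 units and finishes at 8, while the nonblocking receiver idles for 2 and finishes at 5. The model ignores CPU overhead entirely, which is why optimizing for waiting time and optimizing for low overhead can pull an implementation in different directions.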

Regarding matching of blocking (BC) and nonblocking (NBC) collectives: 
it is clearly stated that they don't match. I don't like the additional 
sentence "Matching them should fail ..." because it forces 
implementations to check for such matches (e.g. an implementation might 
use a separate communicator for NBC, and explicitly checking for an 
incorrect match between BC and NBC would incur additional overhead). 
There are other (better) ways to detect incorrect MPI programs, but such 
checking should not be part of the standard itself. An additional 
explanation of why we decided against matching might be beneficial. 
Something like "Both classes (i.e. BC and NBC) potentially have distinct 
optimization criteria, which may require different implementations." 
might help the reader understand this decision.

Regarding the barrier section: "initializes" is already a predefined 
term in the MPI standard and is reserved for persistent operations. The 
correct term in this context would be "initiates". Persistent 
collectives are another proposal... ;-)

>> 7) Should there be a concrete code example in the proposal (e.g. an 
>> implementation of this double buffering example)?
> maybe, doesn't sound too bad. Let's talk about this at the Forum
Unfortunately, I won't be there... :-(

> I just started to create an agenda for Chicago and put those items on
> there (more will follow).
great! I'm also glad that so many of my remarks made it into the 
proposal. So thanks again! ;-)

Best regards,
    Christian

-- 
Christian Siebert, Dipl.-Inf.               Research Associate

            NEC Laboratories Europe, NEC Europe Ltd.
        Rathausallee 10, D-53757 Sankt Augustin, Germany

Phone: +49 (0) 2241 - 92 52 44    Fax: +49 (0) 2241 - 92 52 99

  (Registered Office: 1 Victoria Road, London W3 6BL, 2832014)