<html><body>

<p>Hi Alexander<br>

<br>

Comments embedded below.<br>

<br>

I have no objections to someone providing a rationale for assertions related to MPI-IO and MPI_1sided.  If the rationale is sound I have no objection to putting them in the proposal. <br>

<br>

I feel the proposal should be evaluated by the following algorithm.<br>

<br>

if (this concept is one that seems plausible) {<br>

    for each proposed assertion {<br>

          if (rationale not solid) <br>

             discard<br>

          if (deal breaker downside) <br>

             discard<br>

    }<br>

    if ((concept makes sense) && (set of worthwhile assertions is not empty))<br>

       make this part of MPI 2.2<br>

}<br>

<br>

I do not see much reason to get every assertion that eventually gains traction into MPI 2.2.  MPI 3.0 is soon enough for any that do not make the MPI 2.2 cut. I do not want to see the concept fall because some particular assertion is controversial. <br>

<br>

I consider <font size="4">MPI_NO_EAGER_THROTTLE </font>to be the single most valuable assertion for MPI 2.2 because it is needed to allow MPI to scale to the levels we are already seeing.<br>

   <br>

<br>

Dick Treumann  -  MPI Team/TCEM            <br>

IBM Systems & Technology Group<br>

Dept 0lva / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601<br>

Tele (845) 433-7846         Fax (845) 433-8363<br>

<br>

<br>

<tt>mpi3-subsetting-bounces@lists.mpi-forum.org wrote on 06/20/2008 02:58:41 AM:<br>

<br>

> Dear Dick,</tt><br>

<tt>>  </tt><br>

<tt>> A couple of suggestions re your proposal:</tt><br>

<tt>>  </tt><br>

<tt>> - If ASSERTIONS is put at the end of the MPI_INIT_ASSERTED argument <br>

> list, in C++ one can declare the last argument as having a zero <br>

> default value, and skip it if necessary. This might help with <br>

> deprecation of the earlier MPI_INIT_* calls.</tt><br>

<br>

<tt>I have no objection. It seems reasonable to let C++ default the </tt><br>

<tt>assertions parameter to "none"</tt><br>

<br>

<tt>> - In non-Cray parts of the world, an MPI_INT followed by MPI_FLOAT <br>

> is likely to be a 4-byte int followed by a 4-byte float. This <br>

> sometimes depends on the compiler settings in effect, too.</tt><br>

<br>

<tt>My rationale is not specific to any particular architecture. </tt><br>

<tt>Some MPI datatypes are made entirely </tt><br>

<tt>from the same base type. Some are mixtures of types. If libmpi knows </tt><br>

<tt>at the moment a datatype is committed that the send side and receive</tt><br>

<tt>side will always use the same internal representations then it does not </tt><br>

<tt>need to keep track of the fact that one instance of {MPI_INT,MPI_FLOAT}</tt><br>

<tt>has two distinct parts. The send side can gather and ship 8 bytes </tt><br>

<tt>and the receive side can scatter the 8 bytes. If one side might use 4</tt><br>

<tt>byte integers while the other side uses 8 byte integers then at </tt><br>

<tt>least one side will need to know there is a conversion to be done for </tt><br>

<tt>the MPI_INT part. If an MPI job does a spawn or join that links to a</tt><br>

<tt>different architecture after the datatype has been committed, and</tt><br>

<tt>the MPI_Type_commit has discarded the details, it is too late to get </tt><br>

<tt>them back.  On the other hand, if it is known there will never be a</tt><br>

<tt>different architecture added to the job, the extra information can be</tt><br>

<tt>safely discarded.</tt><br>
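<br>
<tt>A minimal sketch of this trade-off, written as a toy Python model rather than real libmpi code. The names commit, FlatLayout, TypedLayout and LOCAL_SIZES are hypothetical, and the 4 byte sizes are only an assumption about the local architecture.</tt><br>
<pre>
```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical local sizes in bytes; a real library would query the platform.
LOCAL_SIZES = {"MPI_INT": 4, "MPI_FLOAT": 4}


@dataclass
class FlatLayout:
    """Committed form when every peer is known to share our representation."""
    nbytes: int


@dataclass
class TypedLayout:
    """Committed form that remembers each base type for later conversion."""
    fields: List[Tuple[str, int]]  # (base type name, local size in bytes)


def commit(base_types, homogeneous_forever):
    """Model what a type commit might keep, depending on the assertion."""
    fields = [(name, LOCAL_SIZES[name]) for name in base_types]
    if homogeneous_forever:
        # Type boundaries no longer matter: gather and ship raw bytes.
        return FlatLayout(nbytes=sum(size for _, size in fields))
    # A later spawn or join might add another architecture, so keep the details.
    return TypedLayout(fields=fields)
```
</pre>
<tt>In this model, commit(["MPI_INT", "MPI_FLOAT"], True) keeps only an 8 byte length, while passing False keeps the per-field list that a conversion would need.</tt><br>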

<br>

<tt>> - I don't think MPI_NO_THREAD_CONTENTION is really necessary. The <br>

> original thread level settings, in particular, the use of anything <br>

> but MPI_THREAD_MULTIPLE, seem to capture the semantics that you proposed.</tt><br>

<br>

<tt>This one is kind of tricky and I also am not sure what it would mean. If</tt><br>

<tt>we find a clear value we can keep it and if not we can remove it.</tt><br>

<br>

<tt>> - I can't fully follow the motivation for MPI_NO_ANY_SOURCE <br>

> deprioritization. AFAIK, a rendezvous exchange usually starts with a<br>

> ready-to-send packet that contains the size of the message. In this <br>

> case the receiving side will normally reply with a ready-to-receive <br>

> regardless of the buffer space available, and flag MPI_ERR_TRUNCATED<br>

> on message arrival if necessary. In this case, neither <br>

> MPI_ANY_SOURCE nor MPI_NO_ANY_SOURCE seems to get in the way.</tt><br>

<br>

<tt>My point is that MPI_NO_ANY_SOURCE might allow this round trip </tt><br>

<tt>protocol to be replaced by a 1/2 rendezvous protocol. If it is known</tt><br>

<tt>that MPI_ANY_SOURCE will not be used then the receive side can send</tt><br>

<tt>an "envelope and ready for data" packet to the send side. As long as </tt><br>

<tt>the send side knows it will receive the "envelope and ready for data" </tt><br>

<tt>packet when the receive is posted, it does not need to do the first 1/2</tt><br>

<tt>of the rendezvous. The message matching can be done at the send side.</tt><br>

<br>

<tt>A send for which the receive was preposted has a </tt><br>

<tt>good chance of finding the "envelope and ready for data" sitting in </tt><br>

<tt>an early queue and the large send can avoid any rendezvous delay.</tt><br>

<tt>Data begins to flow immediately vs waiting for a round trip of a </tt><br>

<tt>full rendezvous. In many cases we cut the delay in half and, in the best </tt><br>

<tt>case, we eliminate rendezvous delay completely. If the receive side </tt><br>

<tt>is late in posting the receive we still save a packet traversal but</tt><br>

<tt>do not save any time.</tt><br>
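<br>
<tt>The three cases above can be checked with a small timing model. This is a sketch under simplifying assumptions: a single fixed one-way latency, no queuing, and hypothetical function names. It does not describe any particular MPI implementation.</tt><br>
<pre>
```python
def full_rendezvous_start(t_send, t_recv, latency):
    """Time at which data can start flowing with a round-trip rendezvous."""
    rts_arrives = t_send + latency       # ready-to-send reaches the receiver
    cts_sent = max(rts_arrives, t_recv)  # reply goes out once the receive is posted
    return cts_sent + latency            # ready-to-receive reaches the sender


def half_rendezvous_start(t_send, t_recv, latency):
    """Time at which data can start flowing with a 1/2 rendezvous."""
    notice_arrives = t_recv + latency    # receiver's notice reaches the sender
    return max(t_send, notice_arrives)   # sender proceeds once it holds the notice


def sender_wait(start_fn, t_send, t_recv, latency):
    """Delay the sender sees between calling send and data starting to flow."""
    return start_fn(t_send, t_recv, latency) - t_send
```
</pre>
<tt>With a latency of 1 unit and the send issued at time 10: a receive posted at time 0 gives a wait of 2 for full rendezvous and 0 for 1/2 rendezvous; a receive posted at time 10 gives 2 versus 1; a receive posted at time 20 gives 11 for both.</tt><br>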

<br>

<tt>If there may be an MPI_ANY_SOURCE then this does not work because the</tt><br>

<tt>receive side that has an MPI_ANY_SOURCE cannot guess which sender to </tt><br>

<tt>notify so the sender cannot count on getting a 1/2 rendezvous </tt><br>

<tt>notification for a message that should match the MPI_ANY_SOURCE </tt><br>

<tt>receive.</tt><br>

<br>

<tt>The problem that made me lower the priority is that many MPIs use an</tt><br>

<tt>eager protocol for small messages and a rendezvous protocol for large</tt><br>

<tt>messages.  If the send side and receive side have the same size buffer</tt><br>

<tt>then both sides can reach the same conclusion: eager vs 1/2 rendezvous.</tt><br>

<tt>If both decide on eager, the receive side will not send an</tt><br>

<tt>"envelope and ready for data" packet and the send side will not look </tt><br>

<tt>for one. If both sides decide on 1/2 rendezvous then the receive side</tt><br>

<tt>will send an "envelope and ready for data" packet and the send side will</tt><br>

<tt>look for and consume the notice.  If the send is for an 8 byte </tt><br>

<tt>message and the receive uses a "big enough" receive buffer of 64KB </tt><br>

<tt>then the two sides will probably not be able to reach the same </tt><br>

<tt>conclusion about the protocol. The receive side will ship off an</tt><br>

<tt>"envelope and ready for data" packet that the send side will not </tt><br>

<tt>know what to do with.</tt><br>
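<br>
<tt>A rough illustration of the mismatch, assuming each side picks a protocol purely from the size it can see locally. The 4096 byte eager limit and the function names are hypothetical; real implementations choose their own thresholds.</tt><br>
<pre>
```python
EAGER_LIMIT = 4096  # hypothetical eager/rendezvous threshold in bytes


def choose_protocol(nbytes, eager_limit=EAGER_LIMIT):
    """Pick a protocol from the only size this side knows about."""
    return "eager" if nbytes <= eager_limit else "half-rendezvous"


def sides_agree(send_bytes, recv_buffer_bytes, eager_limit=EAGER_LIMIT):
    """Return (sender choice, receiver choice, whether they match)."""
    sender = choose_protocol(send_bytes, eager_limit)
    receiver = choose_protocol(recv_buffer_bytes, eager_limit)
    return sender, receiver, sender == receiver
```
</pre>
<tt>Here sides_agree(8, 64 * 1024) reports an eager sender facing a 1/2 rendezvous receiver, which is the case where the notice packet has no consumer. Equal sizes on both sides always agree.</tt><br>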

<tt> </tt><br>

<br>

<tt>>  </tt><br>

<tt>> Best regards.</tt><br>

<tt>>  </tt><br>

<tt>> Alexander</tt><br>

<tt>>  </tt><br>

<tt>> From: Supalov, Alexander <br>

> Sent: Friday, June 20, 2008 8:29 AM<br>

> To: 'MPI 3.0 Sub-setting working group'<br>

> Subject: RE: [Mpi3-subsetting] MPI subsetting: charting the way <br>

> forward at atelecon next week?<br>

</tt><br>

<tt>> Dear Dick,</tt><br>

<tt>>  </tt><br>

<tt>> Thank you. I remember we exchanged a couple of emails about the <br>

> possible extensions to the set of assertions, like one-sided and <br>

> I/O, and in my recollection, almost reached an agreement that this <br>

> can improve performance and possibly memory footprint, as well as be<br>

> expressed thru assertions. Do you still feel favorable about this?</tt><br>

<tt>>  </tt><br>

<tt>> Best regards.</tt><br>

<tt>>  </tt><br>

<tt>> Alexander</tt><br>

<tt>> <br>

<br>

</tt></body></html>