<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<HTML><HEAD><TITLE>Re: [Mpi3-subsetting] MPI subsetting: charting the way forward at a telecon next week?</TITLE>

<META http-equiv=Content-Type content="text/html; charset=us-ascii">

<META content="MSHTML 6.00.2900.3314" name=GENERATOR></HEAD>

<BODY>

<DIV dir=ltr align=left><SPAN class=162295716-20062008><FONT face=Arial 

color=#0000ff size=2>Hi,</FONT></SPAN></DIV>

<DIV dir=ltr align=left><SPAN class=162295716-20062008><FONT face=Arial 

color=#0000ff size=2></FONT></SPAN> </DIV>

<DIV dir=ltr align=left><SPAN class=162295716-20062008><FONT face=Arial 

color=#0000ff size=2>Ignoring an assertion should be perfectly 

legal.</FONT></SPAN></DIV>

<DIV dir=ltr align=left><SPAN class=162295716-20062008><FONT face=Arial 

color=#0000ff size=2></FONT></SPAN> </DIV>

<DIV dir=ltr align=left><SPAN class=162295716-20062008><FONT face=Arial 

color=#0000ff size=2>Best regards.</FONT></SPAN></DIV>

<DIV dir=ltr align=left><SPAN class=162295716-20062008><FONT face=Arial 

color=#0000ff size=2></FONT></SPAN> </DIV>

<DIV dir=ltr align=left><SPAN class=162295716-20062008><FONT face=Arial 

color=#0000ff size=2>Alexander</FONT></SPAN></DIV><BR>

<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>

<HR tabIndex=-1>

<FONT face=Tahoma size=2><B>From:</B> 

mpi3-subsetting-bounces@lists.mpi-forum.org 

[mailto:mpi3-subsetting-bounces@lists.mpi-forum.org] <B>On Behalf Of </B>Richard 

Graham<BR><B>Sent:</B> Friday, June 20, 2008 6:53 PM<BR><B>To:</B> MPI 3.0 

Sub-setting working group<BR><B>Subject:</B> Re: [Mpi3-subsetting] MPI 

subsetting: charting the way forward at a telecon next week?<BR></FONT><BR></DIV>

<DIV></DIV><FONT face="Verdana, Helvetica, Arial"><SPAN 

style="FONT-SIZE: 12px">I think we need to be careful here when it comes to 

assertions, and think hard about how<BR> you want to handle these in a 

standard.  In some of the implementations I am familiar with<BR> a 

no-eager-throttle keyword would be useless; it is very implementation specific. 

 I suppose<BR> this is a big problem with trying to add implementation 

specific keywords to a standard.<BR> It is a given that this will also 

cause trouble when trying to come up with an ABI, unless<BR> one has a 

large set of defined constants, and is willing to have these be no-ops 

in<BR> certain implementations.<BR><BR>Rich<BR><BR><BR>On 6/20/08 9:56 AM, 

"Richard Treumann" &lt;treumann@us.ibm.com&gt; wrote:<BR><BR></SPAN></FONT>

<BLOCKQUOTE><FONT face="Verdana, Helvetica, Arial"><SPAN 

  style="FONT-SIZE: 12px">Hi Alexander<BR><BR>Comments embedded below.<BR><BR>I 

  have no objections to someone providing a rationale for assertions related to 

  MPI-IO and MPI one-sided communication.  If the rationale is sound I have no objection to 

  putting them in the proposal. <BR><BR>I feel the proposal should be evaluated 

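  by the following algorithm.<BR><BR>if (this concept is one that seems 

  plausible) {<BR>&nbsp;&nbsp;for each proposed assertion {<BR>&nbsp;&nbsp;&nbsp;&nbsp;if (rationale not 

  solid) <BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;discard<BR>&nbsp;&nbsp;&nbsp;&nbsp;if (deal breaker downside) 

  <BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;discard<BR>&nbsp;&nbsp;}<BR>}<BR>if ((concept makes sense) &amp;&amp; (set of 

  worthwhile assertions is not empty))<BR>&nbsp;&nbsp;make this part of MPI 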

  2.2<BR><BR>I do not see much reason to get every assertion that eventually 

  gains traction into MPI 2.2.  MPI 3.0 is soon enough for any that do not 

  make the MPI 2.2 cut. I do not want to see the concept fall because some 

  particular assertion is controversial. <BR><BR>I consider </SPAN><FONT 

  size=5><SPAN style="FONT-SIZE: 18px">MPI_NO_EAGER_THROTTLE </SPAN></FONT><SPAN 

  style="FONT-SIZE: 12px">to be the single most valuable assertion for MPI 2.2 

  because it is needed to allow MPI to scale to the levels we are already 

  seeing.<BR> <BR><BR>Dick Treumann  -  MPI Team/TCEM 

             <BR>IBM 

  Systems & Technology Group<BR>Dept 0lva / MS P963 -- 2455 South Road -- 

  Poughkeepsie, NY 12601<BR>Tele (845) 433-7846 

          Fax (845) 

  433-8363<BR><BR><BR></SPAN></FONT><FONT size=2><FONT 

  face="Monaco, Courier New"><SPAN 

  style="FONT-SIZE: 10px">mpi3-subsetting-bounces@lists.mpi-forum.org wrote on 

  06/20/2008 02:58:41 AM:<BR><BR>> Dear Dick,<BR>>  <BR>> A couple 

  of suggestions re your proposal:<BR>>  <BR>> - If ASSERTIONS is put 

  at the end of the MPI_INIT_ASSERTED argument <BR>> list, in C++ one can 

  declare the last argument as having a zero <BR>> default value, and skip it 

  if necessary. This might help with <BR>> deprecation of the earlier 

  MPI_INIT_* calls.<BR></SPAN></FONT></FONT><FONT 

  face="Verdana, Helvetica, Arial"><SPAN 

  style="FONT-SIZE: 12px"><BR></SPAN></FONT><FONT size=2><FONT 

  face="Monaco, Courier New"><SPAN style="FONT-SIZE: 10px">I have no objection. 

  It seems reasonable to let C++ default the <BR>assertions parameter to 

  "none"<BR></SPAN></FONT></FONT><FONT face="Verdana, Helvetica, Arial"><SPAN 

  style="FONT-SIZE: 12px"><BR></SPAN></FONT><FONT size=2><FONT 

  face="Monaco, Courier New"><SPAN style="FONT-SIZE: 10px">> - In non-Cray 

  parts of the world, an MPI_INT followed by MPI_FLOAT <BR>> is likely to be 

  a 4-byte int followed by a 4-byte float. This <BR>> sometimes depends on 

  the compiler settings in effect, too.<BR></SPAN></FONT></FONT><FONT 

  face="Verdana, Helvetica, Arial"><SPAN 

  style="FONT-SIZE: 12px"><BR></SPAN></FONT><FONT size=2><FONT 

  face="Monaco, Courier New"><SPAN style="FONT-SIZE: 10px">My rationale is not 

  specific to any particular architecture. <BR>Some MPI datatypes are made 

  entirely <BR>from the same base type. Some are mixtures of types. If libmpi 

  knows <BR>at the moment a datatype is committed that the send side and 

  receive<BR>side will always use the same internal representations then it does 

  not <BR>need to keep track of the fact that one instance of 

  {MPI_INT,MPI_FLOAT}<BR>has two distinct parts. The send side can gather and 

  ship 8 bytes <BR>and the receive side can scatter the 8 bytes. If one side 

  might use 4<BR>byte integers while the other side uses 8 byte integers then at 

  <BR>least one side will need to know there is a conversion to be done for 

  <BR>the MPI_INT part. If an MPI job does a spawn or join that links to 

  a<BR>different architecture after the datatype has been committed, and<BR>the 

  MPI_Type_commit has discarded the details, it is too late to get <BR>them 

  back.  On the other hand, if it is known there will never be 

  a<BR>different architecture added to the job, the extra information can 

  be<BR>safely discarded.<BR></SPAN></FONT></FONT><FONT 

  face="Verdana, Helvetica, Arial"><SPAN 

  style="FONT-SIZE: 12px"><BR></SPAN></FONT><FONT size=2><FONT 

  face="Monaco, Courier New"><SPAN style="FONT-SIZE: 10px">> - I don't think 

  MPI_NO_THREAD_CONTENTION is really necessary. The <BR>> original thread 

  level settings, in particular, the use of anything <BR>> but 

  MPI_THREAD_MULTIPLE, seem to capture the semantics that you 

  proposed.<BR></SPAN></FONT></FONT><FONT face="Verdana, Helvetica, Arial"><SPAN 

  style="FONT-SIZE: 12px"><BR></SPAN></FONT><FONT size=2><FONT 

  face="Monaco, Courier New"><SPAN style="FONT-SIZE: 10px">This one is kind of 

  tricky and I also am not sure what it would mean. If<BR>we find a clear value 

  we can keep it and if not we can remove it.<BR></SPAN></FONT></FONT><FONT 

  face="Verdana, Helvetica, Arial"><SPAN 

  style="FONT-SIZE: 12px"><BR></SPAN></FONT><FONT size=2><FONT 

  face="Monaco, Courier New"><SPAN style="FONT-SIZE: 10px">> - I can't fully 

  follow the motivation for MPI_NO_ANY_SOURCE <BR>> deprioritization. AFAIK, 

  a rendezvous exchange usually starts with a<BR>> ready-to-send packet that 

  contains the size of the message. In this <BR>> case the receiving side 

  will normally reply with a ready-to-receive <BR>> regardless of the buffer 

  space available, and flag MPI_ERR_TRUNCATE<BR>> on message arrival if 

  necessary. In this case, neither <BR>> MPI_ANY_SOURCE nor MPI_NO_ANY_SOURCE 

  seems to get in the way.<BR></SPAN></FONT></FONT><FONT 

  face="Verdana, Helvetica, Arial"><SPAN 

  style="FONT-SIZE: 12px"><BR></SPAN></FONT><FONT size=2><FONT 

  face="Monaco, Courier New"><SPAN style="FONT-SIZE: 10px">My point is that 

  MPI_NO_ANY_SOURCE might allow this round trip <BR>protocol to be replaced by a 

  1/2 rendezvous protocol. If it is known<BR>that MPI_ANY_SOURCE will not be 

  used then the receive side can send<BR>an "envelope and ready for data" packet 

  to the send side. As long as <BR>the send side knows it will receive the 

  "envelope and ready for data" <BR>packet when the receive is posted, it does 

  not need to do the first 1/2<BR>of the rendezvous. The message matching can be 

  done at the send side.<BR></SPAN></FONT></FONT><FONT 

  face="Verdana, Helvetica, Arial"><SPAN 

  style="FONT-SIZE: 12px"><BR></SPAN></FONT><FONT size=2><FONT 

  face="Monaco, Courier New"><SPAN style="FONT-SIZE: 10px">A send for which the 

  receive was preposted has a <BR>good chance of finding the "envelope and ready 

  for data" sitting in <BR>an early queue and the large send can avoid any 

  rendezvous delay.<BR>Data begins to flow immediately vs waiting for a round 

  trip of a <BR>full rendezvous. In many cases we cut the delay in half and best 

  <BR>case we eliminate rendezvous delay completely. If the receive side <BR>is 

  late in posting the receive we still save a packet traversal but<BR>do not 

  save any time.<BR></SPAN></FONT></FONT><FONT 

  face="Verdana, Helvetica, Arial"><SPAN 

  style="FONT-SIZE: 12px"><BR></SPAN></FONT><FONT size=2><FONT 

  face="Monaco, Courier New"><SPAN style="FONT-SIZE: 10px">If there may be an 

  MPI_ANY_SOURCE then this does not work because the<BR>receive side that has an 

  MPI_ANY_SOURCE cannot guess which sender to <BR>notify so the sender cannot 

  count on getting a 1/2 rendezvous <BR>notification for a message that should 

  match the MPI_ANY_SOURCE <BR>receive.<BR></SPAN></FONT></FONT><FONT 

  face="Verdana, Helvetica, Arial"><SPAN 

  style="FONT-SIZE: 12px"><BR></SPAN></FONT><FONT size=2><FONT 

  face="Monaco, Courier New"><SPAN style="FONT-SIZE: 10px">The problem that made 

  me lower the priority is that many MPIs use an<BR>eager protocol for small 

  messages and a rendezvous protocol for large<BR>messages.  If the send 

  side and receive side have the same size buffer<BR>then both sides can reach 

  the same conclusion: eager vs 1/2 rendezvous.<BR>If both decide on eager, the 

  receive side will not send an<BR>"envelope and ready for data" packet and the 

  send side will not look <BR>for one. If both sides decide on 1/2 rendezvous 

  then the receive side<BR>will send an "envelope and ready for data" packet and 

  the send side will<BR>look for and consume the notice.  If the send 

  is for an 8-byte <BR>message and the receive uses a "big enough" receive 

  buffer of 64KB <BR>then the two sides will probably not be able to reach the 

  same <BR>conclusion about the protocol. The receive side will ship off 

  an<BR>"envelope and ready for data" packet that the send side will not <BR>know 

  what to do with.<BR> <BR></SPAN></FONT></FONT><FONT 

  face="Verdana, Helvetica, Arial"><SPAN 

  style="FONT-SIZE: 12px"><BR></SPAN></FONT><FONT size=2><FONT 

  face="Monaco, Courier New"><SPAN style="FONT-SIZE: 10px">>  <BR>> 

  Best regards.<BR>>  <BR>> Alexander<BR>>  <BR>> From: 

  Supalov, Alexander <BR>> Sent: Friday, June 20, 2008 8:29 AM<BR>> To: 

  'MPI 3.0 Sub-setting working group'<BR>> Subject: RE: [Mpi3-subsetting] MPI 

  subsetting: charting the way <BR>> forward at a telecon next 

  week?<BR></SPAN></FONT></FONT><FONT face="Verdana, Helvetica, Arial"><SPAN 

  style="FONT-SIZE: 12px"><BR></SPAN></FONT><FONT size=2><FONT 

  face="Monaco, Courier New"><SPAN style="FONT-SIZE: 10px">> Dear 

  Dick,<BR>>  <BR>> Thank you. I remember we exchanged a couple of 

  emails about the <BR>> possible extensions to the set of assertions, like 

  one-sided and <BR>> I/O, and in my recollection, almost reached an 

  agreement that this <BR>> can improve performance and possibly memory 

  footprint, as well as be<BR>> expressed thru assertions. Do you still feel 

  favorable about this?<BR>>  <BR>> Best regards.<BR>> 

   <BR>> Alexander<BR>> <BR><BR></SPAN></FONT></FONT><FONT 

  face="Verdana, Helvetica, Arial"><SPAN style="FONT-SIZE: 12px"><BR>
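A minimal sketch of the eager versus 1/2 rendezvous mismatch described above, assuming a single hypothetical eager limit of 16KB. The names EAGER_LIMIT, choose_protocol and exchange are invented for illustration and do not come from any MPI implementation or from the proposal. It only models each side choosing a protocol from the size it can see locally.<BR>
<PRE>
```python
# Illustrative model only: EAGER_LIMIT and every function name here are
# assumptions made for this sketch, not taken from any MPI library.
EAGER_LIMIT = 16 * 1024  # hypothetical eager/rendezvous cutoff, in bytes


def choose_protocol(visible_bytes):
    """Each side picks a protocol from the only size it can see locally."""
    return "eager" if visible_bytes <= EAGER_LIMIT else "half-rendezvous"


def exchange(send_bytes, recv_buffer_bytes):
    """Return (sender_choice, receiver_choice, stray_notice).

    stray_notice is True when the receiver ships an "envelope and ready
    for data" packet that an eager sender will never look for.
    """
    sender = choose_protocol(send_bytes)
    receiver = choose_protocol(recv_buffer_bytes)
    stray_notice = receiver == "half-rendezvous" and sender == "eager"
    return sender, receiver, stray_notice


# Matching sizes: both sides reach the same conclusion.
print(exchange(8, 8))          # ('eager', 'eager', False)
# 8-byte send into a 64KB "big enough" receive buffer: the sides disagree.
print(exchange(8, 64 * 1024))  # ('eager', 'half-rendezvous', True)
```
</PRE>
With equal sizes both sides agree. With an 8-byte send and a 64KB receive buffer, the receiver issues a notice that the sender never consumes, which is the case that lowered the priority of MPI_NO_ANY_SOURCE.<BR>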

  <HR align=center width="95%" SIZE=3>

  </SPAN></FONT><FONT size=2><FONT face="Monaco, Courier New"><SPAN 

  style="FONT-SIZE: 10px">_______________________________________________<BR>mpi3-subsetting 

  mailing list<BR>mpi3-subsetting@lists.mpi-forum.org<BR><A 

  href="http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-subsetting">http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-subsetting</A><BR></SPAN></FONT></FONT></BLOCKQUOTE><FONT 

size=2><FONT face="Monaco, Courier New"><SPAN 

style="FONT-SIZE: 10px"><BR></SPAN></FONT></FONT></BODY></HTML>