<HTML>
<HEAD>
<TITLE>Re: [Mpi3-subsetting] MPI subsetting: charting the way forward at a telecon next week?</TITLE>
</HEAD>
<BODY>
<FONT FACE="Verdana, Helvetica, Arial"><SPAN STYLE='font-size:12.0px'>An mpi_int is potentially asking for trouble. If this allows for 32 different parameters,<BR>
this is far too limited. If it allows for 2^32 values, I hope we can’t come up with<BR>
that many restrictions.<BR>
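<BR>
To make the arithmetic concrete, here is a minimal Python sketch, not MPI code. The flag names are hypothetical, loosely borrowed from the assertions discussed in this thread, and the 32-bit width is an assumption. Used as a bit mask, the int carries at most 32 independent yes/no assertions; used as an enumeration, it can name 2^32 values, but only one at a time.<BR>
<PRE>
```python
import enum


class Assertion(enum.IntFlag):
    """Hypothetical assertion flags, one bit each (not from any MPI standard)."""
    NONE = 0
    NO_ANY_SOURCE = 1
    BASIC_TYPES_ONLY = 2
    NO_EAGER_THROTTLE = 4


INT_BITS = 32  # assumed width of the int passed at init time


def max_independent_flags(bits=INT_BITS):
    # One bit per assertion when the int is used as a mask.
    return bits


def max_enumerated_values(bits=INT_BITS):
    # Number of distinct values when the int is used as an enumeration.
    return 2 ** bits


# Combining two assertions in one int, as a mask-style argument would.
requested = Assertion.NO_ANY_SOURCE | Assertion.BASIC_TYPES_ONLY
```
</PRE>
With the mask reading, a 33rd assertion no longer fits in the argument, which is the limit I am worried about.<BR>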
<BR>
Back to the assertion issue. In the context of an ABI (if this becomes a reality), it does<BR>
not make much sense to standardize on things that are not standard, but stick to items<BR>
that are defined within the standard, such as “I will not use wild card receives” (I am not<BR>
advocating that one at all, just using it as an example), “I will only use basic MPI types”, ...<BR>
<BR>
For things that are not standard, such as “no-eager-throttle”, it would make sense to<BR>
me to have a standard way for implementations to expose which ones they support,<BR>
where their meaning is defined, and how a user would invoke these – at mpi_init, via<BR>
mpi_int_?, or some other way.<BR>
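<BR>
One possible shape for such a mechanism, as a rough Python sketch rather than a proposed MPI interface: the class, its method names, and the key descriptions are all hypothetical. Each implementation publishes the non-standard keys it understands along with their meaning, and a request for an unknown key is simply ignored.<BR>
<PRE>
```python
class HintRegistry:
    """Hypothetical per-implementation table of non-standard keys."""

    def __init__(self, supported):
        # supported maps a key name to a short description of its meaning
        self._supported = dict(supported)
        self._active = set()

    def query(self):
        # Let the user discover which keys exist and what they mean.
        return dict(self._supported)

    def subscribe(self, keys):
        # Accept known keys; unknown keys are ignored, i.e. treated as no-ops.
        accepted = [k for k in keys if k in self._supported]
        self._active.update(accepted)
        return accepted

    def is_active(self, key):
        return key in self._active


# Implementation A understands the key; implementation B does not.
impl_a = HintRegistry({"no-eager-throttle": "meaning defined by implementation A"})
impl_b = HintRegistry({})
```
</PRE>
The same user code can run against both implementations: the request takes effect on A and is a no-op on B.<BR>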
<BR>
Rich<BR>
<BR>
<BR>
On 6/20/08 1:14 PM, "Supalov, Alexander" &lt;alexander.supalov@intel.com&gt; wrote:<BR>
<BR>
</SPAN></FONT><BLOCKQUOTE><FONT FACE="Verdana, Helvetica, Arial"><SPAN STYLE='font-size:12.0px'>Why make it difficult when an int in the mpi_init call seems to be<BR>
sufficient?<BR>
<BR>
-----Original Message-----<BR>
From: mpi3-subsetting-bounces@lists.mpi-forum.org<BR>
[<a href="mailto:mpi3-subsetting-bounces@lists.mpi-forum.org">mailto:mpi3-subsetting-bounces@lists.mpi-forum.org</a>] On Behalf Of Bronis<BR>
R. de Supinski<BR>
Sent: Friday, June 20, 2008 7:05 PM<BR>
To: MPI 3.0 Sub-setting working group<BR>
Subject: Re: [Mpi3-subsetting] MPI subsetting: charting the way forward<BR>
at a telecon next week?<BR>
<BR>
<BR>
Yes, but the best approach would be a query/subscribe<BR>
interface, possibly with some set of standard keywords<BR>
that provide portability.<BR>
<BR>
On Fri, 20 Jun 2008, Supalov, Alexander wrote:<BR>
<BR>
> Hi,<BR>
><BR>
> Ignoring an assertion should be perfectly legal.<BR>
><BR>
> Best regards.<BR>
><BR>
> Alexander<BR>
><BR>
> ________________________________<BR>
><BR>
> From: mpi3-subsetting-bounces@lists.mpi-forum.org<BR>
> [<a href="mailto:mpi3-subsetting-bounces@lists.mpi-forum.org">mailto:mpi3-subsetting-bounces@lists.mpi-forum.org</a>] On Behalf Of<BR>
> Richard Graham<BR>
> Sent: Friday, June 20, 2008 6:53 PM<BR>
> To: MPI 3.0 Sub-setting working group<BR>
> Subject: Re: [Mpi3-subsetting] MPI subsetting: charting the way<BR>
> forward at a telecon next week?<BR>
><BR>
><BR>
> I think we need to be careful here when it comes to assertions, and<BR>
> think hard about how you want to handle these in a standard. In some<BR>
> of the implementations I am familiar with, a no-eager-throttle keyword<BR>
> would be useless - it is very implementation specific. I suppose this<BR>
> is a big problem with trying to add implementation-specific keywords<BR>
> to a standard. It is a given that this will also cause trouble when<BR>
> trying to come up with an ABI, unless one has a large set of defined<BR>
> constants and is willing to have these be no-ops in certain<BR>
> implementations.<BR>
><BR>
> Rich<BR>
><BR>
><BR>
> On 6/20/08 9:56 AM, "Richard Treumann" &lt;treumann@us.ibm.com&gt; wrote:<BR>
><BR>
><BR>
><BR>
> Hi Alexander<BR>
><BR>
> Comments embedded below.<BR>
><BR>
> I have no objections to someone providing a rationale for<BR>
> assertions related to MPI-IO and MPI_1sided. If the rationale is sound<BR>
> I have no objection to putting them in the proposal.<BR>
><BR>
> I feel the proposal should be evaluated by the following<BR>
> algorithm.<BR>
><BR>
> if (this concept is one that seems plausible) {<BR>
>   for each proposed assertion {<BR>
>     if (rationale not solid)<BR>
>       discard<BR>
>     if (deal breaker downside)<BR>
>       discard<BR>
>   }<BR>
>   if ((concept makes sense) && (set of worthwhile assertions is not empty))<BR>
>     make this part of MPI 2.2<BR>
> }<BR>
><BR>
> I do not see much reason to get every assertion that eventually<BR>
> gains traction into MPI 2.2. MPI 3.0 is soon enough for any that do not<BR>
> make the MPI 2.2 cut. I do not want to see the concept fall because some<BR>
> particular assertion is controversial.<BR>
><BR>
> I consider MPI_NO_EAGER_THROTTLE to be the single most valuable<BR>
> assertion for MPI 2.2 because it is needed to allow MPI to scale to the<BR>
> levels we are already seeing.<BR>
><BR>
><BR>
> Dick Treumann - MPI Team/TCEM<BR>
> IBM Systems & Technology Group<BR>
> Dept 0lva / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601<BR>
> Tele (845) 433-7846 Fax (845) 433-8363<BR>
><BR>
><BR>
> mpi3-subsetting-bounces@lists.mpi-forum.org wrote on 06/20/2008<BR>
> 02:58:41 AM:<BR>
><BR>
> > Dear Dick,<BR>
> ><BR>
> > A couple of suggestions re your proposal:<BR>
> ><BR>
> > - If ASSERTIONS is put at the end of the MPI_INIT_ASSERTED argument<BR>
> > list, in C++ one can declare the last argument as having a zero<BR>
> > default value, and skip it if necessary. This might help with<BR>
> > deprecation of the earlier MPI_INIT_* calls.<BR>
><BR>
> I have no objection. It seems reasonable to let C++ default the<BR>
> assertions parameter to "none"<BR>
><BR>
> > - In non-Cray parts of the world, an MPI_INT followed by MPI_FLOAT<BR>
> > is likely to be a 4-byte int followed by a 4-byte float. This<BR>
> > sometimes depends on the compiler settings in effect, too.<BR>
><BR>
> My rationale is not specific to any particular architecture.<BR>
> Some MPI datatypes are made entirely from the same base type. Some<BR>
> are mixtures of types. If libmpi knows at the moment a datatype is<BR>
> committed that the send side and receive side will always use the<BR>
> same internal representations then it does not need to keep track of<BR>
> the fact that one instance of {MPI_INT,MPI_FLOAT} has two distinct<BR>
> parts. The send side can gather and ship 8 bytes and the receive side<BR>
> can scatter the 8 bytes. If one side might use 4 byte integers while<BR>
> the other side uses 8 byte integers then at least one side will need<BR>
> to know there is a conversion to be done for the MPI_INT part. If an<BR>
> MPI job does a spawn or join that links to a different architecture<BR>
> after the datatype has been committed, and the MPI_Type_commit has<BR>
> discarded the details, it is too late to get them back. On the other<BR>
> hand, if it is known there will never be a different architecture<BR>
> added to the job, the extra information can be safely discarded.<BR>
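<BR>
The difference between the two cases can be illustrated with a small Python sketch. It is not MPI code, and both layouts are assumptions chosen for illustration: a sender with a 4-byte int and a 4-byte float, and a receiver on another architecture with an 8-byte int. In the homogeneous case the 8 bytes are moved as an opaque block; in the heterogeneous case the field layout is still needed to convert the integer part.<BR>
<PRE>
```python
import struct

SENDER_LAYOUT = "=if"    # assumed sender: 4-byte int, 4-byte float
RECEIVER_LAYOUT = "=qf"  # assumed receiver on another architecture: 8-byte int


def ship_opaque(buf):
    # Homogeneous job: the bytes can be moved without knowing the fields.
    return bytes(buf)


def ship_converted(buf):
    # Heterogeneous job: the layout must be known to convert the int part.
    ivalue, fvalue = struct.unpack(SENDER_LAYOUT, buf)
    return struct.pack(RECEIVER_LAYOUT, ivalue, fvalue)


# One instance of an {int, float} pair as the sender would hold it.
element = struct.pack(SENDER_LAYOUT, 7, 1.5)
```
</PRE>
If the layout has been thrown away at commit time, only ship_opaque remains possible, which is why the assertion has to hold for the lifetime of the job.<BR>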
><BR>
> > - I don't think MPI_NO_THREAD_CONTENTION is really necessary. The<BR>
> > original thread level settings, in particular, the use of anything<BR>
> > but MPI_THREAD_MULTIPLE, seem to capture the semantics that you<BR>
> > proposed.<BR>
><BR>
> This one is kind of tricky and I also am not sure what it would<BR>
> mean. If we find a clear value we can keep it and if not we can<BR>
> remove it.<BR>
><BR>
> > - I can't fully follow the motivation for MPI_NO_ANY_SOURCE<BR>
> > deprioritization. AFAIK, a rendezvous exchange usually starts with a<BR>
> > ready-to-send packet that contains the size of the message. In this<BR>
> > case the receiving side will normally reply with a ready-to-receive<BR>
> > regardless of the buffer space available, and flag MPI_ERR_TRUNCATE<BR>
> > on message arrival if necessary. In this case, neither<BR>
> > MPI_ANY_SOURCE nor MPI_NO_ANY_SOURCE seems to get in the way.<BR>
><BR>
> My point is that MPI_NO_ANY_SOURCE might allow this round trip<BR>
> protocol to be replaced by a 1/2 rendezvous protocol. If it is known<BR>
> that MPI_ANY_SOURCE will not be used then the receive side can send<BR>
> an "envelope and ready for data" packet to the send side. As long as<BR>
> the send side knows it will receive the "envelope and ready for data"<BR>
> packet when the receive is posted, it does not need to do the first<BR>
> 1/2 of the rendezvous. The message matching can be done at the send<BR>
> side.<BR>
><BR>
> A send for which the receive was preposted has a good chance of<BR>
> finding the "envelope and ready for data" sitting in an early queue<BR>
> and the large send can avoid any rendezvous delay. Data begins to<BR>
> flow immediately vs waiting for a round trip of a full rendezvous. In<BR>
> many cases we cut the delay in half and best case we eliminate<BR>
> rendezvous delay completely. If the receive side is late in posting<BR>
> the receive we still save a packet traversal but do not save any<BR>
> time.<BR>
><BR>
> If there may be an MPI_ANY_SOURCE then this does not work because<BR>
> the receive side that has an MPI_ANY_SOURCE cannot guess which sender<BR>
> to notify so the sender cannot count on getting a 1/2 rendezvous<BR>
> notification for a message that should match the MPI_ANY_SOURCE<BR>
> receive.<BR>
><BR>
> The problem that made me lower the priority is that many MPIs use<BR>
> an eager protocol for small messages and a rendezvous protocol for<BR>
> large messages. If the send side and receive side have the same size<BR>
> buffer then both sides can reach the same conclusion: eager vs 1/2<BR>
> rendezvous. If both decide on eager, the receive side will not send<BR>
> an "envelope and ready for data" packet and the send side will not<BR>
> look for one. If both sides decide on 1/2 rendezvous then the receive<BR>
> side will send an "envelope and ready for data" packet and the send<BR>
> side will look for and consume the notice. If the send is for an<BR>
> 8 byte message and the receive uses a "big enough" receive buffer of<BR>
> 64KB then the two sides will probably not be able to reach the same<BR>
> conclusion about the protocol. The receive side will ship off an<BR>
> "envelope and ready for data" packet that the send side will not<BR>
> know what to do with.<BR>
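<BR>
The mismatch described above can be sketched in a few lines of Python. This is an illustration only: the 4096-byte eager limit and the function names are hypothetical, and real implementations choose their thresholds differently. Each side picks a protocol from the only size it knows locally, so an 8 byte send and a 64KB receive buffer lead to different choices.<BR>
<PRE>
```python
EAGER_LIMIT = 4096  # hypothetical eager/rendezvous threshold in bytes


def choose_protocol(known_size, eager_limit=EAGER_LIMIT):
    # Each side decides from the only size it knows locally.
    if known_size > eager_limit:
        return "half-rendezvous"
    return "eager"


def sides_agree(send_size, recv_buffer_size):
    # The sender knows the message size; the receiver knows only its buffer size.
    return choose_protocol(send_size) == choose_protocol(recv_buffer_size)
```
</PRE>
With these assumed values, sides_agree(8, 64 * 1024) is false: the receiver would send a notification that the eager sender never looks for.<BR>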
><BR>
><BR>
> ><BR>
> > Best regards.<BR>
> ><BR>
> > Alexander<BR>
> ><BR>
> > From: Supalov, Alexander<BR>
> > Sent: Friday, June 20, 2008 8:29 AM<BR>
> > To: 'MPI 3.0 Sub-setting working group'<BR>
> > Subject: RE: [Mpi3-subsetting] MPI subsetting: charting the way<BR>
> > forward at a telecon next week?<BR>
><BR>
> > Dear Dick,<BR>
> ><BR>
> > Thank you. I remember we exchanged a couple of emails about the<BR>
> > possible extensions to the set of assertions, like one-sided and<BR>
> > I/O, and in my recollection, almost reached an agreement that this<BR>
> > can improve performance and possibly memory footprint, as well as be<BR>
> > expressed thru assertions. Do you still feel favorable about this?<BR>
> ><BR>
> > Best regards.<BR>
> ><BR>
> > Alexander<BR>
> ><BR>
><BR>
><BR>
><BR>
> ________________________________<BR>
><BR>
> _______________________________________________<BR>
> mpi3-subsetting mailing list<BR>
> mpi3-subsetting@lists.mpi-forum.org<BR>
> <a href="http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-subsetting">http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-subsetting</a><BR>
><BR>
><BR>
><BR>
><BR>
> ---------------------------------------------------------------------<BR>
> Intel GmbH<BR>
> Dornacher Strasse 1<BR>
> 85622 Feldkirchen/Muenchen Germany<BR>
> Sitz der Gesellschaft: Feldkirchen bei Muenchen<BR>
> Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer<BR>
> Registergericht: Muenchen HRB 47456 Ust.-IdNr.<BR>
> VAT Registration No.: DE129385895<BR>
> Citibank Frankfurt (BLZ 502 109 00) 600119052<BR>
><BR>
> This e-mail and any attachments may contain confidential material for<BR>
> the sole use of the intended recipient(s). Any review or distribution<BR>
> by others is strictly prohibited. If you are not the intended<BR>
> recipient, please contact the sender and delete all copies.<BR>
><BR>
<BR>
</SPAN></FONT></BLOCKQUOTE><FONT FACE="Verdana, Helvetica, Arial"><SPAN STYLE='font-size:12.0px'><BR>
</SPAN></FONT>
</BODY>
</HTML>