[Mpi3-subsetting] MPI subsetting: charting the way forward at a telecon next week?

Richard Graham rlgraham at [hidden]
Fri Jun 20 12:26:13 CDT 2008


An mpi_int is potentially asking for trouble.  If this allows for only 32
different parameters, that is far too limited.  If it allows for 2^32 values,
I hope we can't come up with that many restrictions.
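
To make the two readings concrete, here is a rough C++ sketch of the bit-mask
interpretation, where each assertion takes one bit of a 32-bit int. The flag
names and helper functions are made up for illustration and are not part of
any MPI standard or proposal text.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical assertion flags, one bit each (illustrative names only).
enum : std::uint32_t {
    ASSERT_NO_ANY_SOURCE     = 1u << 0,
    ASSERT_BASIC_TYPES_ONLY  = 1u << 1,
    ASSERT_NO_EAGER_THROTTLE = 1u << 2,
};

// A 32-bit mask can carry at most this many independent flags.
constexpr int kMaxFlags = 32;

// Build the mask for the flag stored at a given bit position.
inline std::uint32_t flag_at(int bit) {
    assert(bit >= 0 && bit < kMaxFlags);  // there is no 33rd flag
    return std::uint32_t{1} << bit;
}

// True if every flag in `wanted` is set in `assertions`.
inline bool has_all(std::uint32_t assertions, std::uint32_t wanted) {
    return (assertions & wanted) == wanted;
}
```

Read as a bit mask, the int tops out at 32 combinable assertions. Read as a
single enumerated code, it has 2^32 values but can name only one at a time.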

Back to the assertion issue.  In the context of an ABI (if this becomes a
reality), it does not make much sense to standardize on things that are not
standard.  We should stick to items that are defined within the standard, such
as "I will not use wild card receives" (I am not advocating that one at all,
just using it as an example), "I will only use basic MPI types", ...

For things that are not standard, such as "no-eager-throttle", it would make
sense to me to have a standard way for implementations to expose which ones
they support, where their meaning is defined, and how a user would invoke
these: at mpi_init, via mpi_int_?, or some other way.
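
One possible shape for such an interface, sketched in C++ under my own
assumptions: the class name, method names, and the idea of pairing each
keyword with a pointer to its documentation are all hypothetical and do not
come from any existing MPI implementation.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-implementation table of non-standard assertion keywords.
class AssertionRegistry {
public:
    // The implementation exposes a keyword and says where its meaning is defined.
    void expose(const std::string& keyword, const std::string& defined_in) {
        assert(!keyword.empty());
        supported_[keyword] = defined_in;
    }

    // Query: does this implementation know the keyword?
    bool is_supported(const std::string& keyword) const {
        return supported_.count(keyword) != 0;
    }

    // Request a set of keywords; unknown ones are dropped, i.e. treated as no-ops.
    std::vector<std::string> request(const std::vector<std::string>& keywords) const {
        std::vector<std::string> accepted;
        for (const std::string& k : keywords) {
            if (is_supported(k)) accepted.push_back(k);
        }
        return accepted;
    }

private:
    std::map<std::string, std::string> supported_;
};
```

A user would query first, then request only what is useful; an implementation
that does not know "no-eager-throttle" simply does not return it.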

Rich

On 6/20/08 1:14 PM, "Supalov, Alexander" <alexander.supalov_at_[hidden]>
wrote:

> Why make it difficult when an int in the mpi_init call seems to be
> sufficient?
> 
> -----Original Message-----
> From: mpi3-subsetting-bounces_at_[hidden]
> [mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of Bronis
> R. de Supinski
> Sent: Friday, June 20, 2008 7:05 PM
> To: MPI 3.0 Sub-setting working group
> Subject: Re: [Mpi3-subsetting] MPI subsetting: charting the way forward
> at a telecon next week?
> 
> 
> Yes, but the best approach would be a query/subscribe
> interface, possibly with some set of standard keywords
> that provide portability.
> 
> On Fri, 20 Jun 2008, Supalov, Alexander wrote:
> 
>> > Hi,
>> >
>> > Ignoring an assertion should be perfectly legal.
>> >
>> > Best regards.
>> >
>> > Alexander
>> >
>> > ________________________________
>> >
>> > From: mpi3-subsetting-bounces_at_[hidden]
>> > [mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of
>> > Richard Graham
>> > Sent: Friday, June 20, 2008 6:53 PM
>> > To: MPI 3.0 Sub-setting working group
>> > Subject: Re: [Mpi3-subsetting] MPI subsetting: charting the way forward
>> > at a telecon next week?
>> >
>> >
>> > I think we need to be careful here when it comes to assertions, and
>> > think hard about how you want to handle these in a standard.  In some
>> > of the implementations I am familiar with, a no-eager-throttle keyword
>> > would be useless - it is very implementation specific.  I suppose this
>> > is a big problem with trying to add implementation-specific keywords to
>> > a standard.  It is a given that this will also cause trouble when trying
>> > to come up with an ABI, unless one has a large set of defined constants
>> > and is willing to have these be no-ops in certain implementations.
>> >
>> > Rich
>> >
>> >
>> > On 6/20/08 9:56 AM, "Richard Treumann" <treumann_at_[hidden]> wrote:
>> >
>> >
>> >
>> >       Hi Alexander
>> >
>> >       Comments embedded below.
>> >
>> >       I have no objections to someone providing a rationale for
>> >       assertions related to MPI-IO and MPI_1sided.  If the rationale is
>> >       sound I have no objection to putting them in the proposal.
>> >
>> >       I feel the proposal should be evaluated by the following
>> >       algorithm.
>> >
>> >       if (this concept seems plausible) {
>> >           for each proposed assertion {
>> >               if (rationale not solid)
>> >                   discard
>> >               if (deal-breaker downside)
>> >                   discard
>> >           }
>> >       }
>> >       if ((concept makes sense) && (set of worthwhile assertions is not empty))
>> >           make this part of MPI 2.2
>> >
>> >       I do not see much reason to get every assertion that eventually
>> >       gains traction into MPI 2.2.  MPI 3.0 is soon enough for any that
>> >       do not make the MPI 2.2 cut. I do not want to see the concept fall
>> >       because some particular assertion is controversial.
>> >
>> >       I consider MPI_NO_EAGER_THROTTLE to be the single most valuable
>> >       assertion for MPI 2.2 because it is needed to allow MPI to scale
>> >       to the levels we are already seeing.
>> >
>> >
>> >       Dick Treumann  -  MPI Team/TCEM
>> >       IBM Systems & Technology Group
>> >       Dept 0lva / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
>> >       Tele (845) 433-7846         Fax (845) 433-8363
>> >
>> >
>> >       mpi3-subsetting-bounces_at_[hidden] wrote on 06/20/2008 02:58:41 AM:
>> >
>>> >       > Dear Dick,
>>> >       >
>>> >       > A couple of suggestions re your proposal:
>>> >       >
>>> >       > - If ASSERTIONS is put at the end of the MPI_INIT_ASSERTED argument
>>> >       > list, in C++ one can declare the last argument as having a zero
>>> >       > default value, and skip it if necessary. This might help with
>>> >       > deprecation of the earlier MPI_INIT_* calls.
>> >
>> >       I have no objection. It seems reasonable to let C++ default the
>> >       assertions parameter to "none"
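
A minimal C++ sketch of the defaulted trailing argument. The function below is
only a stand-in: the thread does not give the full MPI_INIT_ASSERTED argument
list, so the two-parameter signature and the names are assumptions for
illustration.

```cpp
#include <cassert>

// Hypothetical "no assertions" value.
constexpr int kAssertNone = 0;

// Hypothetical stand-in for an MPI_INIT_ASSERTED-style call. The assertions
// argument comes last and defaults to "none", so existing callers can omit it.
// Returns the assertions that are in effect.
inline int init_asserted(int required_thread_level, int assertions = kAssertNone) {
    assert(required_thread_level >= 0);
    return assertions;
}
```

Calling `init_asserted(level)` behaves like passing "none", while
`init_asserted(level, flags)` opts in explicitly.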
>> >
>>> >       > - In non-Cray parts of the world, an MPI_INT followed by MPI_FLOAT
>>> >       > is likely to be a 4-byte int followed by a 4-byte float. This
>>> >       > sometimes depends on the compiler settings in effect, too.
>> >
>> >       My rationale is not specific to any particular architecture.
>> >       Some MPI datatypes are made entirely from the same base type. Some
>> >       are mixtures of types. If libmpi knows at the moment a datatype is
>> >       committed that the send side and receive side will always use the
>> >       same internal representations, then it does not need to keep track
>> >       of the fact that one instance of {MPI_INT,MPI_FLOAT} has two
>> >       distinct parts. The send side can gather and ship 8 bytes and the
>> >       receive side can scatter the 8 bytes. If one side might use 4-byte
>> >       integers while the other side uses 8-byte integers, then at least
>> >       one side will need to know there is a conversion to be done for
>> >       the MPI_INT part. If an MPI job does a spawn or join that links to
>> >       a different architecture after the datatype has been committed,
>> >       and the MPI_Type_commit has discarded the details, it is too late
>> >       to get them back.  On the other hand, if it is known there will
>> >       never be a different architecture added to the job, the extra
>> >       information can be safely discarded.
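
The trade-off described above can be sketched in C++ roughly as follows. This
is a toy model, not how any real libmpi stores committed datatypes; the type
names and the `homogeneous_forever` flag are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Base { Int, Float };

// One component of a derived datatype, e.g. the MPI_INT part of {MPI_INT,MPI_FLOAT}.
struct Field {
    Base base;
    std::size_t size;  // size in bytes on this architecture
};

// What the library keeps after commit.
struct Committed {
    std::size_t total_bytes;    // always kept: enough to gather and scatter raw bytes
    std::vector<Field> fields;  // kept only if a conversion might ever be needed
};

// If the job asserts it will never link to a different architecture,
// the per-field details can be dropped at commit time.
inline Committed commit(const std::vector<Field>& fields, bool homogeneous_forever) {
    assert(!fields.empty());
    Committed c{0, {}};
    for (const Field& f : fields) c.total_bytes += f.size;
    if (!homogeneous_forever) c.fields = fields;
    return c;
}

// Conversion is possible only if the field details survived the commit.
inline bool can_convert(const Committed& c) {
    return !c.fields.empty();
}
```

Once the details are discarded, a later spawn or join to a different
architecture has nothing to convert from, which is why the assertion has to be
known at commit time.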
>> >
>>> >       > - I don't think MPI_NO_THREAD_CONTENTION is really necessary. The
>>> >       > original thread level settings, in particular, the use of anything
>>> >       > but MPI_THREAD_MULTIPLE, seem to capture the semantics that you
>>> >       > proposed.
>> >
>> >       This one is kind of tricky and I also am not sure what it would
>> >       mean. If we find a clear value we can keep it and if not we can
>> >       remove it.
>> >
>>> >       > - I can't fully follow the motivation for MPI_NO_ANY_SOURCE
>>> >       > deprioritization. AFAIK, a rendezvous exchange usually starts with a
>>> >       > ready-to-send packet that contains the size of the message. In this
>>> >       > case the receiving side will normally reply with a ready-to-receive
>>> >       > regardless of the buffer space available, and flag MPI_ERR_TRUNCATE
>>> >       > on message arrival if necessary. In this case, neither
>>> >       > MPI_ANY_SOURCE nor MPI_NO_ANY_SOURCE seems to get in the way.
>> >
>> >       My point is that MPI_NO_ANY_SOURCE might allow this round trip
>> >       protocol to be replaced by a 1/2 rendezvous protocol. If it is
>> >       known that MPI_ANY_SOURCE will not be used then the receive side
>> >       can send an "envelope and ready for data" packet to the send side.
>> >       As long as the send side knows it will receive the "envelope and
>> >       ready for data" packet when the receive is posted, it does not
>> >       need to do the first 1/2 of the rendezvous. The message matching
>> >       can be done at the send side.
>> >
>> >       A send for which the receive was preposted has a good chance of
>> >       finding the "envelope and ready for data" packet sitting in an
>> >       early queue, and the large send can avoid any rendezvous delay.
>> >       Data begins to flow immediately instead of waiting for the round
>> >       trip of a full rendezvous. In many cases we cut the delay in half,
>> >       and in the best case we eliminate rendezvous delay completely. If
>> >       the receive side is late in posting the receive we still save a
>> >       packet traversal but do not save any time.
>> >
>> >       If there may be an MPI_ANY_SOURCE then this does not work because
>> >       the receive side that has an MPI_ANY_SOURCE cannot guess which
>> >       sender to notify, so the sender cannot count on getting a 1/2
>> >       rendezvous notification for a message that should match the
>> >       MPI_ANY_SOURCE receive.
>> >
>> >       The problem that made me lower the priority is that many MPIs use
>> >       an eager protocol for small messages and a rendezvous protocol for
>> >       large messages.  If the send side and receive side have the same
>> >       size buffer then both sides can reach the same conclusion: eager
>> >       vs 1/2 rendezvous. If both decide on eager, the receive side will
>> >       not send an "envelope and ready for data" packet and the send side
>> >       will not look for one. If both sides decide on 1/2 rendezvous then
>> >       the receive side will send an "envelope and ready for data" packet
>> >       and the send side will look for and consume the notice.  If the
>> >       send is for an 8-byte message and the receive uses a "big enough"
>> >       receive buffer of 64KB then the two sides will probably not be
>> >       able to reach the same conclusion about the protocol. The receive
>> >       side will ship off an "envelope and ready for data" packet that
>> >       the send side will not know what to do with.
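
The mismatch can be illustrated with a small C++ sketch in which each side
picks a protocol from the only size it knows locally. The 4096-byte eager
limit and all names are assumptions for illustration; real limits are
implementation specific.

```cpp
#include <cassert>
#include <cstddef>

enum class Protocol { Eager, HalfRendezvous };

// Hypothetical eager limit in bytes.
constexpr std::size_t kEagerLimit = 4096;

// Each side can only decide from the size it knows locally.
inline Protocol choose(std::size_t locally_known_bytes) {
    return locally_known_bytes <= kEagerLimit ? Protocol::Eager
                                              : Protocol::HalfRendezvous;
}

// The sender knows the message size; the receiver only knows its buffer size.
inline bool sides_agree(std::size_t message_bytes, std::size_t recv_buffer_bytes) {
    assert(message_bytes <= recv_buffer_bytes);  // otherwise the receive would truncate
    return choose(message_bytes) == choose(recv_buffer_bytes);
}
```

With matching sizes both sides agree. With an 8-byte send into a 64KB receive
buffer, the sender picks eager while the receiver picks 1/2 rendezvous and
sends a notice nobody is waiting for.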
>> >
>> >
>>> >       >
>>> >       > Best regards.
>>> >       >
>>> >       > Alexander
>>> >       >
>>> >       > From: Supalov, Alexander
>>> >       > Sent: Friday, June 20, 2008 8:29 AM
>>> >       > To: 'MPI 3.0 Sub-setting working group'
>>> >       > Subject: RE: [Mpi3-subsetting] MPI subsetting: charting the way
>>> >       > forward at a telecon next week?
>> >
>>> >       > Dear Dick,
>>> >       >
>>> >       > Thank you. I remember we exchanged a couple of emails about the
>>> >       > possible extensions to the set of assertions, like one-sided and
>>> >       > I/O, and in my recollection, almost reached an agreement that this
>>> >       > can improve performance and possibly memory footprint, as well as be
>>> >       > expressed through assertions. Do you still feel favorable about this?
>>> >       >
>>> >       > Best regards.
>>> >       >
>>> >       > Alexander
>>> >       >
>> >
>> >
>> >
>> > ________________________________
>> >
>> >       _______________________________________________
>> >       mpi3-subsetting mailing list
>> >       mpi3-subsetting_at_[hidden]
>> >       http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-subsetting
>> >
>> >
>> >
>> >
>> > ---------------------------------------------------------------------
>> > Intel GmbH
>> > Dornacher Strasse 1
>> > 85622 Feldkirchen/Muenchen Germany
>> > Sitz der Gesellschaft: Feldkirchen bei Muenchen
>> > Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
>> > Registergericht: Muenchen HRB 47456 Ust.-IdNr.
>> > VAT Registration No.: DE129385895
>> > Citibank Frankfurt (BLZ 502 109 00) 600119052
>> >
>> > This e-mail and any attachments may contain confidential material for
>> > the sole use of the intended recipient(s). Any review or distribution
>> > by others is strictly prohibited. If you are not the intended
>> > recipient, please contact the sender and delete all copies.
>> >




