[Mpi3-subsetting] MPI subsetting: charting the way forward at a telecon next week?
Richard Graham
rlgraham at [hidden]
Fri Jun 20 11:52:47 CDT 2008
I think we need to be careful here when it comes to assertions, and think
hard about how you want to handle these in a standard. In some of the
implementations I am familiar with, a no-eager-throttle keyword would be
useless; it is very implementation specific. I suppose this is a big
problem with trying to add implementation-specific keywords to a standard.
It is a given that this will also cause trouble when trying to come up with
an ABI, unless one has a large set of defined constants and is willing to
have these be no-ops in certain implementations.
Rich
On 6/20/08 9:56 AM, "Richard Treumann" <treumann_at_[hidden]> wrote:
> Hi Alexander
>
> Comments imbedded below.
>
> I have no objections to someone providing a rationale for assertions related
> to MPI-IO and MPI_1sided. If the rationale is sound I have no objection to
> putting them in the proposal.
>
> I feel the proposal should be evaluated by the following algorithm.
>
> if (this concept is one that seems plausible) {
>     for each proposed assertion {
>         if (rationale not solid)
>             discard
>         if (deal breaker downside)
>             discard
>     }
>     if ((concept makes sense) && (set of worthwhile assertions is not empty))
>         make this part of MPI 2.2
> }
>
> I do not see much reason to get every assertion that eventually gains traction
> into MPI 2.2. MPI 3.0 is soon enough for any that do not make the MPI 2.2
> cut. I do not want to see the concept fall because some particular assertion
> is controversial.
>
> I consider MPI_NO_EAGER_THROTTLE to be the single most valuable assertion for
> MPI 2.2 because it is needed to allow MPI to scale to the levels we are
> already seeing.
>
>
> Dick Treumann - MPI Team/TCEM
> IBM Systems & Technology Group
> Dept 0lva / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
> Tele (845) 433-7846 Fax (845) 433-8363
>
>
> mpi3-subsetting-bounces_at_[hidden] wrote on 06/20/2008 02:58:41 AM:
>
>> > Dear Dick,
>> >
>> > A couple of suggestions re your proposal:
>> >
>> > - If ASSERTIONS is put at the end of the MPI_INIT_ASSERTED argument
>> > list, in C++ one can declare the last argument as having a zero
>> > default value, and skip it if necessary. This might help with
>> > deprecation of the earlier MPI_INIT_* calls.
>
> I have no objection. It seems reasonable to let C++ default the
> assertions parameter to "none".
>
>> > - In non-Cray parts of the world, an MPI_INT followed by MPI_FLOAT
>> > is likely to be a 4-byte int followed by a 4-byte float. This
>> > sometimes depends on the compiler settings in effect, too.
>
> My rationale is not specific to any particular architecture.
> Some MPI datatypes are made entirely
> from the same base type. Some are mixtures of types. If libmpi knows
> at the moment a datatype is committed that the send side and receive
> side will always use the same internal representations then it does not
> need to keep track of the fact that one instance of {MPI_INT,MPI_FLOAT}
> has two distinct parts. The send side can gather and ship 8 bytes
> and the receive side can scatter the 8 bytes. If one side might use 4
> byte integers while the other side uses 8 byte integers then at
> least one side will need to know there is a conversion to be done for
> the MPI_INT part. If an MPI job does a spawn or join that links to a
> different architecture after the datatype has been committed, and
> the MPI_Type_commit has discarded the details, it is too late to get
> them back. On the other hand, if it is known there will never be a
> different architecture added to the job, the extra information can be
> safely discarded.
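
To make the {MPI_INT,MPI_FLOAT} argument above concrete, here is a minimal sketch in plain C. It does not use any MPI library; the names committed_type, commit_int_float, can_ship_as_bytes and pack_raw are hypothetical, and it assumes a 4-byte int and a 4-byte float. It only illustrates the idea that a commit step may keep or discard the per-field layout depending on whether the job is known to stay homogeneous.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical field kinds standing in for MPI_INT and MPI_FLOAT. */
enum field_kind { FIELD_INT32, FIELD_FLOAT32 };

/* What a library might remember about a committed {int, float} type. */
struct committed_type {
    size_t total_bytes;         /* always kept: bytes per instance */
    size_t nfields;             /* 0 once the field details are discarded */
    enum field_kind fields[2];  /* only meaningful when nfields > 0 */
};

/* Commit step: if the job is asserted to stay homogeneous, drop the
   per-field description; otherwise keep it for later conversion. */
struct committed_type commit_int_float(int homogeneous_forever)
{
    struct committed_type t = {0};
    t.total_bytes = sizeof(int32_t) + sizeof(float);
    if (!homogeneous_forever) {
        t.nfields = 2;
        t.fields[0] = FIELD_INT32;
        t.fields[1] = FIELD_FLOAT32;
    }
    return t;
}

/* Returns 1 when an instance can be shipped as an opaque run of bytes. */
int can_ship_as_bytes(const struct committed_type *t)
{
    return t->nfields == 0;
}

/* Send-side gather of one instance as raw bytes; returns bytes written. */
size_t pack_raw(const struct committed_type *t, const void *src,
                unsigned char *wire)
{
    memcpy(wire, src, t->total_bytes);
    return t->total_bytes;
}
```

Once nfields has been set to 0 there is nothing left to drive a conversion, which is why a later spawn or join to a different architecture would be too late in this model.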
>
>> > - I don't think MPI_NO_THREAD_CONTENTION is really necessary. The
>> > original thread level settings, in particular, the use of anything
>> > but MPI_THREAD_MULTIPLE, seem to capture the semantics that you proposed.
>
> This one is kind of tricky and I also am not sure what it would mean. If
> we find a clear value we can keep it and if not we can remove it.
>
>> > - I can't fully follow the motivation for MPI_NO_ANY_SOURCE
>> > deprioritization. AFAIK, a rendezvous exchange usually starts with a
>> > ready-to-send packet that contains the size of the message. In this
>> > case the receiving side will normally reply with a ready-to-receive
>> > regardless of the buffer space available, and flag MPI_ERR_TRUNCATE
>> > on message arrival if necessary. In this case, neither
>> > MPI_ANY_SOURCE nor MPI_NO_ANY_SOURCE seems to get in the way.
>
> My point is that MPI_NO_ANY_SOURCE might allow this round trip
> protocol to be replaced by a 1/2 rendezvous protocol. If it is known
> that MPI_ANY_SOURCE will not be used then the receive side can send
> an "envelope and ready for data" packet to the send side. As long as
> the send side knows it will receive the "envelope and ready for data"
> packet when the receive is posted, it does not need to do the first 1/2
> of the rendezvous. The message matching can be done at the send side.
>
> A send for which the receive was preposted has a
> good chance of finding the "envelope and ready for data" packet sitting
> in an early queue, and the large send can avoid any rendezvous delay.
> Data begins to flow immediately instead of waiting for the round trip
> of a full rendezvous. In many cases we cut the delay in half, and in the
> best case we eliminate rendezvous delay completely. If the receive side
> is late in posting the receive we still save a packet traversal but
> do not save any time.
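
The three timing cases described above (receive preposted, both sides posting together, receive posted late) can be checked with a small timing model. This is only a sketch under stated assumptions: every control packet takes the same one-way latency t, processing time is ignored, and the function names are hypothetical rather than taken from any MPI implementation.

```c
/* Hypothetical timing model.
   send_at : time the send is posted
   recv_at : time the matching receive is posted
   t       : one-way latency of a control packet
   Each function returns the time at which bulk data can start to flow. */

static double max2(double a, double b)
{
    return a > b ? a : b;
}

/* Full rendezvous: the sender ships ready-to-send, and the receiver
   answers with ready-to-receive once the receive has been posted. */
double full_rendezvous_start(double send_at, double recv_at, double t)
{
    double rts_arrives = send_at + t;
    double cts_sent = max2(rts_arrives, recv_at);
    return cts_sent + t;
}

/* 1/2 rendezvous: the receiver ships "envelope and ready for data" as
   soon as the receive is posted; the sender matches it locally. */
double half_rendezvous_start(double send_at, double recv_at, double t)
{
    double ready_arrives = recv_at + t;
    return max2(send_at, ready_arrives);
}
```

With t = 1, a receive posted at time 0 and a send posted at time 10 gives a start time of 12 for the full rendezvous and 10 for the 1/2 rendezvous (no delay at all). Posting both at time 0 gives 2 versus 1 (delay cut in half). A send at 0 with a receive at 10 gives 11 in both cases, so only the extra control packet is saved.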
>
> If there may be an MPI_ANY_SOURCE then this does not work because the
> receive side that has an MPI_ANY_SOURCE cannot guess which sender to
> notify so the sender cannot count on getting a 1/2 rendezvous
> notification for a message that should match the MPI_ANY_SOURCE
> receive.
>
> The problem that made me lower the priority is that many MPIs use an
> eager protocol for small messages and a rendezvous protocol for large
> messages. If the send side and receive side have the same size buffer
> then both sides can reach the same conclusion: eager vs 1/2 rendezvous.
> If both decide on eager, the receive side will not send an
> "envelope and ready for data" packet and the send side will not look
> for one. If both sides decide on 1/2 rendezvous then the receive side
> will send an "envelope and ready for data" packet and the send side will
> look for and consume the notice. If the send is for an 8 byte
> message and the receive uses a "big enough" receive buffer of 64KB
> then the two sides will probably not be able to reach the same
> conclusion about the protocol. The receive side will ship off an
> "envelope and ready for data" packet that the send side will not
> know what to do with.
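
The mismatch described above can be sketched in a few lines of C. The 4096-byte eager limit and the names choose_protocol and sides_agree are assumptions made for illustration only; real implementations pick their own thresholds and may use other criteria. The point is only that each side decides from the size it can see locally.

```c
#include <stddef.h>

enum protocol { PROTO_EAGER, PROTO_HALF_RENDEZVOUS };

/* Hypothetical eager limit; not taken from any particular MPI. */
#define EAGER_LIMIT_BYTES 4096

/* Each side decides from the only size it knows: the sender sees the
   message size, the receiver sees its posted buffer size. */
enum protocol choose_protocol(size_t bytes_seen_locally)
{
    return bytes_seen_locally <= EAGER_LIMIT_BYTES
               ? PROTO_EAGER
               : PROTO_HALF_RENDEZVOUS;
}

/* Returns 1 when both sides independently reach the same conclusion. */
int sides_agree(size_t send_bytes, size_t recv_buffer_bytes)
{
    return choose_protocol(send_bytes) == choose_protocol(recv_buffer_bytes);
}
```

Under these assumptions sides_agree(8, 8) and sides_agree(65536, 65536) both return 1, while sides_agree(8, 64 * 1024) returns 0: the sender picks eager and the receiver picks 1/2 rendezvous, which is the case where the receiver's notification packet has no consumer.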
>
>
>> >
>> > Best regards.
>> >
>> > Alexander
>> >
>> > From: Supalov, Alexander
>> > Sent: Friday, June 20, 2008 8:29 AM
>> > To: 'MPI 3.0 Sub-setting working group'
>> > Subject: RE: [Mpi3-subsetting] MPI subsetting: charting the way
>> > forward at a telecon next week?
>
>> > Dear Dick,
>> >
>> > Thank you. I remember we exchanged a couple of emails about the
>> > possible extensions to the set of assertions, like one-sided and
>> > I/O, and in my recollection, almost reached an agreement that this
>> > can improve performance and possibly memory footprint, as well as be
>> > expressed thru assertions. Do you still feel favorable about this?
>> >
>> > Best regards.
>> >
>> > Alexander
>> >
>
>
>
> _______________________________________________
> mpi3-subsetting mailing list
> mpi3-subsetting_at_[hidden]
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-subsetting