[Mpi3-subsetting] MPI subsetting: charting the way forward at a telecon next week?

Richard Graham rlgraham at [hidden]
Fri Jun 20 11:52:47 CDT 2008


I think we need to be careful here when it comes to assertions, and think
hard about how you want to handle these in a standard.  In some of the
implementations I am familiar with, a no-eager-throttle keyword would be
useless; it is very implementation specific.  I suppose this is a big
problem with trying to add implementation-specific keywords to a standard.
It is a given that this will also cause trouble when trying to come up with
an ABI, unless one has a large set of defined constants and is willing to
have these be no-ops in certain implementations.
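
The "defined constants that may be no-ops" idea can be pictured with a small Python sketch. The numeric flag values, the `IMPL_A_SUPPORTED` mask, and the `honored_assertions` helper are hypothetical names made up for illustration; only the assertion names themselves come from this thread.

```python
from enum import IntFlag


class Assertion(IntFlag):
    """Hypothetical ABI-level constants; the numeric values are illustrative only."""
    NONE = 0
    NO_EAGER_THROTTLE = 1
    NO_ANY_SOURCE = 2
    NO_THREAD_CONTENTION = 4


def honored_assertions(requested: Assertion, supported: Assertion) -> Assertion:
    """Return the subset of requested assertions this implementation acts on.

    Assertions outside `supported` are accepted but ignored (treated as no-ops).
    """
    return requested & supported


# A made-up implementation that has no eager throttling to switch off.
IMPL_A_SUPPORTED = Assertion.NO_ANY_SOURCE | Assertion.NO_THREAD_CONTENTION

requested = Assertion.NO_EAGER_THROTTLE | Assertion.NO_ANY_SOURCE
honored = honored_assertions(requested, IMPL_A_SUPPORTED)
print(honored == Assertion.NO_ANY_SOURCE)
```

In this model the application can pass the same constants to every implementation; each one simply masks out what it cannot use.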

Rich

On 6/20/08 9:56 AM, "Richard Treumann" <treumann_at_[hidden]> wrote:

> Hi Alexander
> 
> Comments embedded below.
> 
> I have no objections to someone providing a rationale for assertions related
> to MPI-IO and MPI_1sided.  If the rationale is sound I have no objection to
> putting them in the proposal.
> 
> I feel the proposal should be evaluated by the following algorithm.
> 
> if (this concept is one that seems plausible) {
>     for each proposed assertion {
>         if (rationale not solid)
>             discard
>         if (deal-breaker downside)
>             discard
>     }
> }
> if ((concept makes sense) && (set of worthwhile assertions is not empty))
>     make this part of MPI 2.2
> 
> I do not see much reason to get every assertion that eventually gains traction
> into MPI 2.2.  MPI 3.0 is soon enough for any that do not make the MPI 2.2
> cut. I do not want to see the concept fall because some particular assertion
> is controversial.
> 
> I consider MPI_NO_EAGER_THROTTLE to be the single most valuable assertion for
> MPI 2.2 because it is needed to allow MPI to scale to the levels we are
> already seeing.
>  
> 
> Dick Treumann  -  MPI Team/TCEM
> IBM Systems & Technology Group
> Dept 0lva / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
> Tele (845) 433-7846         Fax (845) 433-8363
> 
> 
> mpi3-subsetting-bounces_at_[hidden] wrote on 06/20/2008 02:58:41 AM:
> 
>> > Dear Dick,
>> >  
>> > A couple of suggestions re your proposal:
>> >  
>> > - If ASSERTIONS is put at the end of the MPI_INIT_ASSERTED argument
>> > list, in C++ one can declare the last argument as having a zero
>> > default value, and skip it if necessary. This might help with
>> > deprecation of the earlier MPI_INIT_* calls.
> 
> I have no objection. It seems reasonable to let C++ default the
> assertions parameter to "none".
> 
>> > - In non-Cray parts of the world, an MPI_INT followed by MPI_FLOAT
>> > is likely to be a 4-byte int followed by a 4-byte float. This
>> > sometimes depends on the compiler settings in effect, too.
> 
> My rationale is not specific to any particular architecture.
> Some MPI datatypes are made entirely
> from the same base type. Some are mixtures of types. If libmpi knows
> at the moment a datatype is committed that the send side and receive
> side will always use the same internal representations then it does not
> need to keep track of the fact that one instance of {MPI_INT,MPI_FLOAT}
> has two distinct parts. The send side can gather and ship 8 bytes
> and the receive side can scatter the 8 bytes. If one side might use 4
> byte integers while the other side uses 8 byte integers then at
> least one side will need to know there is a conversion to be done for
> the MPI_INT part. If an MPI job does a spawn or join that links to a
> different architecture after the datatype has been committed, and
> the MPI_Type_commit has discarded the details, it is too late to get
> them back.  On the other hand, if it is known there will never be a
> different architecture added to the job, the extra information can be
> safely discarded.
> 
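
The commit-time trade-off described above can be sketched in Python. Everything here (the `commit` helper, the `homogeneous_forever` flag, the 4-byte sizes) is a hypothetical model for illustration, not an MPI API: when the job is known to stay homogeneous, the committed type keeps only a byte count; otherwise it keeps the per-field layout that a conversion would need.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed local sizes in bytes; real sizes depend on platform and compiler.
LOCAL_SIZES = {"MPI_INT": 4, "MPI_FLOAT": 4}


@dataclass(frozen=True)
class CommittedType:
    total_bytes: int
    # Per-field layout, kept only if a conversion might ever be needed.
    fields: Optional[Tuple[str, ...]]


def commit(fields: Tuple[str, ...], homogeneous_forever: bool) -> CommittedType:
    """Model of a commit step that may discard field details."""
    total = sum(LOCAL_SIZES[name] for name in fields)
    if homogeneous_forever:
        # Both sides share one representation: ship raw bytes, drop the layout.
        return CommittedType(total, None)
    # A different architecture may join later: keep the layout.
    return CommittedType(total, fields)


pair = ("MPI_INT", "MPI_FLOAT")
print(commit(pair, homogeneous_forever=True))
print(commit(pair, homogeneous_forever=False))
```

Once the layout is dropped it cannot be rebuilt from the byte count alone, which is why the decision has to be safe at commit time.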
>> > - I don't think MPI_NO_THREAD_CONTENTION is really necessary. The
>> > original thread level settings, in particular, the use of anything
>> > but MPI_THREAD_MULTIPLE, seem to capture the semantics that you proposed.
> 
> This one is kind of tricky and I also am not sure what it would mean. If
> we find a clear value we can keep it and if not we can remove it.
> 
>> > - I can't fully follow the motivation for MPI_NO_ANY_SOURCE
>> > deprioritization. AFAIK, a rendezvous exchange usually starts with a
>> > ready-to-send packet that contains the size of the message. In this
>> > case the receiving side will normally reply with a ready-to-receive
>> > regardless of the buffer space available, and flag MPI_ERR_TRUNCATE
>> > on message arrival if necessary. In this case, neither
>> > MPI_ANY_SOURCE nor MPI_NO_ANY_SOURCE seems to get in the way.
> 
> My point is that MPI_NO_ANY_SOURCE might allow this round trip
> protocol to be replaced by a 1/2 rendezvous protocol. If it is known
> that MPI_ANY_SOURCE will not be used then the receive side can send
> an "envelope and ready for data" packet to the send side. As long as
> the send side knows it will receive the "envelope and ready for data"
> packet when the receive is posted, it does not need to do the first 1/2
> of the rendezvous. The message matching can be done at the send side.
> 
> A send for which the receive was preposted has a
> good chance of finding the "envelope and ready for data" sitting in
> an early queue and the large send can avoid any rendezvous delay.
> Data begins to flow immediately vs waiting for a round trip of a
> full rendezvous. In many cases we cut the delay in half and in the
> best case we eliminate rendezvous delay completely. If the receive
> side is late in posting the receive we still save a packet traversal
> but do not save any time.
> 
> If there may be an MPI_ANY_SOURCE then this does not work because the
> receive side that has an MPI_ANY_SOURCE cannot guess which sender to
> notify so the sender cannot count on getting a 1/2 rendezvous
> notification for a message that should match the MPI_ANY_SOURCE
> receive.
> 
> The problem that made me lower the priority is that many MPIs use an
> eager protocol for small messages and a rendezvous protocol for large
> messages.  If the send side and receive side have the same size buffer
> then both sides can reach the same conclusion: eager vs 1/2 rendezvous.
> If both decide on eager, the receive side will not send an
> "envelope and ready for data" packet and the send side will not look
> for one. If both sides decide on 1/2 rendezvous then the receive side
> will send an "envelope and ready for data" packet and the send side will
> look for and consume the notice.  If the send is for an 8-byte
> message and the receive uses a "big enough" receive buffer of 64KB
> then the two sides will probably not be able to reach the same
> conclusion about the protocol. The receive side will ship off an
> "envelope and ready for data" packet that the send side will not
> know what to do with.
>  
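
The protocol mismatch described above can be modeled in a few lines of Python. The `EAGER_LIMIT` cutoff and the function names are assumptions for illustration; the thread does not give an actual threshold. Each side picks a protocol from the only size it can see: the sender from the message size, the receiver from its posted buffer size.

```python
EAGER_LIMIT = 4096  # hypothetical eager/rendezvous cutoff in bytes


def choose_protocol(size_seen: int) -> str:
    """Pick a protocol from the size visible on one side."""
    return "eager" if size_seen <= EAGER_LIMIT else "half-rendezvous"


def sides_agree(message_bytes: int, recv_buffer_bytes: int) -> bool:
    """True if sender and receiver independently reach the same protocol."""
    sender_choice = choose_protocol(message_bytes)
    receiver_choice = choose_protocol(recv_buffer_bytes)
    return sender_choice == receiver_choice


# Same size on both sides: both conclude eager, or both half-rendezvous.
print(sides_agree(8, 8))
print(sides_agree(64 * 1024, 64 * 1024))
# 8-byte send into a 64 KB buffer: the receiver would send a notice
# that the eager sender never looks for.
print(sides_agree(8, 64 * 1024))
```

The third case is the one that lowers the priority of the assertion: without extra information, neither side can tell that the other chose differently.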
> 
>> >  
>> > Best regards.
>> >  
>> > Alexander
>> >  
>> > From: Supalov, Alexander
>> > Sent: Friday, June 20, 2008 8:29 AM
>> > To: 'MPI 3.0 Sub-setting working group'
>> > Subject: RE: [Mpi3-subsetting] MPI subsetting: charting the way
>> > forward at a telecon next week?
> 
>> > Dear Dick,
>> >  
>> > Thank you. I remember we exchanged a couple of emails about the
>> > possible extensions to the set of assertions, like one-sided and
>> > I/O, and in my recollection, almost reached an agreement that this
>> > can improve performance and possibly memory footprint, as well as be
>> > expressed thru assertions. Do you still feel favorable about this?
>> >  
>> > Best regards.
>> >  
>> > Alexander
>> > 


