[Mpi3-subsetting] MPI subsetting: charting the way forward at a telecon next week?

Supalov, Alexander alexander.supalov at [hidden]
Fri Jun 20 12:12:49 CDT 2008


Thanks. If you look into Dick's proposal, you'll find just a handful of
assertions. A 32-bit int is not large enough for hundreds of assertions
anyway.
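
To make the size argument concrete: if each assertion occupies one bit of
a 32-bit integer, at most 32 independent assertions fit. The sketch below
assumes that encoding. The flag names come from this thread, but the bit
positions and helper functions are hypothetical and are not taken from
Dick's proposal.

```python
# Hypothetical bit assignments: names are from this thread, values are assumed.
MPI_NO_EAGER_THROTTLE = 1 << 0
MPI_NO_ANY_SOURCE = 1 << 1
MPI_NO_THREAD_CONTENTION = 1 << 2

INT_BITS = 32  # width of the assertions argument under discussion


def has_assertion(mask, flag):
    """True if the caller set `flag` in the combined assertions mask."""
    return (mask & flag) != 0


def fits_in_int(num_assertions, bits=INT_BITS):
    """With one bit per assertion, the integer width is a hard limit."""
    return num_assertions <= bits


# An application asserting two properties at init time.
assertions = MPI_NO_EAGER_THROTTLE | MPI_NO_ANY_SOURCE
```

A handful of flags fits comfortably; a few hundred would need a different
encoding, such as an array of integers or string keys.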

________________________________

From: mpi3-subsetting-bounces_at_[hidden]
[mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of Martin
Schulz
Sent: Friday, June 20, 2008 7:11 PM
To: MPI 3.0 Sub-setting working group
Subject: Re: [Mpi3-subsetting] MPI subsetting: charting the way forward
at a telecon next week?

At 09:58 AM 6/20/2008, Supalov, Alexander wrote:

        Hi,
         
        Ignoring an assertion should be perfectly legal.

I fully agree, ignoring should always be OK, which ensures
the portability of any application using assertions.

However, I do see Rich's point - how useful are assertions
if we have hundreds of them and each just works on a particular
MPI implementation (or even version)? Also, if these constants
are really implementation specific, does it make sense to have
them in the MPI standard? Each vendor will want their own set
(and rightfully so) and the burden is then on the programmer to
know all of the different options and understand the subtle
differences (and we have to document them all in the standard).

Perhaps we should just define broad groups of assertions and 
define those in the standard. The user can then query for all 
available assertions in that group for a particular implementation. 
This would have to be coupled with an ability to uniquely identify 
certain MPI implementations at runtime. Also, this does not
solve the problem for the end-user of how to select the correct
assertion.
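
One way to picture the "broad groups plus query" idea is sketched below.
Everything in it is hypothetical: the group names, the two imaginary
implementations, and the function names are invented for illustration and
are not a proposed MPI interface.

```python
# Hypothetical sketch: group names and per-implementation tables are invented.
STANDARD_GROUPS = ("point_to_point", "threading", "datatypes")

# What two imaginary implementations might expose in each standard group.
IMPLEMENTATIONS = {
    "impl_a": {
        "point_to_point": ["NO_EAGER_THROTTLE", "NO_ANY_SOURCE"],
        "threading": ["NO_THREAD_CONTENTION"],
    },
    "impl_b": {
        "point_to_point": ["NO_ANY_SOURCE"],
    },
}


def query_assertions(implementation, group):
    """List the assertions an implementation offers in a standard group."""
    if group not in STANDARD_GROUPS:
        raise ValueError("unknown assertion group: " + group)
    return list(IMPLEMENTATIONS.get(implementation, {}).get(group, []))


def usable_assertions(implementation, group, wanted):
    """Keep only the requested assertions this implementation recognizes.

    Unrecognized assertions are silently dropped, which mirrors the rule
    that ignoring an assertion is always legal.
    """
    available = set(query_assertions(implementation, group))
    return [name for name in wanted if name in available]
```

The query tells the user what exists, but not which assertion is the right
one to pick, which is the open problem noted above.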

Martin


        Best regards.
         
        Alexander
        
        
________________________________

        From: mpi3-subsetting-bounces_at_[hidden]
        [mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of
        Richard Graham
        Sent: Friday, June 20, 2008 6:53 PM
        To: MPI 3.0 Sub-setting working group
        Subject: Re: [Mpi3-subsetting] MPI subsetting: charting the way
        forward at a telecon next week?
        
        I think we need to be careful here when it comes to assertions,
        and think hard about how you want to handle these in a standard.
        In some of the implementations I am familiar with, a
        no-eager-throttle keyword would be useless - it is very
        implementation specific.  I suppose this is a big problem with
        trying to add implementation-specific keywords to a standard.
        It is a given that this will also cause trouble when trying to
        come up with an ABI, unless one has a large set of defined
        constants, and is willing to have these be no-ops in certain
        implementations.
        
        Rich
        
        
        On 6/20/08 9:56 AM, "Richard Treumann" <treumann_at_[hidden]>
wrote:
        
        

                Hi Alexander
                
                
                Comments embedded below.
                
                
                I have no objections to someone providing a rationale for
                assertions related to MPI-IO and MPI_1sided.  If the rationale
                is sound I have no objection to putting them in the proposal.
                
                
                I feel the proposal should be evaluated by the following
                algorithm.

                if (this concept is one that seems plausible) {
                    for each proposed assertion {
                        if (rationale not solid)
                            discard
                        if (deal breaker downside)
                            discard
                    }
                    if ((concept makes sense) && (set of worthwhile assertions is not empty))
                        make this part of MPI 2.2
                }
                
                
                I do not see much reason to get every assertion that
                eventually gains traction into MPI 2.2.  MPI 3.0 is soon
                enough for any that do not make the MPI 2.2 cut. I do not want
                to see the concept fall because some particular assertion is
                controversial.

                I consider MPI_NO_EAGER_THROTTLE to be the single most
                valuable assertion for MPI 2.2 because it is needed to allow
                MPI to scale to the levels we are already seeing.
                
                
                
                  
                Dick Treumann  -  MPI Team/TCEM            
                
                IBM Systems & Technology Group
                
                Dept 0lva / MS P963 -- 2455 South Road -- Poughkeepsie,
NY 12601
                
                Tele (845) 433-7846         Fax (845) 433-8363
                
                
                
                mpi3-subsetting-bounces_at_[hidden] wrote on
06/20/2008 02:58:41 AM:
                
                
		> Dear Dick,
                
		>  
                
		> A couple of suggestions re your proposal:
                
		>  
                
		> - If ASSERTIONS is put at the end of the
MPI_INIT_ASSERTED argument 
                
		> list, in C++ one can declare the last argument as
having a zero 
                
		> default value, and skip it if necessary. This might
help with 
                
		> deprecation of the earlier MPI_INIT_* calls.
                
                
                I have no objection. It seems reasonable to let C++ default
                the assertions parameter to "none".
                
                
		> - In non-Cray parts of the world, an MPI_INT followed
by MPI_FLOAT 
                
		> is likely to be a 4-byte int followed by a 4-byte
float. This 
                
		> sometimes depends on the compiler settings in effect,
too.
                
                
                My rationale is not specific to any particular architecture.
                Some MPI datatypes are made entirely from the same base type.
                Some are mixtures of types. If libmpi knows at the moment a
                datatype is committed that the send side and receive side will
                always use the same internal representations then it does not
                need to keep track of the fact that one instance of
                {MPI_INT,MPI_FLOAT} has two distinct parts. The send side can
                gather and ship 8 bytes and the receive side can scatter the
                8 bytes. If one side might use 4 byte integers while the other
                side uses 8 byte integers then at least one side will need to
                know there is a conversion to be done for the MPI_INT part. If
                an MPI job does a spawn or join that links to a different
                architecture after the datatype has been committed, and the
                MPI_Type_commit has discarded the details, it is too late to
                get them back.  On the other hand, if it is known there will
                never be a different architecture added to the job, the extra
                information can be safely discarded.
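
The trade-off described above can be sketched as follows. This is a toy
model, not how any MPI library implements MPI_Type_commit: the `commit`
and `can_convert` helpers and the dictionary layout are hypothetical, and
the 4-byte int and 4-byte float sizes are assumed.

```python
import struct

# A datatype described as a list of (kind, size_in_bytes) parts.
# Sizes are assumed: a 4-byte int followed by a 4-byte float.
INT_FLOAT = [("int", 4), ("float", 4)]


def commit(parts, homogeneous):
    """Return what a library might retain after committing a datatype.

    homogeneous=True models an assertion that no different architecture
    will ever join the job, so only the total byte count is kept.
    """
    total = sum(size for _, size in parts)
    if homogeneous:
        return {"nbytes": total}
    return {"nbytes": total, "parts": list(parts)}


def can_convert(committed):
    """Per-part conversion needs the layout; a flattened type has lost it."""
    return "parts" in committed


# Homogeneous case: gather and ship the raw 8 bytes, then scatter them.
payload = struct.pack("<if", 7, 1.5)
value_int, value_float = struct.unpack("<if", payload)
```

Once the per-part layout is dropped at commit time, a later spawn or join
to a different architecture cannot recover it, which is why the assertion
has to be made up front.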
                
                
		> - I don't think MPI_NO_THREAD_CONTENTION is really
necessary. The 
                
		> original thread level settings, in particular, the use
of anything 
                
		> but MPI_THREAD_MULTIPLE, seem to capture the semantics
that you proposed.
                
                
                This one is kind of tricky and I also am not sure what it
                would mean. If we find a clear value we can keep it and if not
                we can remove it.
                
                
		> - I can't fully follow the motivation for MPI_NO_ANY_SOURCE
		> deprioritization. AFAIK, a rendezvous exchange usually starts with a
		> ready-to-send packet that contains the size of the message. In this
		> case the receiving side will normally reply with a ready-to-receive
		> regardless of the buffer space available, and flag MPI_ERR_TRUNCATE
		> on message arrival if necessary. In this case, neither
		> MPI_ANY_SOURCE nor MPI_NO_ANY_SOURCE seems to get in the way.
                
                
                My point is that MPI_NO_ANY_SOURCE might allow this round trip
                protocol to be replaced by a 1/2 rendezvous protocol. If it is
                known that MPI_ANY_SOURCE will not be used then the receive
                side can send an "envelope and ready for data" packet to the
                send side. As long as the send side knows it will receive the
                "envelope and ready for data" packet when the receive is
                posted, it does not need to do the first 1/2 of the
                rendezvous. The message matching can be done at the send side.
                
                
                A send for which the receive was preposted has a good chance
                of finding the "envelope and ready for data" packet sitting in
                an early queue and the large send can avoid any rendezvous
                delay. Data begins to flow immediately vs waiting for a round
                trip of a full rendezvous. In many cases we cut the delay in
                half and in the best case we eliminate rendezvous delay
                completely. If the receive side is late in posting the receive
                we still save a packet traversal but do not save any time.
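
The three timing cases above can be checked with a small model. It is a
sketch under simplifying assumptions: a single fixed one-way latency, no
processing cost, and hypothetical function names. It models only when data
can start to flow, not any real MPI implementation.

```python
def full_rendezvous_start(send_posted, recv_posted, latency):
    """Time at which data can start flowing with a full rendezvous.

    The sender ships a ready-to-send packet, the receiver answers once the
    receive is posted, and data flows when that answer reaches the sender.
    """
    ready_to_send_arrives = send_posted + latency
    reply_sent = max(ready_to_send_arrives, recv_posted)
    return reply_sent + latency


def half_rendezvous_start(send_posted, recv_posted, latency):
    """Time at which data can start flowing with a 1/2 rendezvous.

    The receiver ships its "envelope and ready for data" packet as soon as
    the receive is posted; the sender only waits if it has not arrived yet.
    """
    ready_arrives = recv_posted + latency
    return max(send_posted, ready_arrives)


def delays(send_posted, recv_posted, latency=1.0):
    """Return (full, half) delay measured from the moment the send is posted."""
    full = full_rendezvous_start(send_posted, recv_posted, latency) - send_posted
    half = half_rendezvous_start(send_posted, recv_posted, latency) - send_posted
    return full, half
```

With a latency of 1.0, a receive posted well before the send gives delays
of 2.0 versus 0.0, a receive posted at the same time as the send gives 2.0
versus 1.0, and a late receive gives the same delay for both protocols.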
                
                
                If there may be an MPI_ANY_SOURCE then this does not work
                because the receive side that has an MPI_ANY_SOURCE cannot
                guess which sender to notify, so the sender cannot count on
                getting a 1/2 rendezvous notification for a message that
                should match the MPI_ANY_SOURCE receive.
                
                
                The problem that made me lower the priority is that many MPIs
                use an eager protocol for small messages and a rendezvous
                protocol for large messages.  If the send side and receive
                side have the same size buffer then both sides can reach the
                same conclusion: eager vs 1/2 rendezvous. If both decide on
                eager, the receive side will not send an "envelope and ready
                for data" packet and the send side will not look for one. If
                both sides decide on 1/2 rendezvous then the receive side will
                send an "envelope and ready for data" packet and the send side
                will look for and consume the notice.  If the send side is for
                an 8 byte message and the receive uses a "big enough" receive
                buffer of 64KB then the two sides will probably not be able to
                reach the same conclusion about the protocol. The receive side
                will ship off an "envelope and ready for data" packet that the
                send side will not know what to do with.
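
The mismatch can be illustrated with each side choosing a protocol from
the only size it knows. The 4096-byte eager limit and the function names
below are assumptions made for illustration; real thresholds vary between
MPI implementations.

```python
EAGER_LIMIT = 4096  # assumed threshold in bytes; real values vary by MPI


def choose_protocol(nbytes, eager_limit=EAGER_LIMIT):
    """Pick a protocol from a locally known size (message or buffer)."""
    return "eager" if nbytes <= eager_limit else "half_rendezvous"


def sides_agree(send_bytes, recv_buffer_bytes, eager_limit=EAGER_LIMIT):
    """True if sender and receiver independently pick the same protocol.

    The sender only knows the message size and the receiver only knows its
    buffer size, so they can disagree when the two differ.
    """
    sender_choice = choose_protocol(send_bytes, eager_limit)
    receiver_choice = choose_protocol(recv_buffer_bytes, eager_limit)
    return sender_choice == receiver_choice
```

An 8-byte send matched against a 64KB receive buffer lands on opposite
sides of the threshold, so the receiver would emit a notification that the
sender never looks for.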
                
                
                
                  
		>  
                
		> Best regards.
                
		>  
                
		> Alexander
                
		>  
                
		> From: Supalov, Alexander 
                
		> Sent: Friday, June 20, 2008 8:29 AM
                
		> To: 'MPI 3.0 Sub-setting working group'
                
		> Subject: RE: [Mpi3-subsetting] MPI subsetting: charting the way
		> forward at a telecon next week?
                
                
		> Dear Dick,
                
		>  
                
		> Thank you. I remember we exchanged a couple of emails
about the 
                
		> possible extensions to the set of assertions, like
one-sided and 
                
		> I/O, and in my recollection, almost reached an
agreement that this 
                
		> can improve performance and possibly memory footprint,
as well as be
                
		> expressed thru assertions. Do you still feel favorable
about this?
                
		>  
                
		> Best regards.
                
		>  
                
		> Alexander
                
		> 
                
                
                
                
                _______________________________________________
                
                mpi3-subsetting mailing list
                
                mpi3-subsetting_at_[hidden]
                
        
http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-subsetting
                
                

        
        
        
        
---------------------------------------------------------------------
        Intel GmbH
        Dornacher Strasse 1
        85622 Feldkirchen/Muenchen Germany
        Sitz der Gesellschaft: Feldkirchen bei Muenchen
        Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes
Schwaderer
        Registergericht: Muenchen HRB 47456 Ust.-IdNr.
        VAT Registration No.: DE129385895
        Citibank Frankfurt (BLZ 502 109 00) 600119052
        
        This e-mail and any attachments may contain confidential
material for
        the sole use of the intended recipient(s). Any review or
distribution
        by others is strictly prohibited. If you are not the intended
        recipient, please contact the sender and delete all copies.


_______________________________________________________________________
Martin Schulz, schulzm_at_[hidden], http://people.llnl.gov/schulz6
CASC @ Lawrence Livermore National Laboratory, Livermore, USA 





