[Mpi3-subsetting] agenda for subsetting kickoff telecon ww09

Supalov, Alexander alexander.supalov at [hidden]
Fri Feb 29 07:38:48 CST 2008



OK, thanks. 

-----Original Message-----
From: Bronis R. de Supinski [mailto:bronis_at_[hidden]] 
Sent: Friday, February 29, 2008 2:17 PM
To: Supalov, Alexander
Cc: mpi3-subsetting_at_[hidden]
Subject: RE: [Mpi3-subsetting] agenda for subsetting kickoff telecon
ww09

Alexander:

It is the vast majority of scientific applications.
It is not just ones that need sparse data structures.
A stencil application that uses dense matrices has
strided non-contiguous data transfers for half (2D)
or more (3D or more complex stencils) of its
communication. Non-contiguous communication is
the reality of distributed memory computing...
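
To make the stencil point concrete, here is a minimal C sketch, assuming
a row-major 2D grid of doubles. In that layout the north/south halo rows
are contiguous, while the east/west halo columns are strided by the row
length, which is why half of the 2D exchanges are non-contiguous. The
names pack_column and unpack_column are hypothetical, introduced only for
illustration; this shows the user-level copying discussed in this thread,
not code from any MPI implementation. With derived datatypes the same
layout could instead be described once through MPI_Type_vector (count =
nrows, blocklength = 1, stride = ncols).

```c
#include <assert.h>
#include <stddef.h>

/* Gather column `col` of a row-major nrows x ncols grid into the
 * contiguous buffer `out` (length nrows). Consecutive elements of a
 * column lie ncols doubles apart in memory, so this is a strided copy. */
void pack_column(const double *grid, size_t nrows, size_t ncols,
                 size_t col, double *out)
{
    assert(col < ncols);
    for (size_t i = 0; i < nrows; ++i)
        out[i] = grid[i * ncols + col];
}

/* Inverse operation: scatter the contiguous buffer `in` (length nrows)
 * back into column `col` of the grid, e.g. into a halo column after a
 * receive. */
void unpack_column(double *grid, size_t nrows, size_t ncols,
                   size_t col, const double *in)
{
    assert(col < ncols);
    for (size_t i = 0; i < nrows; ++i)
        grid[i * ncols + col] = in[i];
}
```

For a 3 x 4 grid holding 0..11 in row-major order, packing column 3
gathers the elements at offsets 3, 7, and 11. Note that the stride here
is the runtime value ncols, not a compile-time constant.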

I am fine with letting this rest but my point is
that an emphasis on performance by implementers
should be the case for datatypes...

Bronis

On Fri, 29 Feb 2008, Supalov, Alexander wrote:

> Dear Bronis,
>
> Thanks. What scientific computing codes do you mean here - chemistry,
> structural mechanics, fluid dynamics, genomics, something else? Or do
> you speak generally of any code that needs sparse data structures? If
> so, what's your estimate of the relative number of such codes compared
> to those that do not need sparse datatypes? In what domain?
>
> The right dose of vendor motivation not to do wrong things is a good
> point, I'll consider it.
>
> Finally, the constant stride copying was but an example of when inlining
> may help users achieve higher performance. There may be other
> examples known in the scientific computing area. However, since
> performance is not the primary goal for datatypes, I suggest we let this
> matter rest for a while.
>
> Best regards.
>
> Alexander
>
> -----Original Message-----
> From: Bronis R. de Supinski [mailto:bronis_at_[hidden]]
> Sent: Friday, February 29, 2008 10:11 AM
> To: Supalov, Alexander
> Cc: mpi3-subsetting_at_[hidden]
> Subject: RE: [Mpi3-subsetting] agenda for subsetting kickoff telecon
> ww09
>
>
> Alexander:
>
> Re:
> > Thanks. I understand your motivation. When you say "most real
> > applications" - what applications do you mean? At least, in what
> > area?
>
> ? Scientific computing...
>
> > For the NIC part, the stress was on "here". In my opinion,
> > subsetting is not about making things more complicated, more
> > challenging to the implementors, or to the underlying hardware. It's
> > about making things simple, easy to use, and easy to implement -
> > including implementation of only those features your users actually
> > need. That the implementation may be faster due to this is an added
> > bonus, not the primary goal.
>
> The emphasis here should not be on creating a disincentive
> for vendors to do the right thing...
>
> > Still, regarding user side copying. Yes, when people do this one
> > wonders why. There's a reason, apart from them: 1) not caring about
> > datatypes and their complexity and 2) not trusting their performance.
> > A modern compiler can rather well optimize a loop with a constant
> > stride, and may have difficulty with an unknown stride. This is why
> > explicit loops are sometimes indeed faster (much faster) in the
> > resulting code than any generic implementation.
>
> Huh? What makes you think the user copying code is
> in terms of constant stride? Generally, it varies
> with the input. We are not talking about a simple
> situation to optimize at the user level...
>
> Bronis
>
>
> >
> > Best regards.
> >
> > Alexander
> >
> > -----Original Message-----
> > From: Bronis R. de Supinski [mailto:bronis_at_[hidden]]
> > Sent: Friday, February 29, 2008 6:20 AM
> > To: Supalov, Alexander
> > Cc: mpi3-subsetting_at_[hidden]
> > Subject: RE: [Mpi3-subsetting] agenda for subsetting kickoff telecon
> > ww09
> >
> >
> > Alexander:
> >
> > Most real applications need to send non-contiguous
> > data. If they do not use datatypes then they are
> > doing the equivalent of either the packing/unpacking
> > or smaller messages at the user level. This should
> > be discouraged, not encouraged. A small savings
> > in library object size is not ample reason to go
> > against that. And, yes, we are after encouraging
> > hardware vendors to provide the right hardware.
> >
> > Bronis
> >
> >
> > On Fri, 29 Feb 2008, Supalov, Alexander wrote:
> >
> > > Hi,
> > >
> > > Thanks. I think the main thrust here is the library footprint (no
> > > pack/unpack, etc.) and complexity of the user side of the datatype
> > > interface, rather than performance. Many applications just don't
> > > need any of this, and never will. Why not translate this
> > > application non-requirement into a minimum MPI subset? Same with
> > > communicator/group management, etc.
> > >
> > > Moreover, homogeneous installations that dominate HPC now don't
> > > actually need any datatype support at all. They send chunks of
> > > bytes. This may change in the future, though.
> > >
> > > A minor performance implication is that without holes that are
> > > only possible with derived datatypes, one does not need to track
> > > this, split the critical path, and make special provisions inside
> > > the MPI device layer to handle iov or such.
> > >
> > > The NIC capability argument is interesting, but it turns the
> > > discussion on its head: we're not after motivating network vendors
> > > to provide scatter/gather in hardware here, are we? Please clarify.
> > >
> > > Best regards.
> > >
> > > Alexander
> > >
> > > -----Original Message-----
> > > From: mpi3-subsetting-bounces_at_[hidden]
> > > [mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of
> > > Bronis R. de Supinski
> > > Sent: Friday, February 29, 2008 5:53 AM
> > > To: mpi3-subsetting_at_[hidden]
> > > Subject: Re: [Mpi3-subsetting] agenda for subsetting kickoff
> > > telecon ww09
> > >
> > >
> > > All:
> > >
> > > OK, I have to respond to the notion that derived datatypes
> > > limit performance. It is just not a reasonable position.
> > >
> > > Sure, if you can send contiguous locations, you will get
> > > higher performance. The problem is that codes do not only
> > > need to send contiguous data so that is not an adequate
> > > reason to say derived datatypes limit performance.
> > >
> > > So, what is left? That there is some more efficient way
> > > to send non-contiguous data? How? As multiple messages,
> > > each of which sends contiguous data? If so, then the
> > > implementation could do this under the covers and the
> > > datatypes are just a convenience for the user not to
> > > have to specify the individual sends. OK, suppose that's
> > > not the reason. Perhaps the user can do the copying into
> > > a contiguous buffer and get better performance? While
> > > I have seen this hold with some implementations, it is
> > > absurd. There is no reason that I can discern as to why
> > > the user should be able to deduce a better copying
> > > mechanism than the MPI implementer. So, again, at worst,
> > > the datatypes should be a convenience. Do you have an
> > > alternative reason or a refutation of these opinions?
> > >
> > > What is more important, it is certainly possible to build
> > > scatter/gather support into a NIC and achieve better
> > > performance with datatypes than without. While there are
> > > issues to be resolved for that (primarily the issue of
> > > pinning memory), they are solvable with the right hardware
> > > mechanism. Just because it does not yet exist is not
> > > an adequate reason to say "Get rid of datatypes". OK,
> > > you are not saying that but you are saying to deprecate
> > > them in a sense. And saying you could send contiguous
> > > sends more efficiently is a bad argument here. How do
> > > datatypes cause inefficiency for that? How much is
> > > that cost really? At what point do you hit where the
> > > answer is "It would be faster not to compute anything"?
> > >
> > > Bronis
> > >
> > >
> > > On Fri, 29 Feb 2008, Supalov, Alexander wrote:
> > >
> > > > Hi,
> > > >
> > > > Thanks. What subsets inside the current standard would you
> > > > propose? What interfaces between them would you envision?
> > > >
> > > > Good idea about the optimization opportunities. Here's an
> > > > initial combined list, with the main benefits as I see them.
> > > > Please comment/extend.
> > > >
> > > > - Dynamic process support: less overhead in the progress engine,
> > > >   easier global rank handling.
> > > > - Heterogeneity: better memory footprint, easier data handling.
> > > > - Derived datatypes (especially those with holes): better memory
> > > >   footprint.
> > > > - MPI_ANY_SOURCE: faster, simpler multifabric progress.
> > > > - File I/O: smaller requests, easier wait/test functions.
> > > > - One-sided ops: no passive target w/o MPI calls - no extra
> > > >   progress thread.
> > > > - Communicator & group management: better memory footprint.
> > > > - Message tagging: better support for stable dataflow exchanges,
> > > >   smaller packets.
> > > > - Non-blocking communication: easier ordering, simplified request
> > > >   handling.
> > > >
> > > > Best regards.
> > > >
> > > > Alexander
> > > >
> > > > -----Original Message-----
> > > > From: mpi3-subsetting-bounces_at_[hidden]
> > > > [mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of
> > > > Torsten Hoefler
> > > > Sent: Friday, February 29, 2008 5:08 AM
> > > > To: mpi3-subsetting_at_[hidden]
> > > > Subject: Re: [Mpi3-subsetting] agenda for subsetting kickoff
> > > > telecon ww09
> > > >
> > > > Hi,
> > > > >    Present: Leonid Meyerguz (Microsoft), Rich Graham (ORNL),
> > > > >    Richard Barrett (ORNL), Torsten Hoefler (ISU), Alexander
> > > > >    Supalov (Intel)
> > > > just for the record, it's "IU" not "ISU" :-)
> > > >
> > > > >    - Scope of the effort
> > > > >      - Rich
> > > > >        - Minimum subset consistent with the rest of MPI, for
> > > > >          performance/memory footprint optimization
> > > > >        - Danger of splitting MPI, hence against optional
> > > > >          features in the standard
> > > > I back that (danger of optional features for portability). I'd
> > > > propose to split the current standard into mostly self-contained
> > > > subsets that have clearly defined interfaces to the rest of the
> > > > standard. Note: this only defines logical interfaces; it does
> > > > *not* define how those things are to be implemented. This makes
> > > > it easier to understand the standard and to have separate
> > > > (portable) libraries for the subsets; it does not influence
> > > > optimization possibilities by implementing everything in a
> > > > monolithic block (i.e., central progress).
> > > >
> > > > >        - Both blocking & nonblocking belong to the core
> > > > >      - Torsten
> > > > >        - Some collectives may go into selectable subsets
> > > > I see three subsets: blocking colls, non-blocking colls and
> > > > topological colls (maybe also blocking / non-blocking).
> > > >
> > > > >        - MPI_ANY_SOURCE considered harmful
> > > > I'd like to add datatypes and heterogeneity to this list (with
> > > > regards to performance). Alexander mentioned the dynamics. I think
> > > > we should have a list of items ready that could influence
> > > > optimization possibilities significantly if they were to be
> > > > announced by the user before he can use them. That would give
> > > > another strong argument for the subsetting.
> > > >
> > > > Best,
> > > >   Torsten
> > > >
> > > > --
> > > >  bash$ :(){ :|:&};: --------------------- http://www.unixer.de/ -----
> > > > Indiana University    | http://www.indiana.edu
> > > > Open Systems Lab      | http://osl.iu.edu/
> > > > 150 S. Woodlawn Ave.  | Bloomington, IN, 47405-7104 | USA
> > > > Lindley Hall Room 135 | +01 (812) 855-3608
> > > > _______________________________________________
> > > > Mpi3-subsetting mailing list
> > > > Mpi3-subsetting_at_[hidden]
> > > > http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-subsetting
> > > >
> > > > ---------------------------------------------------------------------
> > > > Intel GmbH
> > > > Dornacher Strasse 1
> > > > 85622 Feldkirchen/Muenchen Germany
> > > > Sitz der Gesellschaft: Feldkirchen bei Muenchen
> > > > Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
> > > > Registergericht: Muenchen HRB 47456 Ust.-IdNr.
> > > > VAT Registration No.: DE129385895
> > > > Citibank Frankfurt (BLZ 502 109 00) 600119052
> > > >
> > > > This e-mail and any attachments may contain confidential material
> > > > for the sole use of the intended recipient(s). Any review or
> > > > distribution by others is strictly prohibited. If you are not the
> > > > intended recipient, please contact the sender and delete all copies.
> > > >
> > > >
> > >