[Mpi3-subsetting] agenda for subsetting kickoff telecon ww09

Supalov, Alexander alexander.supalov at [hidden]
Fri Feb 29 05:39:40 CST 2008



Dear Bronis,

Thanks. What scientific computing codes do you mean here - chemistry,
structural mechanics, fluid dynamics, genomics, something else? Or do
you speak generally of any code that needs sparse data structures? If
so, what's your estimate of the relative number of such codes compared
to those that do not need sparse datatypes? In what domain?

The right dose of vendor motivation not to do wrong things is a good
point, I'll consider it.

Finally, the constant stride copying was but an example of where inlining
may help users achieve higher performance. There may be other
examples known in the scientific computing area. However, since
performance is not the primary goal for datatypes, I suggest we let this
matter rest for a while.
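
To make the constant stride point concrete, here is a minimal sketch in
plain C. It is not taken from any MPI implementation; the names
pack_const_stride, pack_indexed and STRIDE are hypothetical and chosen
only for illustration. The first loop uses a stride known at compile
time, the second takes its offsets from a run-time index array, which is
closer to the input-dependent case Bronis describes.

```c
#include <stddef.h>

/* Assumption for illustration: the stride is a compile-time constant. */
enum { STRIDE = 4 };

/* Gather `count` doubles spaced STRIDE elements apart into a contiguous
 * buffer. The fixed stride is visible to the compiler. */
size_t pack_const_stride(const double *src, double *dst, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        dst[i] = src[i * STRIDE];
    return count;
}

/* Gather `count` doubles whose positions come from a run-time index
 * array. The access pattern depends on the input data. */
size_t pack_indexed(const double *src, const size_t *idx,
                    double *dst, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        dst[i] = src[idx[i]];
    return count;
}
```

Both functions produce the same contiguous buffer when the index array
happens to describe a regular stride; whether one is faster than the
other depends on the compiler and the hardware and is not claimed here.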

Best regards.

Alexander 

-----Original Message-----
From: Bronis R. de Supinski [mailto:bronis_at_[hidden]] 
Sent: Friday, February 29, 2008 10:11 AM
To: Supalov, Alexander
Cc: mpi3-subsetting_at_[hidden]
Subject: RE: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09

Alexander:

Re:
> Thanks. I understand your motivation. When you say "most real
> applications" - what applications do you mean? At least, in what area?

? Scientific computing...

> For the NIC part, the stress was on "here". In my opinion, subsetting is
> not about making things more complicated, more challenging to the
> implementors, or to the underlying hardware. It's about making things
> simple, easy to use, and easy to implement - including implementation of
> only those features your users actually need. That the implementation
> may be faster due to this is an added bonus, not the primary goal.

The emphasis here should not be on creating a disincentive
for vendors to do the right thing...

> Still, regarding user side copying. Yes, when people do this one wonders
> why. There's a reason, apart from them: 1) not caring about datatypes
> and their complexity and 2) not trusting their performance. A modern
> compiler can rather well optimize a loop with a constant stride, and may
> have difficulty with an unknown stride. This is why explicit loops are
> sometimes indeed faster (much faster) in the resulting code than any
> generic implementation.

Huh? What makes you think the user copying code is
in terms of constant stride? Generally, it varies
with the input. We are not talking about a simple
situation to optimize at the user level...

Bronis

>
> Best regards.
>
> Alexander
>
> -----Original Message-----
> From: Bronis R. de Supinski [mailto:bronis_at_[hidden]]
> Sent: Friday, February 29, 2008 6:20 AM
> To: Supalov, Alexander
> Cc: mpi3-subsetting_at_[hidden]
> Subject: RE: [Mpi3-subsetting] agenda for subsetting kickoff telecon
> ww09
>
>
> Alexander:
>
> Most real applications need to send non-contiguous
> data. If they do not use datatypes then they are
> doing the equivalent of either the packing/unpacking
> or smaller messages at the user level. This should
> be discouraged, not encouraged. A small savings
> in library object size is not ample reason to go
> against that. And, yes, we are after encouraging
> hardware vendors to provide the right hardware.
>
> Bronis
>
>
> On Fri, 29 Feb 2008, Supalov, Alexander wrote:
>
> > Hi,
> >
> > Thanks. I think the main thrust here is the library footprint (no
> > pack/unpack, etc.) and complexity of the user side of the datatype
> > interface, rather than performance. Many applications just don't need
> > any of this, and never will. Why not translate this application
> > non-requirement into a minimum MPI subset? Same with communicator/group
> > management, etc.
> >
> > Moreover, homogeneous installations that dominate HPC now don't actually
> > need any datatype support at all. They send chunks of bytes. This may
> > change in the future, though.
> >
> > A minor performance implication is that without holes that are only
> > possible with derived datatypes, one does not need to track this, split
> > the critical path, and make special provisions inside the MPI device
> > layer to handle iov or such.
> >
> > The NIC capability argument is interesting, but it turns the discussion
> > on its head: we're not after motivating network vendors to provide
> > scatter/gather in hardware here, are we? Please clarify.
> >
> > Best regards.
> >
> > Alexander
> >
> > -----Original Message-----
> > From: mpi3-subsetting-bounces_at_[hidden]
> > [mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of Bronis
> > R. de Supinski
> > Sent: Friday, February 29, 2008 5:53 AM
> > To: mpi3-subsetting_at_[hidden]
> > Subject: Re: [Mpi3-subsetting] agenda for subsetting kickoff telecon
> > ww09
> >
> >
> > All:
> >
> > OK, I have to respond to the notion that derived datatypes
> > limit performance. It is just not a reasonable position.
> >
> > Sure, if you can send contiguous locations, you will get
> > higher performance. The problem is that codes do not only
> > need to send contiguous data so that is not an adequate
> > reason to say derived datatypes limit performance.
> >
> > So, what is left? That there is some more efficient way
> > to send non-contiguous data? How? As multiple messages,
> > each of which sends contiguous data? If so, then the
> > implementation could do this under the covers and the
> > datatypes are just a convenience for the user not to
> > have to specify the individual sends. OK, suppose that's
> > not the reason. Perhaps the user can do the copying into
> > a contiguous buffer and get better performance? While
> > I have seen this hold with some implementations, it is
> > absurd. There is no reason that I can discern as to why
> > the user should be able to deduce a better copying
> > mechanism than the MPI implementer. So, again, at worst,
> > the datatypes should be a convenience. Do you have an
> > alternative reason or a refutation of these opinions?
> >
> > What is more important, it is certainly possible to build
> > scatter/gather support into a NIC and achieve better
> > performance with datatypes than without. While there are
> > issues to be resolved for that (primarily the issue of
> > pinning memory), they are solvable with the right hardware
> > mechanism. Just because it does not yet exist is not
> > an adequate reason to say "Get rid of datatypes". OK,
> > you are not saying that but you are saying to deprecate
> > them in a sense. And saying you could send contiguous
> > sends more efficiently is a bad argument here. How do
> > datatypes cause inefficiency for that? How much is
> > that cost really? At what point do you hit where the
> > answer is "It would be faster not to compute anything"?
> >
> > Bronis
> >
> >
> > On Fri, 29 Feb 2008, Supalov, Alexander wrote:
> >
> > > Hi,
> > >
> > > Thanks. What subsets inside the current standard would you propose?
> > > What interfaces between them would you envision?
> > >
> > > Good idea about the optimization opportunities. Here's an initial
> > > combined list, with the main benefits as I see them. Please
> > > comment/extend.
> > >
> > > - Dynamic process support: less overhead in the progress engine, easier
> > > global rank handling.
> > > - Heterogeneity: better memory footprint, easier data handling.
> > > - Derived datatypes (especially those with holes): better memory
> > > footprint.
> > > - MPI_ANY_SOURCE: faster, simpler multifabric progress.
> > > - File I/O: smaller requests, easier wait/test functions.
> > > - One-sided ops: no passive target w/o MPI calls - no extra progress
> > > thread.
> > > - Communicator & group management: better memory footprint.
> > > - Message tagging: better support for stable dataflow exchanges, smaller
> > > packets.
> > > - Non-blocking communication: easier ordering, simplified request
> > > handling.
> > >
> > > Best regards.
> > >
> > > Alexander
> > >
> > > -----Original Message-----
> > > From: mpi3-subsetting-bounces_at_[hidden]
> > > [mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of
> > > Torsten Hoefler
> > > Sent: Friday, February 29, 2008 5:08 AM
> > > To: mpi3-subsetting_at_[hidden]
> > > Subject: Re: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
> > >
> > > Hi,
> > > >    Present: Leonid Meyerguz (Microsoft), Rich Graham (ORNL), Richard
> > > >    Barrett (ORNL), Torsten Hoefler (ISU), Alexander Supalov (Intel)
> > > just for the record, it's "IU" not "ISU" :-)
> > >
> > > >    - Scope of the effort
> > > >      - Rich
> > > >        - Minimum subset consistent with the rest of MPI, for
> > > >    performance/memory footprint optimization
> > > >        - Danger of splitting MPI, hence against optional features in
> > > >    the standard
> > > I back that (danger of optional features for portability). I'd propose
> > > to split the current standard into mostly self-contained subsets that
> > > have clearly defined interfaces to the rest of the standard. Note: this
> > > only defines logical interfaces, that does *not* define how those things
> > > are to be implemented. This makes it easier to understand the standard
> > > and have separate (portable) libraries for the subsets, it does not
> > > influence optimization possibilities by implementing everything in a
> > > monolithic block (i.e., central progress).
> > >
> > > >        - Both blocking & nonblocking belong to the core
> > > >      - Torsten
> > > >        - Some collectives may go into selectable subsets
> > > I see three subsets: blocking colls, non-blocking colls and topological
> > > colls (maybe also blocking / non-blocking).
> > >
> > > >        - MPI_ANY_SOURCE considered harmful
> > > I'd like to add datatypes and heterogeneity to this list (with regards
> > > to performance). Alexander mentioned the dynamics. I think we should
> > > have a list of items ready that could influence optimization
> > > possibilities significantly if they were to be announced by the user
> > > before he can use them. That would give another strong argument for the
> > > subsetting.
> > >
> > > Best,
> > >   Torsten
> > >
> > > --
> > >  bash$ :(){ :|:&};: --------------------- http://www.unixer.de/ -----
> > > Indiana University    | http://www.indiana.edu
> > > Open Systems Lab      | http://osl.iu.edu/
> > > 150 S. Woodlawn Ave.  | Bloomington, IN, 474045-7104 | USA
> > > Lindley Hall Room 135 | +01 (812) 855-3608
> > > _______________________________________________
> > > Mpi3-subsetting mailing list
> > > Mpi3-subsetting_at_[hidden]
> > > http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-subsetting
> > >
> > > ---------------------------------------------------------------------
> > > Intel GmbH
> > > Dornacher Strasse 1
> > > 85622 Feldkirchen/Muenchen Germany
> > > Sitz der Gesellschaft: Feldkirchen bei Muenchen
> > > Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
> > > Registergericht: Muenchen HRB 47456 Ust.-IdNr.
> > > VAT Registration No.: DE129385895
> > > Citibank Frankfurt (BLZ 502 109 00) 600119052
> > >
> > > This e-mail and any attachments may contain confidential material for
> > > the sole use of the intended recipient(s). Any review or distribution
> > > by others is strictly prohibited. If you are not the intended
> > > recipient, please contact the sender and delete all copies.
> > >
> > >
> >


