From alexander.supalov at [hidden]  Fri Feb 22 01:00:44 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Fri, 22 Feb 2008 07:00:44 -0000
Subject: [Mpi3-subsetting] (no subject)
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A2011113FA@swsmsx413.ger.corp.intel.com>


--
Dr Alexander Supalov
Intel GmbH
Hermuelheimer Strasse 8a
50321 Bruehl, Germany
Phone:          +49 2232 209034
Mobile:          +49 173 511 8735
Fax:              +49 2232 209029
 
---------------------------------------------------------------------
Intel GmbH
Dornacher Strasse 1
85622 Feldkirchen/Muenchen Germany
Sitz der Gesellschaft: Feldkirchen bei Muenchen
Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
Registergericht: Muenchen HRB 47456 Ust.-IdNr.
VAT Registration No.: DE129385895
Citibank Frankfurt (BLZ 502 109 00) 600119052
This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.



From alexander.supalov at [hidden]  Fri Feb 22 07:48:35 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Fri, 22 Feb 2008 13:48:35 -0000
Subject: [Mpi3-subsetting] subsetting kickoff telecon ww09
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A2011117A2@swsmsx413.ger.corp.intel.com>


Hi everybody,
 
I suggest we hold a 1-hour kickoff telecon to get going on MPI-3
subsetting. Please reply to me directly (alexander.supalov_at_[hidden])
and mark each of the times below as yes, no, or plan B:
 
February 25, 2008    7:00 am PST/10:00 am EST/16:00 CET    yes/no/plan B
February 25, 2008    8:00 am PST/11:00 am EST/17:00 CET    yes/no/plan B
February 25, 2008    9:00 am PST/12:00 pm EST/18:00 CET    yes/no/plan B

February 27, 2008    9:00 am PST/12:00 pm EST/18:00 CET    yes/no/plan B
February 28, 2008    9:00 am PST/12:00 pm EST/18:00 CET    yes/no/plan B
February 29, 2008    7:00 am PST/10:00 am EST/16:00 CET    yes/no/plan B
 
I'll take care of the bridge and agenda once the time is settled.
 
Best regards.
 
Alexander
 


From leonidm at [hidden]  Fri Feb 22 20:22:18 2008
From: leonidm at [hidden] (Leonid Meyerguz)
Date: Fri, 22 Feb 2008 18:22:18 -0800
Subject: [Mpi3-subsetting] subsetting kickoff telecon ww09
In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A2011117A2@swsmsx413.ger.corp.intel.com>
Message-ID: <43AEA9A9F7768B42A89F554F0EBF7ED8273B6DA0E8@NA-EXMSG-C102.redmond.corp.microsoft.com>


Hi Alexander,

I vote Feb 27th 9:00 AM PST as my first choice, and Feb 28th 9:00 AM PST as plan B.

Regards,
Leonid.


From alexander.supalov at [hidden]  Sat Feb 23 09:27:07 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Sat, 23 Feb 2008 15:27:07 -0000
Subject: [Mpi3-subsetting] subsetting kickoff telecon ww09
In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A2011117A2@swsmsx413.ger.corp.intel.com>
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A20113C46D@swsmsx413.ger.corp.intel.com>


Hi everybody,
 
Thanks a lot for your replies. I've got 6 so far.
 
Wednesday, February 27, 9:00 am PST/12:00 pm EST/18:00 CET emerges as
the time when all but one of us can meet. Thursday, February 28, same
time, is a firm plan B for all but two of us. Monday and Friday look
increasingly weak.
 
I'll therefore wait for more replies until Tuesday morning CET and then
announce the final time, connection details, and agenda of the kickoff
telecon.
 
Best regards.
 
Alexander


From spoole at [hidden]  Sat Feb 23 09:40:18 2008
From: spoole at [hidden] (Stephen Poole)
Date: Sat, 23 Feb 2008 10:40:18 -0500
Subject: [Mpi3-subsetting] subsetting kickoff telecon ww09
In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A20113C46D@swsmsx413.ger.corp.intel.com>
Message-ID: <0227DF42-2AEA-46CD-B82E-AD3087AD2C06@ornl.gov>


I can meet on either Thursday or Friday. Wednesday would be OK, but not
at that time: I will be in the air.

Steve...


==================================================>

Steve Poole
Computer Science and Mathematics Division
Chief Scientist / Director of Special Programs
Computational Sciences and Engineering Division
National Center for Computational Sciences Division
Oak Ridge National Laboratory
865.574.9008 (Office)

865.574.6076 (Fax)

"Wisdom is not a product of schooling, but of the lifelong attempt to  
acquire it" Albert Einstein

====================================================>



From alexander.supalov at [hidden]  Tue Feb 26 02:25:04 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Tue, 26 Feb 2008 08:25:04 -0000
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A2011117A2@swsmsx413.ger.corp.intel.com>
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A20113CCDA@swsmsx413.ger.corp.intel.com>


Hi everybody,
 
Let's meet on February 28, 2008, at 9:00 am PST/12:00 pm EST/18:00 CET.
 
+1-916-356-2663, Bridge: 4, Passcode: 5661281

- Opens & introductions
- Scope of the effort
- Next steps
 
If you cannot make it, please send your notes to this list before the
meeting.
 
Best regards.
 
Alexander
 

From alexander.supalov at [hidden]  Tue Feb 26 05:10:15 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Tue, 26 Feb 2008 11:10:15 -0000
Subject: [Mpi3-subsetting] Subsetting scope: a POV
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A20113CED8@swsmsx413.ger.corp.intel.com>


Hi everybody,
 
In the run-up to our kick-off, here's my POV on subsetting and its
possible scope/role in MPI-3. Your comments and suggestions are most
welcome.
 
We start this activity because:
 
1) Certain industrial customers complain about MPI complexity and
inadequacy
2) Complexity is going to grow in MPI-3
3) Growing complexity may have growing performance implications
 
As a result, customers drift away from MPI to home-grown libraries,
usually based on sockets. This effectively puts fast networks out of
their reach unless they can profit from fast IP emulation layers.
Moreover, this customer drift, if it continues, may make MPI irrelevant
in some HPC areas and lead to the creation of alternative interfaces
there.
 
The main purpose of the Forum, as well as of the subsetting WG, is thus
to react to customer demand and make MPI faster and easier to use,
especially in those areas that are subject to increasing customer drift
(think, e.g., massive master-slave computations).
 
Based on these premises, subsetting should, in my mind, try to:
 
1) Make the MPI standard modular. This may include:
    a) Splitting the standard functionality into coherent groups that
users will be able to select/deselect at init time
    b) Making implementation of some modules/functionality optional
(think dynamic process support), as they effectively are already
    c) Addressing not only functional groups but also certain aspects of
the standard that may not be needed in certain use cases (think
communicator management, message tagging, derived datatypes,
MPI_ANY_SOURCE support, non-blocking communication, etc.)
 
2) As part of the modularization, optionally identify the minimum
functional MPI subset. This may be:
    a) Those 6 calls (Init, Rank, Size, Send, Recv, Finalize), possibly
w/o communicator management and derived datatypes.
    b) A more flexible combination of modules actually needed by the
user
 
3) Connect subsetting with other MPI-3 activities (FT, ABI, collectives,
etc.)
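 
A minimal sketch of what the init-time selection in 1a) might look like,
written in Python purely as a model. The module names, the `Module` type,
and `init_subset` are all hypothetical; nothing here is existing or
proposed MPI API, and the actual split into groups is still open.

```python
from enum import Flag, auto


class Module(Flag):
    """Hypothetical functional groups; the actual split is still to be defined."""
    CORE = auto()           # Init, Rank, Size, Send, Recv, Finalize
    NONBLOCKING = auto()
    COLLECTIVES = auto()
    DATATYPES = auto()
    COMMUNICATORS = auto()
    DYNAMIC = auto()        # dynamic process support


def init_subset(requested, available):
    """Model of init-time selection: return the set of enabled modules.

    The core is always enabled. Requesting a module the implementation
    does not provide raises an error at init time instead of failing
    later at some call site.
    """
    enabled = requested | Module.CORE
    missing = enabled & ~available
    if missing:
        raise RuntimeError(f"modules not provided: {missing!r}")
    return enabled
```

Under these assumptions, an application that only needs collectives
would request `Module.COLLECTIVES` and get the core plus collectives
back, while a request for `Module.DYNAMIC` against an implementation
that lacks it would fail immediately at init.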
 
As a result of modularization, we should strive to achieve:
 
1) Simplification of the standard for newcomers
2) Performance advantages for reasonable module combinations
3) Influence upon the overall shape of the MPI-3 standard
 
There are certainly quite a few concerns here:
 
1) We may end up complicating the standard and its implementation even
further
2) We may facilitate a split of the standard into several mutually
incompatible implementation "families"
3) We may cause some valid MPI-3 applications to break if they use
optional modules not available in the implementation involved
4) We may get carried away by academic considerations and miss the
actual customer demands in the process
5) We may be fighting a losing battle because the MPI standard is way
too rigid by design/purpose to be simplified
 
To guard against all this, we need to work closely with other WGs and
the Forum as a whole, define our goals as early as possible, and solicit
extensive Forum and customer feedback.
 
From all this, by the time of the Forum meeting in March, we should have
at least a couple of slides reflecting our intentions and plans.
 
Best regards.
 
Alexander
 


From bronis at [hidden]  Tue Feb 26 06:54:10 2008
From: bronis at [hidden] (Bronis R. de Supinski)
Date: Tue, 26 Feb 2008 04:54:10 -0800 (PST)
Subject: [Mpi3-subsetting] Subsetting scope: a POV
In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A20113CED8@swsmsx413.ger.corp.intel.com>
Message-ID: <Pine.LNX.4.58.0802260443250.30545@tux213.llnl.gov>


Alexander:

Re:
> We start this activity because:
>
> 1) Certain industrial customers complain about MPI complexity and
> inadequacy

While this is a legitimate concern, it cannot be allowed to
cause the standard to devolve into a fractured morass that
removes the portability of MPI programs that has been a key
aspect in the success of MPI.

> 2) Complexity is going to grow in MPI-3

True. And it is not clear that many of the proposed extensions
are required for portable programming. In fact, they could be
designed with the idea in mind that they are optional.

> 3) Growing complexity may have growing performance implications
>
> As a result of the above, customers drift away from the MPI to
> home-grown libraries, usually based on sockets. This effectively
> eliminates fast networks from their scope, unless they can profit from
> fast IP emulation layers. Moreover, this customer drift, if continued,
> may make MPI irrelevant in some HPC areas and lead to creation of
> alternative interfaces there.
>
> The main purpose of the Forum, as well as the subsetting WG, is thus to
> react to customer demand and make MPI faster and easier to use,
> especially in those areas that are subjected to the increasing customer
> drift (think, e.g., massive master-slave computations).
>
> Basing on these premises, the subsetting, in my mind, should try to:
>
> 1) Make MPI standard modular. This may include:
>     a) Splitting the standard functionality into coherent groups that
> users will be able to select/deselect at init time
>     b) Making implementation of some modules/functionality optional
> (think dynamic process support) as they are anyway now

The key issue will be defining what is optional. Clearly, dynamic
process support is a good candidate since it already effectively is.
However, most of the functions from MPI-1 are not (although some
concepts/features could be, perhaps wild cards and topologies). The
main communication functions (particularly the full set of collectives)
are not optional, nor is the profiling interface. In fact, one could
argue that the profiling interface could be subsetted in correspondence
with the user-level subsetting; it's not clear anything else makes sense.

>     c) Addressing not only functional groups but also certain aspects of
> the standard that may not be needed in certain use cases (think
> communicator management, message tagging, derived datatypes,
> MPI_ANY_SOURCE support, non-blocking communication, etc.)
>
> 2) As part of the modularization, optionally identify the minimum
> functional MPI subset. This may be:
>     a) Those 6 calls (Init, Rank, Size, Send, Recv, Finalize), possibly
> w/o communicator management and derived datatypes.

This may be a reasonable set. One thing that needs to be stated
is that the minimal subset should not be the only non-optional one.
If you do that, then portability is lost.

>     b) A more flexible combination of modules actually needed by the
> user
>
> 3) Connect subsetting with other MPI-3 activities (FT, ABI, collectives,
> etc.)
>
> As a result of modularization, we should strive to achieve
>
> 1) Simplification of the standard for the newcomers
> 2) Performance advantages for reasonable module combinations
> 3) Influence upon the overall shape of the MPI-3 standard
>
> There are certainly quite a few concerns here:
>
> 1) We may end up complicating the standard and its implementation even
> further
> 2) We may facilitate a split of the standard into several mutually
> incompatible implementation "families"

Yes, this is probably the most significant concern. You can
certainly subset the standard without making it more complicated,
but the obvious way to do it easily results in this problem.

> 3) We may cause some valid MPI-3 applications break if they use optional
> modules not available in the implementation involved

A publish/subscribe query interface is clearly a minimal
part of what needs to be provided.
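
One possible reading of such a query interface, as a hedged Python
sketch: the implementation advertises a set of feature names, and the
application checks its needs against that set before relying on anything
optional. The function `check_features` and the feature names are
hypothetical illustrations, not part of MPI or of any concrete proposal
in this thread.

```python
def check_features(required, optional, advertised):
    """Compare what an application wants with what an implementation advertises.

    Returns a tuple (ok, usable_optional, missing_required):
      ok                True if every required feature is advertised
      usable_optional   sorted optional features that are advertised
      missing_required  sorted required features that are not advertised
    """
    advertised = frozenset(advertised)
    missing = sorted(set(required) - advertised)
    usable = sorted(set(optional) & advertised)
    return (not missing, usable, missing)
```

With a check like this, an application can refuse to start when a
required feature is missing and fall back to a simpler code path when
only an optional one is.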

Bronis



From alexander.supalov at [hidden]  Thu Feb 28 12:04:32 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Thu, 28 Feb 2008 18:04:32 -0000
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A2011117A2@swsmsx413.ger.corp.intel.com>
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A20116EFEB@swsmsx413.ger.corp.intel.com>


Hi everybody,

Thank you for your time today. It was a very good discussion. Here's
what I captured (please add/modify what I may have missed):
 
Present: Leonid Meyerguz (Microsoft), Rich Graham (ORNL), Richard
Barrett (ORNL), Torsten Hoefler (ISU), Alexander Supalov (Intel)
 
- Opens & introductions 
 
- Scope of the effort 
  - Rich
    - Minimum subset consistent with the rest of MPI, for
performance/memory footprint optimization
    - Danger of splitting MPI, hence against optional features in the
standard
    - Both blocking & nonblocking belong to the core
  - Torsten
    - Some collectives may go into selectable subsets
    - MPI_ANY_SOURCE considered harmful
  - Leonid
    - Flexible support for optional features, means for choosing and
advertising level of compliance/set of features
  - See enclosed email for Alexander's POV
 
- General discussion snapshots
  - Support of subsets: some or all? If some, possible linkage problems
in static apps (or dead calls). If all, where's the gain?
  - Optional: really optional (may not be present) or selectable (present
but possibly unused)?
  - Performance penalty for unused subsets: implementation matter or
standard choice?
  - Portability may be limited to a certain class of applications (think
FT, master-slave runs)
  - Everything we design needs to be implementable; complexity needs to
be controlled
  - The ability to use a certain set of subsets should not preclude
pulling in other modules if necessary
  - Whatever we do, it should not conflict with the ABI efforts
  - Need to stay nice and be nicer wrt the libraries (think threading)
and keep things simple
  - The simplification argument, if put first, may not be liked by some
 
- Next steps
  - Please comment on these minutes, and add/modify what I may have
missed
  - I'll prepare a couple of slides by next week summarizing our
discussion so far; again, your feedback will be most welcome
  - At the meeting, it would be great to meet F2F briefly and discuss any
remaining loose ends before the presentation at the Forum; I'll see to
this
 
Best regards.
 
Alexander
 



<strong>attached mail follows:</strong>


Hi everybody,
 
In the run-up to our kick-off, here's my POV on the subsetting and its
possible scope/role in the MPI-3. Your comments and suggestions are most
welcome. 
 
We start this activity because:
 
1) Certain industrial customers complain about MPI complexity and
inadequacy
2) Complexity is going to grow in MPI-3
3) Growing complexity may have growing performance implications
 
As a result of the above, customers drift away from the MPI to
home-grown libraries, usually based on sockets. This effectively
eliminates fast networks from their scope, unless they can profit from
fast IP emulation layers. Moreover, this customer drift, if continued,
may make MPI irrelevant in some HPC areas and lead to creation of
alternative interfaces there.
 
The main purpose of the Forum, as well as the subsetting WG, is thus to
react to customer demand and make MPI faster and easier to use,
especially in those areas that are subjected to the increasing customer
drift (think, e.g., massive master-slave computations).
 
Basing on these premises, the subsetting, in my mind, should try to:
 
1) Make MPI standard modular. This may include:
    a) Splitting the standard functionality into coherent groups that
users will be able to select/deselect at init time
    b) Making implementation of some modules/functionality optional
(think dynamic process support) as they are anyway now
    c) Addressing not only functional groups but also certain aspects of
the standard that may not be needed in certain use cases (think
communicator management, message tagging, derived datatypes,
MPI_ANY_SOURCE support, non-blocking communication, etc.)
 
2) As part of the modularization, optionally identify the minimum
functional MPI subset. This may be:
    a) Those 6 calls (Init, Rank, Size, Send, Recv, Finalize), possibly
w/o communicator management and derived datatypes.
    b) A more flexible combination of modules actually needed by the
user
 
3) Connect subsetting with other MPI-3 activities (FT, ABI, collectives,
etc.)
 
As a result of modularization, we should strive to achieve
 
1) Simplification of the standard for the newcomers
2) Performance advantages for reasonable module combinations
3) Influence upon the overall shape of the MPI-3 standard
 
There are certainly quite a few concerns here:
 
1) We may end up complicating the standard and its implementation even
further
2) We may facilitate a split of the standard into several mutually
incompatible implementation "families"
3) We may cause some valid MPI-3 applications break if they use optional
modules not available in the implementation involved
4) We may get carried away by academic considerations and miss the
actual customer demands in the process
5) We may be engaging into a lost battle because the MPI standard is way
too rigid by design/purpose to be simplified
 
To guard against all this, we need to work closely with other WGs and
the Forum as a whole, define our goals as early as possible, and solicit
extensive Forum and customer feedback.
 
From all this, by the time of the Forum meeting in March, we should have
at least a couple of slides reflecting our intentions and plans.
 
Best regards.
 
Alexander
 


From htor at [hidden]  Thu Feb 28 22:07:51 2008
From: htor at [hidden] (Torsten Hoefler)
Date: Thu, 28 Feb 2008 23:07:51 -0500
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A20116EFEB@swsmsx413.ger.corp.intel.com>
Message-ID: <20080229040751.GB16623@benten.cs.indiana.edu>


Hi,
>    Present: Leonid Meyerguz (Microsoft), Rich Graham (ORNL), Richard Barrett
>    (ORNL), Torsten Hoefler (ISU), Alexander Supalov (Intel)
just for the record, it's "IU" not "ISU" :-)

>    - Scope of the effort
>      - Rich
>        - Minimum subset consistent with the rest of MPI, for
>    performance/memory footprint optimization
>        - Danger of splitting MPI, hence against optional features in the
>    standard
I back that (danger of optional features for portability). I'd propose
to split the current standard into mostly self-contained subsets that
have clearly defined interfaces to the rest of the standard. Note: this
only defines logical interfaces; it does *not* define how those things
are to be implemented. This makes it easier to understand the standard
and to have separate (portable) libraries for the subsets, but it does
not rule out optimizing by implementing everything in a monolithic
block (i.e., central progress).

>        - Both blocking & nonblocking belong to the core
>      - Torsten
>        - Some collectives may go into selectable subsets
I see three subsets: blocking colls, non-blocking colls and topological
colls (maybe also blocking / non-blocking).

>        - MPI_ANY_SOURCE considered harmful
I'd like to add datatypes and heterogeneity to this list (with regard
to performance). Alexander mentioned the dynamics. I think we should
have a list of items ready that could influence optimization
possibilities significantly if they had to be announced by the user
before they can be used. That would give another strong argument for
the subsetting.

Best,
  Torsten


-- 
 bash$ :(){ :|:&};: --------------------- http://www.unixer.de/ -----
Indiana University    | http://www.indiana.edu
Open Systems Lab      | http://osl.iu.edu/
150 S. Woodlawn Ave.  | Bloomington, IN, 474045-7104 | USA
Lindley Hall Room 135 | +01 (812) 855-3608


From alexander.supalov at [hidden]  Thu Feb 28 22:29:01 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Fri, 29 Feb 2008 04:29:01 -0000
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To: <20080229040751.GB16623@benten.cs.indiana.edu>
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201196FE1@swsmsx413.ger.corp.intel.com>


Hi,

Thanks. What subsets inside the current standard would you propose? What
interfaces between them would you envision?

Good idea about the optimization opportunities. Here's an initial
combined list, with the main benefits as I see them. Please
comment/extend.

- Dynamic process support: less overhead in the progress engine, easier
global rank handling.
- Heterogeneity: better memory footprint, easier data handling.
- Derived datatypes (especially those with holes): better memory
footprint.
- MPI_ANY_SOURCE: faster, simpler multifabric progress.
- File I/O: smaller requests, easier wait/test functions.
- One-sided ops: no passive target w/o MPI calls - no extra progress
thread.
- Communicator & group management: better memory footprint.
- Message tagging: better support for stable dataflow exchanges, smaller
packets.
- Non-blocking communication: easier ordering, simplified request
handling.
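
To illustrate the MPI_ANY_SOURCE item: a toy model of posted-receive
matching, assuming a single tag and communicator. The class and method
names are made up for this sketch. Without wildcards, an incoming
message only has to consult the queue for its own source; once
wildcards exist, the implementation must also compare posting order
across queues so that the earliest matching receive wins.

```python
from collections import defaultdict, deque
from itertools import count

ANY_SOURCE = -1  # stand-in for MPI_ANY_SOURCE in this model


class PostedReceives:
    """Toy posted-receive queue (tags and communicators omitted)."""

    def __init__(self):
        self._seq = count()
        self._by_source = defaultdict(deque)   # source -> deque of (seq, recv_id)
        self._wildcards = deque()              # deque of (seq, recv_id)

    def post(self, source, recv_id):
        entry = (next(self._seq), recv_id)
        if source == ANY_SOURCE:
            self._wildcards.append(entry)
        else:
            self._by_source[source].append(entry)

    def match(self, source):
        """Return the recv_id that an incoming message from `source` matches."""
        specific = self._by_source[source]
        # Without wildcards this is a single per-source lookup. With them,
        # the older of the two queue heads must win to keep posting order.
        candidates = [q for q in (specific, self._wildcards) if q]
        if not candidates:
            return None                        # unexpected message
        oldest = min(candidates, key=lambda q: q[0][0])
        return oldest.popleft()[1]
```

If the user promised never to use the wildcard, the second queue and
the sequence comparison could be dropped in this model.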

Best regards.

Alexander 



From htor at [hidden]  Thu Feb 28 22:44:17 2008
From: htor at [hidden] (Torsten Hoefler)
Date: Thu, 28 Feb 2008 23:44:17 -0500
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201196FE1@swsmsx413.ger.corp.intel.com>
Message-ID: <20080229044417.GI16623@benten.cs.indiana.edu>


Hi Alexander,
> Thanks. What subsets inside the current standard would you propose? 
> What interfaces between them would you envision?
that is a long discussion, I guess. So just to put something up for
discussion:

One subset could be collective communication, which would use Send/Recv
from the MPI-core interface. The same goes for non-blocking colls (using
nonblocking send/recv). Again, this is a logical design; it enables us to
easily implement a portable library that only uses this interface and
offers the standardized features. This library can be imported by
vendors who do not want to optimize the subset that is supported by the
lib. However, the MPI implementor is free to ignore the interface and do
the collectives inside the library in a monolithic way (for
performance). Other subsets could be:
- topology functions
- language bindings (certainly needs discussion)
- data-type handling
- groups/communicator handling (interface definition would be complex)
- profiling interface (similar to language bindings)
- parallel I/O
- process management
- one-sided (if this is not in core)
- grequests
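
A minimal sketch of the layering described above: a broadcast written
only against a core send/recv interface. InMemoryCore is a hypothetical
stand-in for the MPI core, and the binomial tree is just one common
choice of algorithm, not something this thread prescribes.

```python
def binomial_bcast_rounds(size, root=0):
    """Return, per round, the (src, dst) pairs of a binomial-tree broadcast."""
    rounds = []
    step = 1
    while step < size:
        pairs = []
        for rel in range(step):            # relative ranks that already hold the data
            peer = rel + step
            if peer < size:
                pairs.append(((rel + root) % size, (peer + root) % size))
        rounds.append(pairs)
        step *= 2
    return rounds


class InMemoryCore:
    """Stand-in for the core point-to-point layer: one mailbox per rank."""

    def __init__(self, size):
        self.mailbox = [[] for _ in range(size)]

    def send(self, dst, payload):
        self.mailbox[dst].append(payload)

    def recv(self, rank):
        return self.mailbox[rank].pop(0)


def bcast(core, size, root, value):
    """Broadcast `value` from `root` using only core.send and core.recv."""
    data = [None] * size
    data[root] = value
    for pairs in binomial_bcast_rounds(size, root):
        for src, dst in pairs:
            core.send(dst, data[src])
        for _, dst in pairs:
            data[dst] = core.recv(dst)
    return data
```

Because bcast touches nothing but send and recv, the same code would
run on any core that offers those two calls; a vendor remains free to
replace it with a monolithic version.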

> Good idea about the optimization opportunities. Here's an initial
> combined list, with the main benefits as I see them. Please
> comment/extend.
> 
> - Dynamic process support: less overhead in the progress engine, easier
> global rank handling.
ack

> - Heterogeneity: better memory footprint, easier data handling.
easier equals faster in this case

> - Derived datatypes (especially those with holes): better memory
> footprint.
hmm, I don't get the memory footprint argument? But I'd say that
leaving them out simplifies the critical path (one "if" fewer) and many
applications just don't need datatypes. This is necessary if we want to
broaden our scope (cf. the sockets interface has no datatypes and works
well)

> - MPI_ANY_SOURCE: faster, more simple multifabric progress.
ack + receiver-based protocols (I wrote about this in "Optimizing
non-blocking Collective Operations for InfiniBand", which will be
presented at the CAC workshop at IPDPS'08).

> - File I/O: smaller requests, easier wait/test functions.
yes

> - One-sided ops: no passive target w/o MPI calls - no extra progress
> thread.
> - Communicator & group management: better memory footprint.
> - Message tagging: better support for stable dataflow exchanges, smaller
> packets.
ack 

> - Non-blocking communication: easier ordering, simplified request
> handling.
I am not sure about this since only the local matching differs
(slightly) here, i.e., packets match a waiting recv (potentially dozens
of them in different threads) vs. packets match a non-blocking request.
This is pretty much the same overhead. How does that influence MPI's
ordering constraints?

Best,
  Torsten




From bronis at [hidden]  Thu Feb 28 22:53:24 2008
From: bronis at [hidden] (Bronis R. de Supinski)
Date: Thu, 28 Feb 2008 20:53:24 -0800 (PST)
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201196FE1@swsmsx413.ger.corp.intel.com>
Message-ID: <Pine.LNX.4.58.0802282042110.3046@tux213.llnl.gov>


All:

OK, I have to respond to the notion that derived datatypes
limit performance. It is just not a reasonable position.

Sure, if you can send contiguous locations, you will get
higher performance. The problem is that codes do not only
need to send contiguous data, so that is not an adequate
reason to say derived datatypes limit performance.

So, what is left? That there is some more efficient way
to send non-contiguous data? How? As multiple messages,
each of which sends contiguous data? If so, then the
implementation could do this under the covers and the
datatypes are just a convenience for the user not to
have to specify the individual sends. OK, suppose that's
not the reason. Perhaps the user can do the copying into
a contiguous buffer and get better performance? While
I have seen this hold with some implementations, it is
absurd. There is no reason that I can discern as to why
the user should be able to deduce a better copying
mechanism than the MPI implementer. So, again, at worst,
the datatypes should be a convenience. Do you have an
alternative reason or a refutation of these opinions?
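
To make the two options concrete, here is a small sketch of the copying
in question. The (count, blocklength, stride) parameters mirror those
of MPI_Type_vector; the functions themselves and the base parameter are
illustrative only. Whether the user or the library runs the loop in
pack, the work is the same.

```python
def vector_offsets(count, blocklength, stride, base=0):
    """Element offsets selected by a (count, blocklength, stride) layout."""
    return [base + b * stride + i
            for b in range(count)
            for i in range(blocklength)]


def pack(buffer, offsets):
    """Copy the selected elements into a contiguous list."""
    return [buffer[o] for o in offsets]


def unpack(packed, offsets, buffer):
    """Inverse of pack: scatter contiguous data back into place."""
    for value, o in zip(packed, offsets):
        buffer[o] = value
```

For a 3x4 row-major matrix, vector_offsets(3, 1, 4, base=1) selects
column 1; a user packing by hand and a datatype engine would visit
exactly those offsets.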

What is more important, it is certainly possible to build
scatter/gather support into a NIC and achieve better
performance with datatypes than without. While there are
issues to be resolved for that (primarily the issue of
pinning memory), they are solvable with the right hardware
mechanism. Just because it does not yet exist is not
an adequate reason to say "Get rid of datatypes". OK,
you are not saying that but you are saying to deprecate
them in a sense. And saying you could send contiguous
data more efficiently is a bad argument here. How do
datatypes cause inefficiency for that? How much is
that cost really? At what point do you hit where the
answer is "It would be faster not to compute anything"?

Bronis



From alexander.supalov at [hidden]  Thu Feb 28 22:58:13 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Fri, 29 Feb 2008 04:58:13 -0000
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To: <20080229044417.GI16623@benten.cs.indiana.edu>
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201196FE2@swsmsx413.ger.corp.intel.com>


Hi,

Thanks. As soon as there are a couple of non-blocking recvs out there,
waiting for them in reverse order requires tracking of the moment when
the receives were posted. In some cases this leads to extra fields and
data exchanges.
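
A toy model of that bookkeeping, assuming one source and one tag; the
class names are invented for this sketch. Each request carries its
position in posting order, so that waiting on a later receive first
still hands the earlier message to the earlier receive.

```python
from itertools import count


class Request:
    def __init__(self, seq):
        self.seq = seq          # extra field: position in posting order
        self.payload = None
        self.done = False


class Receiver:
    """Toy model: one source, one tag, several outstanding receives."""

    def __init__(self):
        self._seq = count()
        self._posted = []       # outstanding requests
        self._arrived = []      # unmatched messages, in arrival order

    def irecv(self):
        req = Request(next(self._seq))
        self._posted.append(req)
        return req

    def deliver(self, payload):
        self._arrived.append(payload)

    def wait(self, req):
        """Complete `req`; earlier-posted receives are matched first.

        A real library would block until data arrives; this model
        assumes the messages have already been delivered.
        """
        while not req.done:
            oldest = min(self._posted, key=lambda r: r.seq)
            oldest.payload = self._arrived.pop(0)
            oldest.done = True
            self._posted.remove(oldest)
        return req.payload
```

With only blocking receives there is never more than one outstanding
request per thread in this model, so the seq field and the search for
the oldest request would not be needed.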

The footprint argument generally says that the library will be smaller.
This may be a minor matter for general purpose computers, but as soon as
you go to Petascale, you need every byte on the compute nodes for user
data, especially if dynamic libraries are not supported.

As for the collectives, many are implemented using Sendrecv, and that
blocking call in turn often uses non-blocking communication. The classic
Alltoallv algorithm uses nonblocking calls, too. So, I'm not sure that
even unoptimized blocking collectives will always use only blocking
pt2pt.

Best regards.

Alexander 



From alexander.supalov at [hidden]  Thu Feb 28 23:10:39 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Fri, 29 Feb 2008 05:10:39 -0000
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To: <Pine.LNX.4.58.0802282042110.3046@tux213.llnl.gov>
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201196FE3@swsmsx413.ger.corp.intel.com>


Hi,

Thanks. I think the main thrust here is the library footprint (no
pack/unpack, etc.) and complexity of the user side of the datatype
interface, rather than performance. Many applications just don't need
any of this, and never will. Why not translate this application
non-requirement into a minimum MPI subset? Same with communicator/group
management, etc.

Moreover, homogeneous installations that dominate HPC now don't actually
need any datatype support at all. They send chunks of bytes. This may
change in the future, though.

A minor performance implication is that without holes that are only
possible with derived datatypes, one does not need to track this, split
the critical path, and make special provisions inside the MPI device
layer to handle iov or such.
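
A small sketch of the extra path meant here, with invented function
names: data with holes has to be turned into a list of (start, length)
segments, iov style, whereas data known to be contiguous is always a
single segment and needs no such step.

```python
def to_iov(offsets):
    """Coalesce sorted element offsets into (start, length) segments."""
    segments = []
    for o in offsets:
        if segments and segments[-1][0] + segments[-1][1] == o:
            start, length = segments[-1]
            segments[-1] = (start, length + 1)   # extend the current run
        else:
            segments.append((o, 1))              # a hole starts a new run
    return segments


def is_contiguous(offsets):
    """True when the data can take the single-buffer fast path."""
    return len(to_iov(offsets)) <= 1
```

In a subset without derived datatypes, the is_contiguous check and the
multi-segment branch behind it would disappear from the send path in
this model.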

The NIC capability argument is interesting, but it turns the discussion
on its head: we're not after motivating network vendors to provide
scatter/gather in hardware here, are we? Please clarify.

Best regards.

Alexander 



From alexander.supalov at [hidden]  Thu Feb 28 23:17:13 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Fri, 29 Feb 2008 05:17:13 -0000
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To: <Pine.LNX.4.58.0802282042110.3046@tux213.llnl.gov>
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201196FE4@swsmsx413.ger.corp.intel.com>


Whoops... Chunks of bytes may have holes. Discard that argument.
Homogeneous installations don't need data transformation, but this is a
different matter.

-----Original Message-----
From: Supalov, Alexander 
Sent: Friday, February 29, 2008 6:11 AM
To: 'Bronis R. de Supinski'; 'mpi3-subsetting_at_[hidden]'
Subject: RE: [Mpi3-subsetting] agenda for subsetting kickoff telecon
ww09

Hi,

Thanks. I think the main thrust here is the library footprint (no
pack/unpack, etc.) and complexity of the user side of the datatype
interface, rather than performance. Many applications just don't need
any of this, and never will. Why not translate this application
non-requirement into a minimum MPI subset? Same with communicator/group
management, etc.

Moreover, homogeneous installations that dominate HPC now don't actually
need any datatype support at all. They send chunks of bytes. This may
change in the future, though.

A minor performance implication is that without holes that are only
possible with derived datatypes, one does not need to track this, split
the critical path, and make special provisions inside the MPI device
layer to handle iov or such.

The NIC capability argument is interesting, but it turns the discussion
on its head: we're not after motivating network vendors to provide
scatter/gather in hardware here, are we? Please clarify.

Best regards.

Alexander 

-----Original Message-----
From: mpi3-subsetting-bounces_at_[hidden]
[mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of Bronis
R. de Supinski
Sent: Friday, February 29, 2008 5:53 AM
To: mpi3-subsetting_at_[hidden]
Subject: Re: [Mpi3-subsetting] agenda for subsetting kickoff telecon
ww09

All:

OK, I have to respond to the notion that derived datatypes
limit performance. It is just not a reasonable position.

Sure, if you can send contiguous locations, you will get
higher performance. The problem is that codes do not only
need to send contiguous data so that is not an adequate
reason to say derived datatypes limit performance.

So, what is left? That there is some more efficient way
to send non-contiguous data? How? As multiple messages,
each of which sends contiguous data? If so, then the
implementation could do this under the covers and the
datatypes are just a convenience for the user not to
have to specify the individual sends. OK, suppose that's
not the reason. Perhaps the user can do the copying into
a contiguous buffer and get better performance? While
I have seen this hold with some implementations, it is
absurd. There is no reason that I can discern as to why
the user should be able to deduce a better copying
mechanism than the MPI implementer. So, again, at worst,
the datatypes should be a convenience. Do you have an
alternative reason or a refutation of these opinions?

What is more important, it is certainly possible to build
scatter/gather support into a NIC and achieve better
performance with datatypes than without. While there are
issues to be resolved for that (primarily the issue of
pinning memory), they are solvable with the right hardware
mechanism. Just because it does not yet exist is not
an adequate reason to say "Get rid of datatypes". OK,
you are not saying that but you are saying to deprecate
them in a sense. And saying you could do contiguous
sends more efficiently is a bad argument here. How do
datatypes cause inefficiency for that? How much is
that cost really? At what point do you hit where the
answer is "It would be faster not to compute anything"?

Bronis

On Fri, 29 Feb 2008, Supalov, Alexander wrote:

> Hi,
>
> Thanks. What subsets inside the current standard would you propose?
> What interfaces between them would you envision?
>
> Good idea about the optimization opportunities. Here's an initial
> combined list, with the main benefits as I see them. Please
> comment/extend.
>
> - Dynamic process support: less overhead in the progress engine,
>   easier global rank handling.
> - Heterogeneity: better memory footprint, easier data handling.
> - Derived datatypes (especially those with holes): better memory
>   footprint.
> - MPI_ANY_SOURCE: faster, simpler multifabric progress.
> - File I/O: smaller requests, easier wait/test functions.
> - One-sided ops: no passive target w/o MPI calls - no extra progress
>   thread.
> - Communicator & group management: better memory footprint.
> - Message tagging: better support for stable dataflow exchanges,
>   smaller packets.
> - Non-blocking communication: easier ordering, simplified request
>   handling.
>
> Best regards.
>
> Alexander
>
> -----Original Message-----
> From: mpi3-subsetting-bounces_at_[hidden]
> [mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of
> Torsten Hoefler
> Sent: Friday, February 29, 2008 5:08 AM
> To: mpi3-subsetting_at_[hidden]
> Subject: Re: [Mpi3-subsetting] agenda for subsetting kickoff telecon
> ww09
>
> Hi,
> >    Present: Leonid Meyerguz (Microsoft), Rich Graham (ORNL), Richard
> >    Barrett (ORNL), Torsten Hoefler (ISU), Alexander Supalov (Intel)
> just for the record, it's "IU" not "ISU" :-)
>
> >    - Scope of the effort
> >      - Rich
> >        - Minimum subset consistent with the rest of MPI, for
> >          performance/memory footprint optimization
> >        - Danger of splitting MPI, hence against optional features in
> >          the standard
> I back that (danger of optional features for portability). I'd propose
> to split the current standard into mostly self-contained subsets that
> have clearly defined interfaces to the rest of the standard. Note: this
> only defines logical interfaces; it does *not* define how those things
> are to be implemented. This makes it easier to understand the standard
> and to have separate (portable) libraries for the subsets; it does not
> affect the possibility of optimizing by implementing everything in a
> monolithic block (i.e., central progress).
>
> >        - Both blocking & nonblocking belong to the core
> >      - Torsten
> >        - Some collectives may go into selectable subsets
> I see three subsets: blocking colls, non-blocking colls and topological
> colls (maybe also blocking / non-blocking).
>
> >        - MPI_ANY_SOURCE considered harmful
> I'd like to add datatypes and heterogeneity to this list (with regard
> to performance). Alexander mentioned the dynamics. I think we should
> have a list of items ready that could influence optimization
> possibilities significantly if the user had to announce them before
> using them. That would give another strong argument for subsetting.
>
> Best,
>   Torsten
>
> --
>  bash$ :(){ :|:&};: --------------------- http://www.unixer.de/ -----
> Indiana University    | http://www.indiana.edu
> Open Systems Lab      | http://osl.iu.edu/
> 150 S. Woodlawn Ave.  | Bloomington, IN, 474045-7104 | USA
> Lindley Hall Room 135 | +01 (812) 855-3608


From bronis at [hidden]  Thu Feb 28 23:19:43 2008
From: bronis at [hidden] (Bronis R. de Supinski)
Date: Thu, 28 Feb 2008 21:19:43 -0800 (PST)
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201196FE3@swsmsx413.ger.corp.intel.com>
Message-ID: <Pine.LNX.4.58.0802282116160.3046@tux213.llnl.gov>


Alexander:

Most real applications need to send non-contiguous
data. If they do not use datatypes then they are
doing the equivalent of either the packing/unpacking
or smaller messages at the user level. This should
be discouraged, not encouraged. A small savings
in library object size is not ample reason to go
against that. And, yes, we are after encouraging
hardware vendors to provide the right hardware.

Bronis



From alexander.supalov at [hidden]  Thu Feb 28 23:38:17 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Fri, 29 Feb 2008 05:38:17 -0000
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To: <Pine.LNX.4.58.0802282116160.3046@tux213.llnl.gov>
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201196FE5@swsmsx413.ger.corp.intel.com>


Hi,

Thanks. I understand your motivation. When you say "most real
applications" - what applications do you mean? At least, in what area?

For the NIC part, the stress was on "here". In my opinion, subsetting is
not about making things more complicated, more challenging to the
implementors, or to the underlying hardware. It's about making things
simple, easy to use, and easy to implement - including implementation of
only those features your users actually need. That the implementation
may be faster due to this is an added bonus, not the primary goal.

Still, regarding user-side copying: yes, when people do this one wonders
why. There is a reason beyond their 1) not caring about datatypes and
their complexity and 2) not trusting datatype performance. A modern
compiler can optimize a loop with a constant stride rather well, and may
have difficulty with an unknown stride. This is why explicit loops are
sometimes indeed faster (much faster) in the resulting code than any
generic implementation.

Best regards.

Alexander



From bronis at [hidden]  Fri Feb 29 03:11:27 2008
From: bronis at [hidden] (Bronis R. de Supinski)
Date: Fri, 29 Feb 2008 01:11:27 -0800 (PST)
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201196FE5@swsmsx413.ger.corp.intel.com>
Message-ID: <Pine.LNX.4.58.0802290109010.3046@tux213.llnl.gov>


Alexander:

Re:
> Thanks. I understand your motivation. When you say "most real
> applications" - what applications do you mean? At least, in what area?

? Scientific computing...

> For the NIC part, the stress was on "here". In my opinion, subsetting is
> not about making things more complicated, more challenging to the
> implementors, or to the underlying hardware. It's about making things
> simple, easy to use, and easy to implement - including implementation of
> only those features your users actually need. That the implementation
> may be faster due to this is an added bonus, not the primary goal.

The emphasis here should not be on creating a disincentive
for vendors to do the right thing...

> Still, regarding user-side copying: yes, when people do this one wonders
> why. There is a reason beyond their 1) not caring about datatypes and
> their complexity and 2) not trusting datatype performance. A modern
> compiler can optimize a loop with a constant stride rather well, and may
> have difficulty with an unknown stride. This is why explicit loops are
> sometimes indeed faster (much faster) in the resulting code than any
> generic implementation.

Huh? What makes you think the user copying code is
in terms of constant stride? Generally, it varies
with the input. We are not talking about a simple
situation to optimize at the user level...

Bronis

>
> Best regards.
>
> Alexander
>
> -----Original Message-----
> From: Bronis R. de Supinski [mailto:bronis_at_[hidden]]
> Sent: Friday, February 29, 2008 6:20 AM
> To: Supalov, Alexander
> Cc: mpi3-subsetting_at_[hidden]
> Subject: RE: [Mpi3-subsetting] agenda for subsetting kickoff telecon
> ww09
>
>
> Alexander:
>
> Most real applications need to send non-contiguous
> data. If they do not use datatypes then they are
> doing the equivalent of either the packing/unpacking
> or smaller messages at the user level. This should
> be discouraged, not encouraged. A small savings
> in library object size is not ample reason to go
> against that. And, yes, we are after encouraging
> hardware vendors to provide the right hardware.
>
> Bronis
>
>
> On Fri, 29 Feb 2008, Supalov, Alexander wrote:
>
> > Hi,
> >
> > Thanks. I think the main thrust here is the library footprint (no
> > pack/unpack, etc.) and complexity of the user side of the datatype
> > interface, rather than performance. Many applications just don't need
> > any of this, and never will. Why not translate this application
> > non-requirement into a minimum MPI subset? Same with communicator/group
> > management, etc.
> >
> > Moreover, homogeneous installations that dominate HPC now don't actually
> > need any datatype support at all. They send chunks of bytes. This may
> > change in the future, though.
> >
> > A minor performance implication is that without holes that are only
> > possible with derived datatypes, one does not need to track this, split
> > the critical path, and make special provisions inside the MPI device
> > layer to handle iov or such.
> >
> > The NIC capability argument is interesting, but it turns the discussion
> > on its head: we're not after motivating network vendors to provide
> > scatter/gather in hardware here, are we? Please clarify.
> >
> > Best regards.
> >
> > Alexander
> >
> > -----Original Message-----
> > From: mpi3-subsetting-bounces_at_[hidden]
> > [mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of
> Bronis
> > R. de Supinski
> > Sent: Friday, February 29, 2008 5:53 AM
> > To: mpi3-subsetting_at_[hidden]
> > Subject: Re: [Mpi3-subsetting] agenda for subsetting kickoff telecon
> > ww09
> >
> >
> > All:
> >
> > OK, I have to respond to the notion that derived datatypes
> > limit performance. It is just not a reasonable position.
> >
> > Sure, if you can send contiguous locations, you will get
> > higher performance. The problem is that codes do not only
> > need to send contiguous data so that is not an adequate
> > reason to say derived datatypes limit performance.
> >
> > So, what is left? That there is some more efficient way
> > to send non-contiguous data? How? As multiple messages,
> > each of which send contiguous data? If so, then the
> > implementation could do this under the covers and the
> > datatypes are just a convenience for the user not to
> > have to specify the individual sends. OK, suppose that's
> > not the reason. Perhaps the user can do the copying into
> > a contiguous buffer and get better performance? While
> > I have seen this hold with some implementations, it is
> > absurd. There is no reason that I can discern as to why
> > the user should be able to deduce a better copying
> > mechanism than the MPI implementer. So, again, at worst,
> > the datatypes should be a convenience. Do you have an
> > alternative reason or a refutation of these opinions?
> >
> > What is more important, it is certainly possible to build
> > scatter/gather support into a NIC and achieve better
> > performance with datatypes than without. While there are
> > issues to be resolved for that (primarily the issue of
> > pinning memory), they are solvable with the right hardware
> > mechanism. Just because it does not yet exist is not
> > an adequate reason to say "Get rid of datatypes". OK,
> > you are not saying that but you are saying to deprecate
> > them in a sense. And saying you could send contiguous
> > sends more efficiently is a bad argument here. How do
> > datatypes cause inefficiency for that? How much is
> > that cost really? At what point do you hit where the
> > answer is "It would be faster not to compute anything"?
> >
> > Bronis
> >
> >
> > On Fri, 29 Feb 2008, Supalov, Alexander wrote:
> >
> > > Hi,
> > >
> > > Thanks. What subsets inside the current standard would you propose?
> > > What interfaces between them would you envision?
> > >
> > > Good idea about the optimization opportunities. Here's an initial
> > > combined list, with the main benefits as I see them. Please
> > > comment/extend.
> > >
> > > - Dynamic process support: less overhead in the progress engine,
> > > easier global rank handling.
> > > - Heterogeneity: better memory footprint, easier data handling.
> > > - Derived datatypes (especially those with holes): better memory
> > > footprint.
> > > - MPI_ANY_SOURCE: faster, more simple multifabric progress.
> > > - File I/O: smaller requests, easier wait/test functions.
> > > - One-sided ops: no passive target w/o MPI calls - no extra progress
> > > thread.
> > > - Communicator & group management: better memory footprint.
> > > - Message tagging: better support for stable dataflow exchanges,
> > > smaller packets.
> > > - Non-blocking communication: easier ordering, simplified request
> > > handling.
> > >
> > > Best regards.
> > >
> > > Alexander
> > >
> > > -----Original Message-----
> > > From: mpi3-subsetting-bounces_at_[hidden]
> > > [mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of
> > > Torsten Hoefler
> > > Sent: Friday, February 29, 2008 5:08 AM
> > > To: mpi3-subsetting_at_[hidden]
> > > Subject: Re: [Mpi3-subsetting] agenda for subsetting kickoff telecon
> > > ww09
> > >
> > > Hi,
> > > >    Present: Leonid Meyerguz (Microsoft), Rich Graham (ORNL), Richard
> > > >    Barrett (ORNL), Torsten Hoefler (ISU), Alexander Supalov (Intel)
> > > just for the record, it's "IU" not "ISU" :-)
> > >
> > > >    - Scope of the effort
> > > >      - Rich
> > > >        - Minimum subset consistent with the rest of MPI, for
> > > >    performance/memory footprint optimization
> > > >        - Danger of splitting MPI, hence against optional features in
> > > >    the standard
> > > I back that (danger of optional features for portability). I'd propose
> > > to split the current standard into mostly self-contained subsets that
> > > have clearly defined interfaces to the rest of the standard. Note: this
> > > only defines logical interfaces, that does *not* define how those things
> > > are to be implemented. This makes it easier to understand the standard
> > > and have separate (portable) libraries for the subsets, it does not
> > > influence optimization possibilities by implementing everything in a
> > > monolithic block (i.e., central progress).
> > >
> > > >        - Both blocking & nonblocking belong to the core
> > > >      - Torsten
> > > >        - Some collectives may go into selectable subsets
> > > I see three subsets: blocking colls, non-blocking colls and topological
> > > colls (maybe also blocking / non-blocking).
> > >
> > > >        - MPI_ANY_SOURCE considered harmful
> > > I'd like to add datatypes and heterogeneity to this list (with regards
> > > to performance). Alexander mentioned the dynamics. I think we should
> > > have a list of items ready that could influence optimization
> > > possibilities significantly if they were to be announced by the user
> > > before he can use them. That would give another strong argument for the
> > > subsetting.
> > >
> > > Best,
> > >   Torsten
> > >
> > > --
> > >  bash$ :(){ :|:&};: --------------------- http://www.unixer.de/
> -----
> > > Indiana University    | http://www.indiana.edu
> > > Open Systems Lab      | http://osl.iu.edu/
> > > 150 S. Woodlawn Ave.  | Bloomington, IN, 474045-7104 | USA
> > > Lindley Hall Room 135 | +01 (812) 855-3608
> > > _______________________________________________
> > > Mpi3-subsetting mailing list
> > > Mpi3-subsetting_at_[hidden]
> > > http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-subsetting
> > >


From alexander.supalov at [hidden]  Fri Feb 29 05:39:40 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Fri, 29 Feb 2008 11:39:40 -0000
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To: <Pine.LNX.4.58.0802290109010.3046@tux213.llnl.gov>
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A20119726C@swsmsx413.ger.corp.intel.com>


Dear Bronis,

Thanks. What scientific computing codes do you mean here - chemistry,
structural mechanics, fluid dynamics, genomics, something else? Or do
you speak generally of any code that needs sparse data structures? If
so, what's your estimate of the relative number of such codes compared
to those that do not need sparse datatypes? In what domain?

The right dose of vendor motivation not to do the wrong things is a good
point; I'll consider it.

Finally, constant-stride copying was just one example of where inlining
may help users achieve higher performance. There may be other
examples known in the scientific computing area. However, since
performance is not the primary goal for datatypes, I suggest we let this
matter rest for a while.
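
A minimal sketch of the two user-side copying patterns under discussion,
assuming a flat buffer and using Python only for illustration. The function
names are hypothetical and no MPI calls are involved. The first gather has
one fixed stride for the whole loop; the second follows offsets that are
only known at run time, which is the case that is harder to optimize.

```python
def pack_constant_stride(buf, start, count, stride):
    """Gather `count` elements spaced `stride` apart into a contiguous list."""
    # One fixed stride for the whole copy, so the access pattern is regular.
    return buf[start:start + count * stride:stride]


def pack_by_displacements(buf, displacements):
    """Gather elements at arbitrary offsets that are only known at run time."""
    # Every offset is looked up individually; there is no single stride to exploit.
    return [buf[d] for d in displacements]


if __name__ == "__main__":
    data = list(range(20))  # stands in for a 4 x 5 row-major array
    print(pack_constant_stride(data, 1, 4, 5))            # column 1 of that array
    print(pack_by_displacements(data, [0, 3, 4, 11, 17]))  # irregular selection
```

Both functions produce the same kind of contiguous send buffer; they differ
only in how much the copy loop knows about the layout in advance.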

Best regards.

Alexander 

-----Original Message-----
From: Bronis R. de Supinski [mailto:bronis_at_[hidden]] 
Sent: Friday, February 29, 2008 10:11 AM
To: Supalov, Alexander
Cc: mpi3-subsetting_at_[hidden]
Subject: RE: [Mpi3-subsetting] agenda for subsetting kickoff telecon
ww09

Alexander:

Re:
> Thanks. I understand your motivation. When you say "most real
> applications" - what applications do you mean? At least, in what area?

? Scientific computing...

> For the NIC part, the stress was on "here". In my opinion, subsetting is
> not about making things more complicated, more challenging to the
> implementors, or to the underlying hardware. It's about making things
> simple, easy to use, and easy to implement - including implementation of
> only those features your users actually need. That the implementation
> may be faster due to this is an added bonus, not the primary goal.

The emphasis here should not be on creating a disincentive
for vendors to do the right thing...

> Still, regarding user side copying. Yes, when people do this one wonders
> why. There's a reason, apart from them: 1) not caring about datatypes
> and their complexity and 2) not trusting their performance. A modern
> compiler can rather well optimize a loop with a constant stride, and may
> have difficulty with an unknown stride. This is why explicit loops are
> sometimes indeed faster (much faster) in the resulting code than any
> generic implementation.

Huh? What makes you think the user copying code is
in terms of constant stride? Generally, it varies
with the input. We are not talking about a simple
situation to optimize at the user level...

Bronis

>
> Best regards.
>
> Alexander
>
> -----Original Message-----
> From: Bronis R. de Supinski [mailto:bronis_at_[hidden]]
> Sent: Friday, February 29, 2008 6:20 AM
> To: Supalov, Alexander
> Cc: mpi3-subsetting_at_[hidden]
> Subject: RE: [Mpi3-subsetting] agenda for subsetting kickoff telecon
> ww09
>
>
> Alexander:
>
> Most real applications need to send non-contiguous
> data. If they do not use datatypes then they are
> doing the equivalent of either the packing/unpacking
> or smaller messages at the user level. This should
> be discouraged, not encouraged. A small savings
> in library object size is not ample reason to go
> against that. And, yes, we are after encouraging
> hardware vendors to provide the right hardware.
>
> Bronis
>
>
> On Fri, 29 Feb 2008, Supalov, Alexander wrote:
>
> > Hi,
> >
> > Thanks. I think the main thrust here is the library footprint (no
> > pack/unpack, etc.) and complexity of the user side of the datatype
> > interface, rather than performance. Many applications just don't need
> > any of this, and never will. Why not translate this application
> > non-requirement into a minimum MPI subset? Same with communicator/group
> > management, etc.
> >
> > Moreover, homogeneous installations that dominate HPC now don't actually
> > need any datatype support at all. They send chunks of bytes. This may
> > change in the future, though.
> >
> > A minor performance implication is that without holes that are only
> > possible with derived datatypes, one does not need to track this, split
> > the critical path, and make special provisions inside the MPI device
> > layer to handle iov or such.
> >
> > The NIC capability argument is interesting, but it turns the discussion
> > on its head: we're not after motivating network vendors to provide
> > scatter/gather in hardware here, are we? Please clarify.
> >
> > Best regards.
> >
> > Alexander
> >
> > -----Original Message-----
> > From: mpi3-subsetting-bounces_at_[hidden]
> > [mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of
> Bronis
> > R. de Supinski
> > Sent: Friday, February 29, 2008 5:53 AM
> > To: mpi3-subsetting_at_[hidden]
> > Subject: Re: [Mpi3-subsetting] agenda for subsetting kickoff telecon
> > ww09
> >
> >
> > All:
> >
> > OK, I have to respond to the notion that derived datatypes
> > limit performance. It is just not a reasonable position.
> >
> > Sure, if you can send contiguous locations, you will get
> > higher performance. The problem is that codes do not only
> > need to send contiguous data so that is not an adequate
> > reason to say derived datatypes limit performance.
> >
> > So, what is left? That there is some more efficient way
> > to send non-contiguous data? How? As multiple messages,
> > each of which send contiguous data? If so, then the
> > implementation could do this under the covers and the
> > datatypes are just a convenience for the user not to
> > have to specify the individual sends. OK, suppose that's
> > not the reason. Perhaps the user can do the copying into
> > a contiguous buffer and get better performance? While
> > I have seen this hold with some implementations, it is
> > absurd. There is no reason that I can discern as to why
> > the user should be able to deduce a better copying
> > mechanism than the MPI implementer. So, again, at worst,
> > the datatypes should be a convenience. Do you have an
> > alternative reason or a refutation of these opinions?
> >
> > What is more important, it is certainly possible to build
> > scatter/gather support into a NIC and achieve better
> > performance with datatypes than without. While there are
> > issues to be resolved for that (primarily the issue of
> > pinning memory), they are solvable with the right hardware
> > mechanism. Just because it does not yet exist is not
> > an adequate reason to say "Get rid of datatypes". OK,
> > you are not saying that but you are saying to deprecate
> > them in a sense. And saying you could send contiguous
> > sends more efficiently is a bad argument here. How do
> > datatypes cause inefficiency for that? How much is
> > that cost really? At what point do you hit where the
> > answer is "It would be faster not to compute anything"?
> >
> > Bronis
> >
> >
> > On Fri, 29 Feb 2008, Supalov, Alexander wrote:
> >
> > > Hi,
> > >
> > > Thanks. What subsets inside the current standard would you propose?
> > > What interfaces between them would you envision?
> > >
> > > Good idea about the optimization opportunities. Here's an initial
> > > combined list, with the main benefits as I see them. Please
> > > comment/extend.
> > >
> > > - Dynamic process support: less overhead in the progress engine,
> > > easier global rank handling.
> > > - Heterogeneity: better memory footprint, easier data handling.
> > > - Derived datatypes (especially those with holes): better memory
> > > footprint.
> > > - MPI_ANY_SOURCE: faster, more simple multifabric progress.
> > > - File I/O: smaller requests, easier wait/test functions.
> > > - One-sided ops: no passive target w/o MPI calls - no extra progress
> > > thread.
> > > - Communicator & group management: better memory footprint.
> > > - Message tagging: better support for stable dataflow exchanges,
> > > smaller packets.
> > > - Non-blocking communication: easier ordering, simplified request
> > > handling.
> > >
> > > Best regards.
> > >
> > > Alexander
> > >
> > > -----Original Message-----
> > > From: mpi3-subsetting-bounces_at_[hidden]
> > > [mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of
> > > Torsten Hoefler
> > > Sent: Friday, February 29, 2008 5:08 AM
> > > To: mpi3-subsetting_at_[hidden]
> > > Subject: Re: [Mpi3-subsetting] agenda for subsetting kickoff
telecon
> > > ww09
> > >
> > > Hi,
> > > >    Present: Leonid Meyerguz (Microsoft), Rich Graham (ORNL), Richard
> > > >    Barrett (ORNL), Torsten Hoefler (ISU), Alexander Supalov (Intel)
> > > just for the record, it's "IU" not "ISU" :-)
> > >
> > > >    - Scope of the effort
> > > >      - Rich
> > > >        - Minimum subset consistent with the rest of MPI, for
> > > >    performance/memory footprint optimization
> > > >        - Danger of splitting MPI, hence against optional features in
> > > >    the standard
> > > I back that (danger of optional features for portability). I'd propose
> > > to split the current standard into mostly self-contained subsets that
> > > have clearly defined interfaces to the rest of the standard. Note: this
> > > only defines logical interfaces, that does *not* define how those things
> > > are to be implemented. This makes it easier to understand the standard
> > > and have separate (portable) libraries for the subsets, it does not
> > > influence optimization possibilities by implementing everything in a
> > > monolithic block (i.e., central progress).
> > >
> > > >        - Both blocking & nonblocking belong to the core
> > > >      - Torsten
> > > >        - Some collectives may go into selectable subsets
> > > I see three subsets: blocking colls, non-blocking colls and topological
> > > colls (maybe also blocking / non-blocking).
> > >
> > > >        - MPI_ANY_SOURCE considered harmful
> > > I'd like to add datatypes and heterogeneity to this list (with regards
> > > to performance). Alexander mentioned the dynamics. I think we should
> > > have a list of items ready that could influence optimization
> > > possibilities significantly if they were to be announced by the user
> > > before he can use them. That would give another strong argument for the
> > > subsetting.
> > >
> > > Best,
> > >   Torsten
> > >
> > > --
> > >  bash$ :(){ :|:&};: --------------------- http://www.unixer.de/
> -----
> > > Indiana University    | http://www.indiana.edu
> > > Open Systems Lab      | http://osl.iu.edu/
> > > 150 S. Woodlawn Ave.  | Bloomington, IN, 474045-7104 | USA
> > > Lindley Hall Room 135 | +01 (812) 855-3608


From bronis at [hidden]  Fri Feb 29 07:17:19 2008
From: bronis at [hidden] (Bronis R. de Supinski)
Date: Fri, 29 Feb 2008 05:17:19 -0800 (PST)
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A20119726C@swsmsx413.ger.corp.intel.com>
Message-ID: <Pine.LNX.4.58.0802290513140.3046@tux213.llnl.gov>


Alexander:

It is the vast majority of scientific applications.
It is not just ones that need sparse data structures.
A stencil application that uses dense matrices has
strided non-contiguous data transfers for half (2D)
or more (3D or more complex stencils) of its
communication. Non-contiguous communication is
the reality of distributed memory computing...
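
A minimal sketch of the 2D case, assuming row-major storage and a halo that
is one cell wide. Python is used only for illustration and the helper names
are hypothetical. It lists the flat offsets of the four boundary edges of a
local grid and checks which of them form a single contiguous run.

```python
def edge_offsets(rows, cols):
    """Flat row-major offsets of the four boundary edges of a rows x cols grid."""
    return {
        "top": [c for c in range(cols)],
        "bottom": [(rows - 1) * cols + c for c in range(cols)],
        "left": [r * cols for r in range(rows)],
        "right": [r * cols + cols - 1 for r in range(rows)],
    }


def is_contiguous(offsets):
    """True if the offsets form one unbroken run of adjacent elements."""
    return all(b - a == 1 for a, b in zip(offsets, offsets[1:]))


if __name__ == "__main__":
    for name, offs in edge_offsets(4, 6).items():
        kind = "contiguous" if is_contiguous(offs) else "strided"
        print(name, kind, offs)
```

For a 4 x 6 grid the top and bottom edges are contiguous while the left and
right edges are strided by the row length, so two of the four exchanges are
non-contiguous. With column-major storage the roles of rows and columns swap,
but the proportion stays the same.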

I am fine with letting this rest, but my point is
that implementers should be putting an emphasis on
performance for datatypes...

Bronis

On Fri, 29 Feb 2008, Supalov, Alexander wrote:

> Dear Bronis,
>
> Thanks. What scientific computing codes do you mean here - chemistry,
> structural mechanics, fluid dynamics, genomics, something else? Or do
> you speak generally of any code that needs sparse data structures? If
> so, what's your estimate of the relative number of such codes compared
> to those that do not need sparse datatypes? In what domain?
>
> The right dose of vendor motivation not to do the wrong things is a good
> point; I'll consider it.
>
> Finally, constant-stride copying was just one example of where inlining
> may help users achieve higher performance. There may be other
> examples known in the scientific computing area. However, since
> performance is not the primary goal for datatypes, I suggest we let this
> matter rest for a while.
>
> Best regards.
>
> Alexander
>
> -----Original Message-----
> From: Bronis R. de Supinski [mailto:bronis_at_[hidden]]
> Sent: Friday, February 29, 2008 10:11 AM
> To: Supalov, Alexander
> Cc: mpi3-subsetting_at_[hidden]
> Subject: RE: [Mpi3-subsetting] agenda for subsetting kickoff telecon
> ww09
>
>
> Alexander:
>
> Re:
> > Thanks. I understand your motivation. When you say "most real
> > applications" - what applications do you mean? At least, in what area?
>
> ? Scientific computing...
>
> > For the NIC part, the stress was on "here". In my opinion, subsetting is
> > not about making things more complicated, more challenging to the
> > implementors, or to the underlying hardware. It's about making things
> > simple, easy to use, and easy to implement - including implementation of
> > only those features your users actually need. That the implementation
> > may be faster due to this is an added bonus, not the primary goal.
>
> The emphasis here should not be on creating a disincentive
> for vendors to do the right thing...
>
> > Still, regarding user side copying. Yes, when people do this one wonders
> > why. There's a reason, apart from them: 1) not caring about datatypes
> > and their complexity and 2) not trusting their performance. A modern
> > compiler can rather well optimize a loop with a constant stride, and may
> > have difficulty with an unknown stride. This is why explicit loops are
> > sometimes indeed faster (much faster) in the resulting code than any
> > generic implementation.
>
> Huh? What makes you think the user copying code is
> in terms of constant stride? Generally, it varies
> with the input. We are not talking about a simple
> situation to optimize at the user level...
>
> Bronis
>
>
> >
> > Best regards.
> >
> > Alexander
> >
> > -----Original Message-----
> > From: Bronis R. de Supinski [mailto:bronis_at_[hidden]]
> > Sent: Friday, February 29, 2008 6:20 AM
> > To: Supalov, Alexander
> > Cc: mpi3-subsetting_at_[hidden]
> > Subject: RE: [Mpi3-subsetting] agenda for subsetting kickoff telecon
> > ww09
> >
> >
> > Alexander:
> >
> > Most real applications need to send non-contiguous
> > data. If they do not use datatypes then they are
> > doing the equivalent of either the packing/unpacking
> > or smaller messages at the user level. This should
> > be discouraged, not encouraged. A small savings
> > in library object size is not ample reason to go
> > against that. And, yes, we are after encouraging
> > hardware vendors to provide the right hardware.
> >
> > Bronis
> >
> >
> > On Fri, 29 Feb 2008, Supalov, Alexander wrote:
> >
> > > Hi,
> > >
> > > Thanks. I think the main thrust here is the library footprint (no
> > > pack/unpack, etc.) and complexity of the user side of the datatype
> > > interface, rather than performance. Many applications just don't need
> > > any of this, and never will. Why not translate this application
> > > non-requirement into a minimum MPI subset? Same with communicator/group
> > > management, etc.
> > >
> > > Moreover, homogeneous installations that dominate HPC now don't actually
> > > need any datatype support at all. They send chunks of bytes. This may
> > > change in the future, though.
> > >
> > > A minor performance implication is that without holes that are only
> > > possible with derived datatypes, one does not need to track this, split
> > > the critical path, and make special provisions inside the MPI device
> > > layer to handle iov or such.
> > >
> > > The NIC capability argument is interesting, but it turns the discussion
> > > on its head: we're not after motivating network vendors to provide
> > > scatter/gather in hardware here, are we? Please clarify.
> > >
> > > Best regards.
> > >
> > > Alexander
> > >
> > > -----Original Message-----
> > > From: mpi3-subsetting-bounces_at_[hidden]
> > > [mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of
> > Bronis
> > > R. de Supinski
> > > Sent: Friday, February 29, 2008 5:53 AM
> > > To: mpi3-subsetting_at_[hidden]
> > > Subject: Re: [Mpi3-subsetting] agenda for subsetting kickoff telecon
> > > ww09
> > >
> > >
> > > All:
> > >
> > > OK, I have to respond to the notion that derived datatypes
> > > limit performance. It is just not a reasonable position.
> > >
> > > Sure, if you can send contiguous locations, you will get
> > > higher performance. The problem is that codes do not only
> > > need to send contiguous data so that is not an adequate
> > > reason to say derived datatypes limit performance.
> > >
> > > So, what is left? That there is some more efficient way
> > > to send non-contiguous data? How? As multiple messages,
> > > each of which send contiguous data? If so, then the
> > > implementation could do this under the covers and the
> > > datatypes are just a convenience for the user not to
> > > have to specify the individual sends. OK, suppose that's
> > > not the reason. Perhaps the user can do the copying into
> > > a contiguous buffer and get better performance? While
> > > I have seen this hold with some implementations, it is
> > > absurd. There is no reason that I can discern as to why
> > > the user should be able to deduce a better copying
> > > mechanism than the MPI implementer. So, again, at worst,
> > > the datatypes should be a convenience. Do you have an
> > > alternative reason or a refutation of these opinions?
> > >
> > > What is more important, it is certainly possible to build
> > > scatter/gather support into a NIC and achieve better
> > > performance with datatypes than without. While there are
> > > issues to be resolved for that (primarily the issue of
> > > pinning memory), they are solvable with the right hardware
> > > mechanism. Just because it does not yet exist is not
> > > an adequate reason to say "Get rid of datatypes". OK,
> > > you are not saying that but you are saying to deprecate
> > > them in a sense. And saying you could send contiguous
> > > sends more efficiently is a bad argument here. How do
> > > datatypes cause inefficiency for that? How much is
> > > that cost really? At what point do you hit where the
> > > answer is "It would be faster not to compute anything"?
> > >
> > > Bronis
> > >
> > >
> > > On Fri, 29 Feb 2008, Supalov, Alexander wrote:
> > >
> > > > Hi,
> > > >
> > > > Thanks. What subsets inside the current standard would you propose?
> > > > What interfaces between them would you envision?
> > > >
> > > > Good idea about the optimization opportunities. Here's an initial
> > > > combined list, with the main benefits as I see them. Please
> > > > comment/extend.
> > > >
> > > > - Dynamic process support: less overhead in the progress engine,
> > > > easier global rank handling.
> > > > - Heterogeneity: better memory footprint, easier data handling.
> > > > - Derived datatypes (especially those with holes): better memory
> > > > footprint.
> > > > - MPI_ANY_SOURCE: faster, more simple multifabric progress.
> > > > - File I/O: smaller requests, easier wait/test functions.
> > > > - One-sided ops: no passive target w/o MPI calls - no extra progress
> > > > thread.
> > > > - Communicator & group management: better memory footprint.
> > > > - Message tagging: better support for stable dataflow exchanges,
> > > > smaller packets.
> > > > - Non-blocking communication: easier ordering, simplified request
> > > > handling.
> > > >
> > > > Best regards.
> > > >
> > > > Alexander
> > > >
> > > > -----Original Message-----
> > > > From: mpi3-subsetting-bounces_at_[hidden]
> > > > [mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of
> > > > Torsten Hoefler
> > > > Sent: Friday, February 29, 2008 5:08 AM
> > > > To: mpi3-subsetting_at_[hidden]
> > > > Subject: Re: [Mpi3-subsetting] agenda for subsetting kickoff telecon
> > > > ww09
> > > >
> > > > Hi,
> > > > >    Present: Leonid Meyerguz (Microsoft), Rich Graham (ORNL), Richard
> > > > >    Barrett (ORNL), Torsten Hoefler (ISU), Alexander Supalov (Intel)
> > > > just for the record, it's "IU" not "ISU" :-)
> > > >
> > > > >    - Scope of the effort
> > > > >      - Rich
> > > > >        - Minimum subset consistent with the rest of MPI, for
> > > > >    performance/memory footprint optimization
> > > > >        - Danger of splitting MPI, hence against optional features in
> > > > >    the standard
> > > > I back that (danger of optional features for portability). I'd propose
> > > > to split the current standard into mostly self-contained subsets that
> > > > have clearly defined interfaces to the rest of the standard. Note: this
> > > > only defines logical interfaces, that does *not* define how those things
> > > > are to be implemented. This makes it easier to understand the standard
> > > > and have separate (portable) libraries for the subsets, it does not
> > > > influence optimization possibilities by implementing everything in a
> > > > monolithic block (i.e., central progress).
> > > >
> > > > >        - Both blocking & nonblocking belong to the core
> > > > >      - Torsten
> > > > >        - Some collectives may go into selectable subsets
> > > > I see three subsets: blocking colls, non-blocking colls and topological
> > > > colls (maybe also blocking / non-blocking).
> > > >
> > > > >        - MPI_ANY_SOURCE considered harmful
> > > > I'd like to add datatypes and heterogeneity to this list (with regards
> > > > to performance). Alexander mentioned the dynamics. I think we should
> > > > have a list of items ready that could influence optimization
> > > > possibilities significantly if they were to be announced by the user
> > > > before he can use them. That would give another strong argument for the
> > > > subsetting.
> > > >
> > > > Best,
> > > >   Torsten
> > > >
> > > > --
> > > >  bash$ :(){ :|:&};: --------------------- http://www.unixer.de/ -----
> > > > Indiana University    | http://www.indiana.edu
> > > > Open Systems Lab      | http://osl.iu.edu/
> > > > 150 S. Woodlawn Ave.  | Bloomington, IN, 474045-7104 | USA
> > > > Lindley Hall Room 135 | +01 (812) 855-3608


From alexander.supalov at [hidden]  Fri Feb 29 07:38:48 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Fri, 29 Feb 2008 13:38:48 -0000
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To: <Pine.LNX.4.58.0802290513140.3046@tux213.llnl.gov>
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A20119737F@swsmsx413.ger.corp.intel.com>


OK, thanks. 

-----Original Message-----
From: Bronis R. de Supinski [mailto:bronis_at_[hidden]] 
Sent: Friday, February 29, 2008 2:17 PM
To: Supalov, Alexander
Cc: mpi3-subsetting_at_[hidden]
Subject: RE: [Mpi3-subsetting] agenda for subsetting kickoff telecon
ww09

Alexander:

It is the vast majority of scientific applications.
It is not just ones that need sparse data structures.
A stencil application that uses dense matrices has
strided non-contiguous data transfers for half (2D)
or more (3D or more complex stencils) of its
communication. Non-contiguous communication is
the reality of distributed memory computing...
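
To make the stencil point concrete, here is a minimal sketch in plain Python (a model of the memory layout only, not MPI code). It assumes a dense row-major local grid; the sizes NX and NY and the helper names vector_offsets, pack, and is_contiguous are hypothetical, introduced only for this illustration. The (count, blocklength, stride) triple mirrors the arguments of MPI_Type_vector.

```python
# Illustrative model only: plain Python, no MPI library involved.
# NX, NY and all helper names are hypothetical, chosen for this sketch.

NX, NY = 4, 5  # rows and columns of a dense row-major local grid

# Flat row-major storage: element (i, j) lives at offset i * NY + j.
grid = list(range(NX * NY))


def vector_offsets(count, blocklength, stride, start=0):
    """Offsets touched by `count` blocks of `blocklength` elements whose
    starts are `stride` elements apart (same roles as in MPI_Type_vector)."""
    return [start + b * stride + k
            for b in range(count)
            for k in range(blocklength)]


def pack(buf, offsets):
    """Copy the selected elements into a contiguous list (user-level packing)."""
    return [buf[o] for o in offsets]


def is_contiguous(offsets):
    """True if the offsets form one unbroken run of memory."""
    return offsets == list(range(offsets[0], offsets[0] + len(offsets)))


# The four boundaries a 2D stencil code exchanges with its neighbours.
halos = {
    "north": vector_offsets(1, NY, NY, start=0),              # first row
    "south": vector_offsets(1, NY, NY, start=(NX - 1) * NY),  # last row
    "west": vector_offsets(NX, 1, NY, start=0),               # first column
    "east": vector_offsets(NX, 1, NY, start=NY - 1),          # last column
}

for name, offs in halos.items():
    print(name, offs, "contiguous" if is_contiguous(offs) else "strided")

# Two of the four boundaries (the columns) are strided.
print(sum(not is_contiguous(o) for o in halos.values()))  # 2
print(pack(grid, halos["west"]))                          # [0, 5, 10, 15]
```

In this model the row boundaries are single contiguous runs while the column boundaries need one block per row, which is the "half (2D)" case: either the application packs them by hand, as pack does here, or it describes the layout once and lets the library move the data.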

I am fine with letting this rest but my point is
that an emphasis on performance by implementers
should be the case for datatypes...

Bronis

On Fri, 29 Feb 2008, Supalov, Alexander wrote:

> Dear Bronis,
>
> Thanks. What scientific computing codes do you mean here - chemistry,
> structural mechanics, fluid dynamics, genomics, something else? Or do
> you speak generally of any code that needs sparse data structures? If
> so, what's your estimate of the relative number of such codes compared
> to those that do not need sparse datatypes? In what domain?
>
> The right dose of vendor motivation not to do wrong things is a good
> point, I'll consider it.
>
> Finally, the constant stride copying was but an example when inlining
> may help users achieve higher performance. There may be other
> examples known in the scientific computing area. However, since
> performance is not the primary goal for datatypes, I suggest we let this
> matter rest for a while.
>
> Best regards.
>
> Alexander
>
> -----Original Message-----
> From: Bronis R. de Supinski [mailto:bronis_at_[hidden]]
> Sent: Friday, February 29, 2008 10:11 AM
> To: Supalov, Alexander
> Cc: mpi3-subsetting_at_[hidden]
> Subject: RE: [Mpi3-subsetting] agenda for subsetting kickoff telecon
> ww09
>
>
> Alexander:
>
> Re:
> > Thanks. I understand your motivation. When you say "most real
> > applications" - what applications do you mean? At least, in what area?
>
> ? Scientific computing...
>
> > For the NIC part, the stress was on "here". In my opinion, subsetting is
> > not about making things more complicated, more challenging to the
> > implementors, or to the underlying hardware. It's about making things
> > simple, easy to use, and easy to implement - including implementation of
> > only those features your users actually need. That the implementation
> > may be faster due to this is an added bonus, not the primary goal.
>
> The emphasis here should not be on creating a disincentive
> for vendors to do the right thing...
>
> > Still, regarding user side copying. Yes, when people do this one wonders
> > why. There's a reason, apart from them: 1) not caring about datatypes
> > and their complexity and 2) not trusting their performance. A modern
> > compiler can rather well optimize a loop with a constant stride, and may
> > have difficulty with an unknown stride. This is why explicit loops are
> > sometimes indeed faster (much faster) in the resulting code than any
> > generic implementation.
>
> Huh? What makes you think the user copying code is
> in terms of constant stride? Generally, it varies
> with the input. We are not talking about a simple
> situation to optimize at the user level...
>
> Bronis
>
>
> >
> > Best regards.
> >
> > Alexander
> >
> > -----Original Message-----
> > From: Bronis R. de Supinski [mailto:bronis_at_[hidden]]
> > Sent: Friday, February 29, 2008 6:20 AM
> > To: Supalov, Alexander
> > Cc: mpi3-subsetting_at_[hidden]
> > Subject: RE: [Mpi3-subsetting] agenda for subsetting kickoff telecon
> > ww09
> >
> >
> > Alexander:
> >
> > Most real applications need to send non-contiguous
> > data. If they do not use datatypes then they are
> > doing the equivalent of either the packing/unpacking
> > or smaller messages at the user level. This should
> > be discouraged, not encouraged. A small savings
> > in library object size is not ample reason to go
> > against that. And, yes, we are after encouraging
> > hardware vendors to provide the right hardware.
> >
> > Bronis
> >
> >
> > On Fri, 29 Feb 2008, Supalov, Alexander wrote:
> >
> > > Hi,
> > >
> > > Thanks. I think the main thrust here is the library footprint (no
> > > pack/unpack, etc.) and complexity of the user side of the datatype
> > > interface, rather than performance. Many applications just don't need
> > > any of this, and never will. Why not translate this application
> > > non-requirement into a minimum MPI subset? Same with communicator/group
> > > management, etc.
> > >
> > > Moreover, homogeneous installations that dominate HPC now don't actually
> > > need any datatype support at all. They send chunks of bytes. This may
> > > change in the future, though.
> > >
> > > A minor performance implication is that without holes that are only
> > > possible with derived datatypes, one does not need to track this, split
> > > the critical path, and make special provisions inside the MPI device
> > > layer to handle iov or such.
> > >
> > > The NIC capability argument is interesting, but it turns the discussion
> > > on its head: we're not after motivating network vendors to provide
> > > scatter/gather in hardware here, are we? Please clarify.
> > >
> > > Best regards.
> > >
> > > Alexander
> > >
> > > -----Original Message-----
> > > From: mpi3-subsetting-bounces_at_[hidden]
> > > [mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of Bronis
> > > R. de Supinski
> > > Sent: Friday, February 29, 2008 5:53 AM
> > > To: mpi3-subsetting_at_[hidden]
> > > Subject: Re: [Mpi3-subsetting] agenda for subsetting kickoff telecon
> > > ww09
> > >
> > >
> > > All:
> > >
> > > OK, I have to respond to the notion that derived datatypes
> > > limit performance. It is just not a reasonable position.
> > >
> > > Sure, if you can send contiguous locations, you will get
> > > higher performance. The problem is that codes do not only
> > > need to send contiguous data so that is not an adequate
> > > reason to say derived datatypes limit performance.
> > >
> > > So, what is left? That there is some more efficient way
> > > to send non-contiguous data? How? As multiple messages,
> > > each of which send contiguous data? If so, then the
> > > implementation could do this under the covers and the
> > > datatypes are just a convenience for the user not to
> > > have to specify the individual sends. OK, suppose that's
> > > not the reason. Perhaps the user can do the copying into
> > > a contiguous buffer and get better performance? While
> > > I have seen this hold with some implementations, it is
> > > absurd. There is no reason that I can discern as to why
> > > the user should be able to deduce a better copying
> > > mechanism than the MPI implementer. So, again, at worst,
> > > the datatypes should be a convenience. Do you have an
> > > alternative reason or a refutation of these opinions?
> > >
> > > What is more important, it is certainly possible to build
> > > scatter/gather support into a NIC and achieve better
> > > performance with datatypes than without. While there are
> > > issues to be resolved for that (primarily the issue of
> > > pinning memory), they are solvable with the right hardware
> > > mechanism. Just because it does not yet exist is not
> > > an adequate reason to say "Get rid of datatypes". OK,
> > > you are not saying that but you are saying to deprecate
> > > them in a sense. And saying you could send contiguous
> > > sends more efficiently is a bad argument here. How do
> > > datatypes cause inefficiency for that? How much is
> > > that cost really? At what point do you hit where the
> > > answer is "It would be faster not to compute anything"?
> > >
> > > Bronis
> > >
> > >


From rbarrett at [hidden]  Fri Feb 29 07:50:03 2008
From: rbarrett at [hidden] (Richard Barrett)
Date: Fri, 29 Feb 2008 08:50:03 -0500
Subject: [Mpi3-subsetting] Some "stupid user" questions, comments.
In-Reply-To: <mailman.2717.1204221890.10290.mpi3-subsetting@lists.mpi-forum.org>
Message-ID: <C3ED77BB.7D39%rbarrett@ornl.gov>


Hi folks,

I'm still sorting things out in my mind, so perhaps this note is just me
talking to myself. But should you feel so compelled to sort through it, I
would appreciate any feedback you might offer; and it will make me a more
informed participant.

I see two main perspectives: the user and the implementer. I come from the
user side, so I feel comfortable in positing that user confusion over the
size of the standard is really a function of presentation. That is, most of
us get our information regarding using MPI directly from the standard. For
me, this is the _only_ standard I've ever actually read! Perhaps I am
missing out on thousands of C and Fortran capabilities, but sometimes
ignorance is bliss. That speaks highly to the MPI specification
presentation; however it need not be the case. An easy solution to the "too
many routines" complaint is a tutorial/book/chapter on the basics, with
pointers to further information. And in fact these books exist. That said, I
hope that MPI-3 deprecates a meaningful volume of functionality.

From the implementer perspective, there appear to be two goals. First is to
ease the burden with regard to the amount of functionality that must be
supported. (And we users don't want to hear of your whining, esp. from a
company the size of Intel :) Second, which overlaps with user concerns, is
performance. That is, by defining a small subset of functionality, strong
performance (in some sense, e.g. speed or memory requirements) can be
realized.

At the risk of starting too detailed a discussion at this early point (as
well as exposing my ignorance:), I will throw out a few situations for
discussion.

1. What would such a subset imply with regard to what I view as
support functionality, such as user-defined datatypes, topologies, etc? I.e.,
could this support be easily provided, say by cutting-and-pasting from the
full implementation you will still provide? (I now see Torsten recommends
excluding datatypes, but what of other stuff?)
2. Even more broadly (and perhaps very ignorantly), can I simply link in
both libraries, like -lmpi_subset -lmpi, getting the good stuff from the
former and the excluded functionality from the latter? In addition to the
application developer's use of MPI, all large application programs I've dealt
with make some use of externally produced libraries (a "very good thing"
imo), which probably exceed the functionality in a "subset" implementation.
3. I (basically) understand the adverse performance effects of allowing
promiscuous receives (MPI_ANY_SOURCE). However, this is a powerful
capability for many codes, and used only in moderation, eg for setting up
communication requirements (such as communication partners in unstructured,
semi-structured, and dynamic mesh computations). In this case the sender
knows its partner, but the receiver does not. A reduction(sum) is used to
let each process know the number of communication partners from which it
will receive data, the process posts that many promiscuous receives, which
when satisfied lets it from then on specify the sender. So would it be
possible to include this capability in a separate function, say the blocking
send/recv, but not allow it in the non-blocking version?
4. Collectives: I can't name a code I've ever worked with that doesn't
require MPI_Allreduce (though I wouldn't be surprised to hear of many), and
this in a broad set of science areas. MPI_Bcast is also often used (but
quite often only in the setup phase). I see MPI_Reduce used most often to
collect timing information, so MPI_Allreduce would probably be fine as well.
MPI_Gather is often quite useful, as is MPI_Scatter, but again often in
setup. (Though often "setup" occurs once per time step.) Non-constant size
versions are often used. And others can also no doubt offer strong opinions
regarding inclusion or exclusion. But from an implementation perspective,
what are the issues? In particular, is the basic infrastructure for these
(and other collective operations) the same? A driving premise for supporting
collectives is that the sort of performance driven capability under
discussion is most needed by applications running at very large scale, which
is where even very good collective implementations run into problems.
5. Language bindings and perhaps other things: With the expectation/hope
that full implementations continue to be available, I could use them for
code development, thus making use of things like type checking, etc. And
does this latter use then imply the need for "stubs" for things like the
(vaporous) Fortran bindings module, communicators (if only MPI_COMM_WORLD is
supported), etc.? And presuming the answer to #2 is "no", could/should the
full implementation "warn" me (preferably at compile time) when I'm using
functionality that rules out use of the subset?
6. Will the profile layer still be supported? Generating usage can still be
quantified using a full implementation, but performance would not be (at
least in this manner), which would rule out an apples-to-apples comparison
between a full implementation and the subset version with its advertised
superior performance. (Of course an overall runtime could be compared, which
is the final word, but a more detailed analysis is often preferred.)
7. If blocking and non-blocking are required of the subset, aren't these
blocking semantics?

    MPI_Send: MPI_Isend ( ..., &req ); MPI_Wait ( &req, &status );
    -----
    MPI_Recv: MPI_Irecv ( ..., &req ); MPI_Wait ( &req, &status );

        - And speaking of this, are there performance issues associated with
variants of MPI_Wait, eg MPI_Waitany, MPI_Waitsome?
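
The partner-discovery pattern described in item 3 can be sketched as follows. This is a single-process model in plain Python, not MPI code: the ranks, the mailbox that stands in for the network, the ANY_SOURCE constant, and the recv helper are all hypothetical names made up for this illustration. It only shows the logic: a sum reduction tells each rank how many messages to expect, and that many wildcard receives reveal who the senders are.

```python
# Illustrative model only: ranks are simulated in one process, no MPI calls.
# All names here (send_to, mailbox, ANY_SOURCE, recv) are hypothetical.

ANY_SOURCE = -1  # stand-in for MPI_ANY_SOURCE in this model

# What each rank knows up front: only the ranks it will send to.
send_to = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
nranks = len(send_to)

# Step 1: each rank builds a 0/1 vector marking its destinations. An
# element-wise sum over all ranks (the reduction) gives, at index r,
# the number of messages rank r should expect.
flags = {r: [1 if d in dests else 0 for d in range(nranks)]
         for r, dests in send_to.items()}
expected = [sum(flags[r][d] for r in range(nranks)) for d in range(nranks)]

# Step 2: senders deliver (source, payload) pairs into the receiver's queue.
mailbox = {r: [] for r in range(nranks)}
for src, dests in send_to.items():
    for dst in dests:
        mailbox[dst].append((src, f"hello from {src}"))


def recv(rank, source):
    """Take the first queued message matching `source`; ANY_SOURCE matches all."""
    for i, (src, _payload) in enumerate(mailbox[rank]):
        if source == ANY_SOURCE or src == source:
            return mailbox[rank].pop(i)
    raise LookupError("no matching message")


# Step 3: each rank posts exactly expected[rank] wildcard receives and
# records the senders; every later receive can then name its source.
recv_from = {r: sorted(recv(r, ANY_SOURCE)[0] for _ in range(expected[r]))
             for r in range(nranks)}

print(expected)   # [2, 1, 3, 0]
print(recv_from)  # {0: [2, 3], 1: [0], 2: [0, 1, 3], 3: []}
```

In this sketch the wildcard receive is needed only during the one-time setup; once recv_from is known, all steady-state receives can specify the sender explicitly.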

Finally, I'll officially register my concern with what I see as an
increasing complexity in this effort, esp wrt "multiple subsets". I don't
intend this comment to suppress ideas, but to help keep beating the drum
for simplicity, which I see as a key goal of this effort.

If you read this far, thanks! My apologies if some of these issues have been
previously covered. And if I've simply exposed myself as ignorant, I feel
confident in stating that I am not alone - these questions will persist from
others. :)

Richard

-- 
  Richard Barrett
  Future Technologies Group, Computer Science and Mathematics Division, and
  Scientific Computing Group, National Center for Computational Science
  Oak Ridge National Laboratory
  http://ft.ornl.gov/~rbarrett
On 2/28/08 1:04 PM, "mpi3-subsetting-request_at_[hidden]"
<mpi3-subsetting-request_at_[hidden]> wrote:
> Thank you for your time today. It was a very good discussion. Here's
> what I captured (please add/modify what I may have missed):
>  
> Present: Leonid Meyerguz (Microsoft), Rich Graham (ORNL), Richard
> Barrett (ORNL), Torsten Hoefler (ISU), Alexander Supalov (Intel)
>  
> - Opens & introductions
>  
> - Scope of the effort
>   - Rich
>     - Minimum subset consistent with the rest of MPI, for
> performance/memory footprint optimization
>     - Danger of splitting MPI, hence against optional features in the
> standard
>     - Both blocking & nonblocking belong to the core
>   - Torsten
>     - Some collectives may go into selectable subsets
>     - MPI_ANY_SOURCE considered harmful
>   - Leonid
>     - Flexible support for optional features, means for choosing and
> advertising level of compliance/set of features
>   - See enclosed email for Alexander's POV
>  
> - General discussion snapshots
>   - Support of subsets: some or all? If some, possible linkage problems
> in static apps (or dead calls). If all, where's the gain?
>   - Optional: really optional (may be not present) or selectable (are
> present but may be unused)?
>   - Performance penalty for unused subsets: implementation matter or
> standard choice?
>   - Portability may be limited to certain class of applications (think
> FT, master-slave runs)
>   - All we design needs to be implementable, complexity needs to be
> controlled
>   - An ability to use certain set of subsets should not preclude pulling
> in other modules if necessary
>   - Whatever we do, it should not conflict with the ABI efforts
>   - Need to stay nice and be nicer wrt to the libraries (think
> threading) and keep things simple
>   - The simplification argument, if put first, may not be liked by some
>  
> - Next steps
>   - Please comment on these minutes, and add/modify what I may have
> missed
>   - I'll prepare a couple of slides by next week summarizing our
> discussion so far; again, your feedback will be most welcome
>   - At the meeting, it may be great to meet F2F briefly and discuss any
> eventual loose ends before the presentation at the Forum; I'll see to
> this
>  
> Best regards.
>  
> Alexander
>  
> -------------- next part --------------
> An embedded message was scrubbed...
> From: "Supalov, Alexander" <alexander.supalov_at_[hidden]>
> Subject: Subsetting scope: a POV
> Date: Tue, 26 Feb 2008 11:10:15 -0000
> Size: 17674
> Url: http://lists.mpi-forum.org/MailArchives/mpi3-subsetting/attachments/20080228/673bb604/attachment.mht
> 

From alexander.supalov at [hidden]  Fri Feb 29 08:26:10 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Fri, 29 Feb 2008 14:26:10 -0000
Subject: [Mpi3-subsetting] Some "stupid user" questions, comments.
In-Reply-To: <C3ED77BB.7D39%rbarrett@ornl.gov>
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201197401@swsmsx413.ger.corp.intel.com>


Dear Richard,
 
Thanks. The more complicated the standard gets, the happier are the
implementors. However, now we try to think like MPI users for a change,
so, thanks for providing a reality check.
 
Now, to one of your questions. An MPI_ANY_SOURCE MPI_Recv in a multifabric
environment means that a receive has to be posted somehow to more than
one fabric in the MPI device layer. Once one of them gets the message,
the posted receives should be cancelled on the other fabrics. Now, what if
they've already matched and started to receive something? What if they
cannot cancel a posted receive? And so on. There are 3 to 5 ways to deal
with this situation, with and without actually posting a receive, but
none of them is good enough if you ask me. That's why there are 3 to 5
of them, actually. And all of them complicate the progress engine - the
heart of an MPI implementation - at exactly the spot where one wants
things simple and fast.
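
A toy model may make the race concrete. The sketch below is plain Python
with no MPI library involved, and it is not taken from any real
implementation: the Fabric class, its post/deliver/cancel methods and the
"batches" argument (arrivals that land before the progress engine gets to
run) are all invented for illustration.

```python
# Toy model of one MPI_ANY_SOURCE receive posted on several fabrics.
# Every name here (Fabric, post, deliver, cancel, wildcard_recv) is hypothetical.

class Fabric:
    def __init__(self, name, can_cancel=True):
        self.name = name
        self.can_cancel = can_cancel   # some fabrics cannot withdraw a posted receive
        self.posted = set()            # ids of receives still pending on this fabric

    def post(self, recv_id):
        self.posted.add(recv_id)

    def deliver(self, payload):
        """A message arrives; match it to a pending receive, if there is one."""
        if not self.posted:
            return None                # would go to the unexpected-message queue
        recv_id = min(self.posted)
        self.posted.remove(recv_id)
        return recv_id

    def cancel(self, recv_id):
        """Return True only if the receive was still pending and could be withdrawn."""
        if recv_id in self.posted and self.can_cancel:
            self.posted.remove(recv_id)
            return True
        return False


def wildcard_recv(fabrics, recv_id, batches):
    """Post recv_id on every fabric and replay the arrivals.

    batches is a list of lists of (fabric_name, payload); the progress engine
    only runs (and cancels) between batches. Returns (winner, stranded), where
    stranded holds payloads that matched the same receive on another fabric.
    """
    by_name = {f.name: f for f in fabrics}
    for f in fabrics:
        f.post(recv_id)
    winner, stranded = None, []
    for batch in batches:
        hits = [p for name, p in batch if by_name[name].deliver(p) == recv_id]
        if not hits:
            continue
        if winner is None:
            winner = hits.pop(0)
            for f in fabrics:
                f.cancel(recv_id)      # too late for fabrics that already matched
        stranded.extend(hits)
    return winner, stranded
```

With two arrivals in the same batch, or with a fabric created with
can_cancel=False, the second payload ends up in "stranded", and the
progress engine then has to put it back somewhere without breaking
message ordering.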
 
This means that most of the time we fight these repercussions and curse
MPI_ANY_SOURCE. Or, looping back to the beginning of this message,
we actually never stop blessing MPI_ANY_SOURCE. Fighting this kind of
trouble is what we are paid for. ;)
 
Best regards.
 
Alexander

________________________________

From: mpi3-subsetting-bounces_at_[hidden]
[mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of
Richard Barrett
Sent: Friday, February 29, 2008 2:50 PM
To: mpi3-subsetting_at_[hidden]
Subject: [Mpi3-subsetting] Some "stupid user" questions, comments.

Hi folks,

I'm still sorting things out in my mind, so perhaps this note is just me
talking to myself. But should you feel so compelled to sort through it,
I would appreciate any feedback you might offer; and it will make me a
more informed participant. 

I see two main perspectives: the user and the implementer. I come from
the user side, so I feel comfortable in positing that user confusion
over the size of the standard is really a function of presentation. That
is, most of us get our information regarding using MPI directly from the
standard. For me, this is the _only_ standard I've ever actually read!
Perhaps I am missing out on thousands of C and Fortran capabilities, but
sometimes ignorance is bliss. That speaks highly to the MPI
specification presentation; however it need not be the case. An easy
solution to the "too many routines" complaint is a tutorial/book/chapter
on the basics, with pointers to further information. And in fact these
books exist. That said, I hope that MPI-3 deprecates a meaningful volume
of functionality.

From the implementer perspective, there appear to be two goals. First
is to ease the burden with regard to the amount of functionality that
must be supported. (And we users don't want to hear of your whining,
esp. from a company the size of Intel :) Second, which overlaps with
user concerns, is performance. That is, by defining a small subset of
functionality, strong performance (in some sense, e.g. speed or memory
requirements) can be realized.

At the risk of starting too detailed a discussion at this early point
(as well as exposing my ignorance:), I will throw out a few situations
for discussion.

1.	What would such a subset imply with regard to what I view
as support functionality, such as user-defined datatypes, topologies,
etc.? I.e., could this support be easily provided, say by
cutting-and-pasting from the full implementation you will still provide?
(I now see Torsten recommends excluding datatypes, but what of other
stuff?) 
2.	Even more broadly (and perhaps very ignorantly), can I simply
link in both libraries, like -lmpi_subset -lmpi, getting the good stuff
from the former and the excluded functionality from the latter? In
addition to the application developer's use of MPI, all large application
programs I've dealt with make some use of externally produced libraries
(a "very good thing" imo), which probably exceed the functionality in a
"subset" implementation. 
3.	I (basically) understand the adverse performance effects of
allowing promiscuous receives (MPI_ANY_SOURCE). However, this is a
powerful capability for many codes, and is used only in moderation, e.g.
for setting up communication requirements (such as communication partners
in unstructured, semi-structured, and dynamic mesh computations). In this
case the sender knows its partner, but the receiver does not. A
reduction(sum) is used to let each process know the number of
communication partners from which it will receive data; the process then
posts that many promiscuous receives, which, once satisfied, let it
specify the sender from then on. So would it be possible to include this
capability in a separate function, say the blocking send/recv, but not
allow it in the non-blocking version? 
4.	Collectives: I can't name a code I've ever worked with that
doesn't require MPI_Allreduce (though I wouldn't be surprised to hear of
many), and this in a broad set of science areas. MPI_Bcast is also often
used (but quite often only in the setup phase). I see MPI_Reduce used
most often to collect timing information, so MPI_Allreduce would
probably be fine as well. MPI_Gather is often quite useful, as is
MPI_Scatter, but again often in setup. (Though often "setup" occurs once
per time step.) Non-constant size versions are often used. And others
can also no doubt offer strong opinions regarding inclusion or
exclusion. But from an implementation perspective, what are the issues?
In particular, is the basic infrastructure for these (and other
collective operations) the same? A driving premise for supporting
collectives is that the sort of performance-driven capability under
discussion is most needed by applications running at very large scale,
which is where even very good collective implementations run into problems.

5.	Language bindings and perhaps other things: With the
expectation/hope that full implementations continue to be available, I
could use them for code development, thus making use of things like type
checking, etc. And does this latter use then imply the need for "stubs"
for things like the (vaporous) Fortran bindings module, communicators
(if only MPI_COMM_WORLD is supported), etc.? And presuming the answer to
#2 is "no", could/should the full implementation "warn" me (preferably
at compile time) when I'm using functionality that rules out use of the
subset? 
6.	Will the profile layer still be supported? Generating usage can
still be quantified using a full implementation, but performance would
not be (at least in this manner), which would rule out an
apples-to-apples comparison between a full implementation and the subset
version with its advertised superior performance. (Of course an overall
runtime could be compared, which is the final word, but a more detailed
analysis is often preferred.) 
7.	If blocking and non-blocking are required of the subset, aren't
these blocking semantics?
        

    MPI_Send: MPI_Isend ( ..., &req ); MPI_Wait ( &req, &status );
    -----
    MPI_Recv: MPI_Irecv ( ..., &req ); MPI_Wait ( &req, &status );

        - And speaking of this, are there performance issues associated
with variants of MPI_Wait, eg MPI_Waitany, MPI_Waitsome? 
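
The setup pattern described in item 3 (a sum reduction to learn how many
messages to expect, followed by that many wildcard receives) can be
sketched as a single-process simulation. This is plain Python with no MPI
calls; discover_partners, send_to and the mailbox are made-up names that
stand in for the reduction, the application's send lists and the network.

```python
# Single-process simulation of partner discovery with wildcard receives.
# All names are hypothetical; no MPI library is used.

def discover_partners(send_to):
    """send_to[r] lists the ranks that rank r will send to.

    Returns recv_from, where recv_from[r] is the sorted list of senders
    that rank r learns about.
    """
    nranks = len(send_to)

    # Step 1: each rank contributes a 0/1 vector marking its destinations;
    # the element-wise sum plays the role of the reduction(sum).
    incoming = [0] * nranks
    for dests in send_to:
        for d in set(dests):
            incoming[d] += 1

    # Step 2: every sender sends one message to each of its partners.
    mailbox = [[] for _ in range(nranks)]
    for src, dests in enumerate(send_to):
        for d in set(dests):
            mailbox[d].append(src)

    # Step 3: each rank posts incoming[r] wildcard receives and records the
    # source reported with each one (the role of the status' source field).
    recv_from = []
    for r in range(nranks):
        sources = [mailbox[r].pop(0) for _ in range(incoming[r])]
        recv_from.append(sorted(sources))
    return recv_from


if __name__ == "__main__":
    print(discover_partners([[1, 2], [2], [], [0, 2]]))
```

After this one-off exchange every rank knows its senders by rank, so all
later receives can name the source explicitly.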

Finally, I'll officially register my concern with what I see as an
increasing complexity in this effort, esp. wrt "multiple subsets". I
don't intend this comment to suppress ideas, but to help keep
beating the drum for simplicity, which I see as a key goal of this
effort. 

If you read this far, thanks! My apologies if some of these issues have
been previously covered. And if I've simply exposed myself as ignorant,
I feel confident in stating that I am not alone - these questions will
persist from others. :)

Richard

-- 
  Richard Barrett
  Future Technologies Group, Computer Science and Mathematics Division,
and
  Scientific Computing Group, National Center for Computational Science
  Oak Ridge National Laboratory
  http://ft.ornl.gov/~rbarrett

From htor at [hidden]  Fri Feb 29 08:57:26 2008
From: htor at [hidden] (Torsten Hoefler)
Date: Fri, 29 Feb 2008 09:57:26 -0500
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201196FE2@swsmsx413.ger.corp.intel.com>
Message-ID: <20080229145726.GJ16623@benten.cs.indiana.edu>


Hi,
> Thanks. As soon as there's a couple of non-blocking recvs out there,
> waiting for them in reverse order requires tracking of the moment when
> the receives were posted. In some cases this leads to extra fields and
> data exchanges.
How's that different from multiple Recvs in multiple threads? Hmm, I
guess the threaded case is just undefined and the implementation is
allowed to ignore ordering, right? 

> The footprint argument generally says that the library will be smaller.
> This may be a minor matter for general purpose computers, but as soon as
> you go to Petascale, you need every byte on the compute nodes for user
> data, especially if dynamic libraries are not supported.
ack

> As for the collectives, many are implemented using SendRecv, and that
> blocking call in turn often uses non-blocking communication. Classic
> Alltoallv algorithm uses nonblocking calls, too. So, I'm not sure that
> even unoptimized blocking collectives will always use only blocking
> pt2pt.
Yes, of course - again, the proposed "interface" is more of a logical
nature. You can implement all blocking collectives with blocking p2p
(it'll be slower). But ok, pulling all p2p functions into this interface
is also unproblematic. So I don't really have a strong opinion here.
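
As a rough illustration of "blocking collectives on top of blocking
p2p", here is a simulated broadcast in plain Python. The binomial tree is
just one convenient schedule picked for the sketch, not something this
thread prescribes, and binomial_bcast is a made-up name; each buffer copy
stands in for one blocking send/recv pair.

```python
# Simulated broadcast built only from pairwise blocking transfers.
# The binomial-tree schedule and all names are assumptions for illustration.

def binomial_bcast(nranks, value, root=0):
    """Return (buffers, log) after broadcasting value from root.

    log lists the (source, destination) transfers in the order they occur.
    """
    buf = [None] * nranks
    buf[root] = value
    log = []
    mask = 1
    while mask < nranks:
        # Relative ranks 0..mask-1 already hold the data in this round.
        for rel in range(mask):
            peer = rel + mask
            if peer < nranks:
                src = (rel + root) % nranks
                dst = (peer + root) % nranks
                buf[dst] = buf[src]    # one blocking send matched by one blocking recv
                log.append((src, dst))
        mask <<= 1
    return buf, log


if __name__ == "__main__":
    print(binomial_bcast(5, "x"))
```

The schedule needs nranks - 1 transfers in about log2(nranks) rounds; what
it loses against a non-blocking version is the overlap within a round.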

Best,
  Torsten


-- 
 bash$ :(){ :|:&};: --------------------- http://www.unixer.de/ -----
Indiana University    | http://www.indiana.edu
Open Systems Lab      | http://osl.iu.edu/
150 S. Woodlawn Ave.  | Bloomington, IN, 474045-7104 | USA
Lindley Hall Room 135 | +01 (812) 855-3608


From htor at [hidden]  Fri Feb 29 09:02:45 2008
From: htor at [hidden] (Torsten Hoefler)
Date: Fri, 29 Feb 2008 10:02:45 -0500
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To: <Pine.LNX.4.58.0802282042110.3046@tux213.llnl.gov>
Message-ID: <20080229150245.GK16623@benten.cs.indiana.edu>


Bronis,
for the record: I do *not* advocate to get rid of datatypes! I think
datatypes are a great thing for some parallel applications and they
certainly should be used as a high-level abstraction. I've implemented
scatter/gather list-based optimizations for modern NICs (IB). 

But on the other hand, there are many codes out there that just do not
use datatypes. Codes that are only supposed to run in homogeneous
environments. Codes that use sockets instead of MPI. If we want to aim
at this market, we need to simplify here. A simplification could be to
use MPI_BYTE by default ;) but it would be better to get rid of the code
and control-path overhead. 

Just to clarify my opinion,
  Torsten


-- 
 bash$ :(){ :|:&};: --------------------- http://www.unixer.de/ -----
Indiana University    | http://www.indiana.edu
Open Systems Lab      | http://osl.iu.edu/
150 S. Woodlawn Ave.  | Bloomington, IN, 474045-7104 | USA
Lindley Hall Room 135 | +01 (812) 855-3608


From rlgraham at [hidden]  Fri Feb 29 09:08:22 2008
From: rlgraham at [hidden] (Richard Graham)
Date: Fri, 29 Feb 2008 10:08:22 -0500
Subject: [Mpi3-subsetting] Where is archive?
In-Reply-To: <OF395BE705.855A7B1D-ON852573FE.004EFEAB-852573FE.004F6CF5@us.ibm.com>
Message-ID: <C3ED8A16.17776%rlgraham@ornl.gov>


The mailing lists at uiuc are no longer active, and at this stage just
forward mail to lists.mpi-forum.org.  This too will be turned off in
about 2 weeks.

Each working group has wiki space for such things; some use it more than
others.  This wg just started its work yesterday, so very little has been
done, and we are at the stage of trying to define what we mean by
subsetting.

The wiki pages can be accessed from the meetings web page,
meetings.mpi-forum.org, by following the MPI 3.0 link, and then going to
whatever working group you are interested in.  I have not looked at the
subsetting wiki site to see if anything has been put up on it yet.

Rich

On 2/29/08 9:27 AM, "Richard Treumann" <treumann_at_[hidden]> wrote:

> FYI - the mailing list web page: http://lists.cs.uiuc.edu/mailman/listinfo has
> links to most or all of the email lists I know of except this one.
> 
> Is there an archive?
> 
> Also - is there an overview proposal somewhere?
> 
>  Thanks
> 
> Dick Treumann  -  MPI Team/TCEM
> IBM Systems & Technology Group
> Dept 0lva / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
> Tele (845) 433-7846         Fax (845) 433-8363
> 



From alexander.supalov at [hidden]  Fri Feb 29 09:17:36 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Fri, 29 Feb 2008 15:17:36 -0000
Subject: [Mpi3-subsetting] Where is archive?
In-Reply-To: <C3ED8A16.17776%rlgraham@ornl.gov>
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201197473@swsmsx413.ger.corp.intel.com>


Hi,
 
Our WG is so young that we have not put anything up yet. There must be
quite a few emails in the archive by now, however, including minutes of
yesterday's meeting. Summary slides capturing the state of discussion so
far will follow next week.
 
Best regards.
 
Alexander


From rlgraham at [hidden]  Fri Feb 29 09:19:56 2008
From: rlgraham at [hidden] (Richard Graham)
Date: Fri, 29 Feb 2008 10:19:56 -0500
Subject: [Mpi3-subsetting] Some "stupid user" questions, comments.
In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201197401@swsmsx413.ger.corp.intel.com>
Message-ID: <C3ED8CCC.1777A%rlgraham@ornl.gov>


On 2/29/08 9:26 AM, "Supalov, Alexander" <alexander.supalov_at_[hidden]>
wrote:

> Dear Richard,
>  
> Thanks. The more complicated the standard gets, the happier are the
> implementors. However, now we try to think like MPI users for a change, so,
> thanks for providing a reality check.
> 

Quite to the contrary.  The simpler the standard is, the easier it is to
support - complexity is not a good thing at all.  This is my view as an
implementer.  Complexity is often introduced when trying to get good
performance out of a spec that supports a wide variety of options.

>  
> Now, to one of your questions. An MPI_ANY_SOURCE MPI_Recv in multifabric
> environment means that a receive has to be posted somehow to more than one
> fabric in the MPI device layer. Once one of them gets the message, the posted
> receives should be cancelled on other fabrics. Now, what if they've already
> matched and started to receive something? What if they cannot cancel a posted
> receive? And so on. There are 3 to 5 ways to deal with this situation, with
> and without actually posting a receive, but none of them is good enough if you
> ask me. That's why there are 3 to 5 of them, actually. And all of them
> complicate the progress engine - the heart of an MPI implementation - at
> exactly the spot where one wants things simple and fast.
> 

The any_source and multiple fabrics are two distinct issues.  Even if you
do not support any_source and have multiple fabrics, you have the issue
that to support mpi ordering semantics, matching needs to be done in the
context of all the nics - unless you decide to have only one nic do the
matching, including any on-host traffic.  What any_source forces is
matching on the receive side - unless one wants to set up a very complex
and inefficient way to make sure that only one receive is matched for
each wild card receive.

Rich
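
One way to picture "matching in the context of all the nics" is a single
matching stage that every nic feeds, with a per-sender sequence number
used to restore send order. The Python sketch below is only an assumed
scheme for illustration (OrderedMatcher and arrive are invented names),
not a description of how any particular MPI library does it.

```python
# Restoring per-sender order when messages are striped across several NICs.
# The sequence-number scheme and all names are assumptions for illustration.

class OrderedMatcher:
    def __init__(self):
        self.next_seq = {}   # sender -> next sequence number to hand to matching
        self.parked = {}     # sender -> {seq: payload} for messages that came early

    def arrive(self, sender, seq, payload):
        """Called by any NIC. Returns the payloads now eligible for matching,
        in the order the sender issued them."""
        expected = self.next_seq.get(sender, 0)
        self.parked.setdefault(sender, {})[seq] = payload
        ready = []
        while expected in self.parked[sender]:
            ready.append(self.parked[sender].pop(expected))
            expected += 1
        self.next_seq[sender] = expected
        return ready


if __name__ == "__main__":
    m = OrderedMatcher()
    print(m.arrive(7, 1, "second"))   # arrived early on a fast NIC: held back
    print(m.arrive(7, 0, "first"))    # releases both, in send order
```

The shared state is the point: no nic can complete a match on its own
without consulting it, with or without wild card receives.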
>  
> This means that most of the time we fight these repercussions and curse the
> MPI_ANY_SOURCE. Or, looping back to the beginning of this message, we actually
> never stop blessing MPI_ANY_SOURCE. Fighting this kind of trouble is what we
> are paid for. ;)
>  
> Best regards.
>  
> Alexander
> 
> 
> From: mpi3-subsetting-bounces_at_[hidden]
> [mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of Richard
> Barrett
> Sent: Friday, February 29, 2008 2:50 PM
> To: mpi3-subsetting_at_[hidden]
> Subject: [Mpi3-subsetting] Some "stupid user" questions, comments.
> 
> Hi folks,
> 
> I'm still sorting things out in my mind, so perhaps this note is just me
> talking to myself. But should you feel so compelled to sort through it, I
> would appreciate any feedback you might offer; and it will make me a more
> informed participant.
> 
> I see two main perspectives: the user and the implementer. I come from the
> user side, so I feel comfortable in positing that user confusion over the size
> of the standard is really a function of presentation. That is, most of us get
> our information regarding using MPI directly from the standard. For me, this
> is the _only_ standard I've ever actually read! Perhaps I am missing out on
> thousands of C and Fortran capabilities, but sometimes ignorance is bliss.
> That speaks highly to the MPI specification presentation; however it need not
> be the case. An easy solution to the "too many routines" complaint is a
> tutorial/book/chapter on the basics, with pointers to further information. And
> in fact these books exist. That said, I hope that MPI-3 deprecates a
> meaningful volume of functionality.
> 
>> >From the implementer perspective, there appear to be two goals. First is to
>> ease the burden with regard to the amount of functionality that must be
>> supported. (And we users don't want to hear of your whining, esp. from a
>> company the size of Intel :) Second, which overlaps with user concerns, is
>> performance. That is, by defining a small subset of functionality, strong
>> performance (in some sense, e.g. speed or memory requirements) can be
>> realized.
> 
> At the risk of starting too detailed a discussion at this early point (as well
> as exposing my ignorance:), I will throw out a few situations for discussion.
> 
> 1. What  would such a subset would imply with regard to what I view as support
> functionality, such as user-defined datatypes, topologies, etc? Ie could this
> support be easily provided, say by cutting-and-pasting from the full
> implementation you will still provide? (I now see Torsten recommends
> excluding datatypes, but what of other stuff?)
> 2. Even  more broadly (and perhaps very ignorantly), can I simply link in both
> libraries, like -lmpi_subset -lmpi, getting the good stuff from the former and
> the excluded functionality from the latter? In addition to the application
> developers use of MPI, all large application programs I&#185;ve dealt with make
> some use of externally produced libraries (a &#179;very good thing&#178; imo), which
> probably exceed the functionality in a &#179;subset&#178; implementation.
> 3. I  (basically) understand the adverse performance effects of allowing
> promiscuous  receives (MPI_ANY_SOURCE). However, this is a powerful capability
> for many  codes, and used only in moderation, eg for setting up communication
> requirements (such as communication partners in unstructured, semi-structured,
> and dynamic mesh computations). In this case the sender knows its partner, but
> the receiver does not. A reduction(sum) is used to let each process know the
> number of communication partners from which it will receive data, the process
> posts that many promiscuous receives, which when satisfied lets it from then
> on specify the sender. So would it be possible to include this capability in a
> separate function, say the blocking send/recv, but not allow it in the
> non-blocking version?
> 4. Collectives: I can't name a code I've ever  worked with that doesn't
> require MPI_Allreduce (though I wouldn&#185;t be surprised  to hear of many), and
> this in a broad set of science areas. MPI_Bcast is also  often used (but quite
> often only in the setup phase). I see MPI_Reduce used  most often to collect
> timing information, so MPI_Allreduce would probably be  fine as well.
> MPI_Gather is often quite useful, as is MPI_Scatter, but again  often in
> setup. (Though often &#179;setup&#178; occurs once per time step.) Non-constant  size
> versions are often used. And others can also no doubt offer strong  opinions
> regarding inclusion of exclusion. But from an implementation  perspective,
> what are the issues? In particular, is the basic infrastructure  for these
> (and other collective operations) the same? A driving premise for  supporting
> collectives is that the sort of performance driven capability under
> discussion is most needed by applications running at very large scale, which
> is where even very good collect implementations run into problems.
> 5. Language bindings and perhaps other things:  With the expectation/hope that
> full implementations continue to be available,  I could use them for code
> development, thus making use of things like type  checking, etc. And does this
> latter use then imply the need for "stubs" for  things like the (vaporous)
> Fortran bindings module, communicators (if only  MPI_COMM_WORLD is supported),
> etc.? And presuming the answer to #2 is &#179;no&#178;,  could/should the full
> implementation &#179;warn&#178; me (preferably at compile time)  when I&#185;m using
> functionality that rules out use of the subset?
> 6. Will  the profile layer still be supported? Generating usage can still be
> quantified  using a full implementation, but performance would not be (at
> least in this  manner), which would rule out an apples-to-apples comparison
> between a full  implementation and the subset version with its advertised
> superior  performance. (Of course an overall runtime could be compared, which
> is the  final word, but a more detailed analysis is often preferred.)
> 7. If  blocking and non-blocking are required of the subset, aren't these
> blocking  semantics?
> 
>     MPI_Send: MPI_Isend ( ..., &req ); MPI_Wait ( ..., &req );
>     -----
>     MPI_Recv: MPI_Irecv ( ..., &req ); MPI_Wait ( &req );
> 
>         - And speaking of this, are there performance issues associated with
> variants of MPI_Wait, eg MPI_Waitany, MPI_Waitsome?
> 
> Finally, I'll officially register my concern with what I see as an increasing
> complexity in this effort, esp wrt "multiple subsets". I don't intend this
> comment to suppress ideas, but to help keep beating the drum for
> simplicity, which I see as a key goal of this effort.
> 
> If you read this far, thanks! My apologies if some of these issues have been
> previously covered. And if I've simply exposed myself as ignorant, I feel
> confident in stating that I am not alone - these questions will persist from
> others. :)
> 
> Richard



From rlgraham at [hidden]  Fri Feb 29 09:32:49 2008
From: rlgraham at [hidden] (Richard Graham)
Date: Fri, 29 Feb 2008 10:32:49 -0500
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To: <20080229150245.GK16623@benten.cs.indiana.edu>
Message-ID: <C3ED8FD1.17785%rlgraham@ornl.gov>


Getting rid of the data types is not an option, in my opinion.  I would be
ok if we decided on a subset that includes basic data types and some sort
of regular patterns based on these - which I believe represent a very large
fraction of application uses.

I am NOT advocating going away from the general support we have for data
types in MPI, just providing a way for implementers to know that under some
use case scenarios (which I think are by far the common case) simpler and
more efficient data type support can be provided.  This also allows
implementations, if they choose, to take advantage of h/w gather/scatter
capabilities.  At this stage the notion of subsetting is just that - a
notion - and I don't think that as a group we have thought through all the
implications.
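
To make "regular patterns based on basic data types" concrete, here is a
minimal sketch in plain C (no MPI calls) of how a strided pattern could be
turned into a gather list of the kind h/w gather/scatter consumes. The
names regular_pattern, sg_entry and build_gather_list are illustrative
assumptions, not part of any MPI implementation; the pattern itself is
modelled on what MPI_Type_vector describes.

```c
#include <assert.h>
#include <stddef.h>

/* A regular pattern over a basic type: `count` blocks of `blocklen`
 * elements, with block starts `stride` elements apart. */
typedef struct {
    size_t count, blocklen, stride, elem_size;
} regular_pattern;

/* One gather-list entry: a contiguous byte range in the user buffer. */
typedef struct {
    size_t offset, length;
} sg_entry;

/* Writes one entry per block into out[] and returns the number of
 * entries written (never more than max_entries). */
static size_t build_gather_list(const regular_pattern *p,
                                sg_entry *out, size_t max_entries)
{
    assert(p != NULL && out != NULL);
    assert(p->stride >= p->blocklen);   /* blocks must not overlap */

    size_t n = p->count < max_entries ? p->count : max_entries;
    for (size_t i = 0; i < n; ++i) {
        out[i].offset = i * p->stride * p->elem_size;
        out[i].length = p->blocklen * p->elem_size;
    }
    return n;
}
```

Because the pattern is fixed by four integers, an implementation that knows
only such patterns occur would not need a general datatype engine to build
this list.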

Rich

On 2/29/08 10:02 AM, "Torsten Hoefler" <htor_at_[hidden]> wrote:

> Bronis,
> for the record: I do *not* advocate getting rid of datatypes! I think
> datatypes are a great thing for some parallel applications and they
> certainly should be used as a high-level abstraction. I've implemented
> scatter/gather list-based optimizations for modern NICs (IB).
> 
> But on the other hand, there are many codes out there that just do not
> use datatypes. Codes that are only supposed to run in homogeneous
> environments. Codes that use sockets instead of MPI. If we want to aim
> at this market, we need to simplify here. A simplification could be to
> use MPI_BYTE by default ;) but it would be better to get rid of the code
> and control-path overhead.
> 
> Just to clarify my opinion,
>   Torsten


From alexander.supalov at [hidden]  Fri Feb 29 09:45:47 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Fri, 29 Feb 2008 15:45:47 -0000
Subject: [Mpi3-subsetting] Some "stupid user" questions, comments.
In-Reply-To: <C3ED8CCC.1777A%rlgraham@ornl.gov>
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A2011974B8@swsmsx413.ger.corp.intel.com>


Thanks. You are right - if there's more than one route between two
processes, there's a matching issue, too. As for my special
implementor's point of view, I was kidding.

________________________________

From: mpi3-subsetting-bounces_at_[hidden]
[mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of
Richard Graham
Sent: Friday, February 29, 2008 4:20 PM
To: mpi3-subsetting_at_[hidden]
Subject: Re: [Mpi3-subsetting] Some "stupid user" questions, comments.

On 2/29/08 9:26 AM, "Supalov, Alexander" <alexander.supalov_at_[hidden]>
wrote:

        Dear Richard,
        
        Thanks. The more complicated the standard gets, the happier are
the implementors. However, now we try to think like MPI users for a
change, so, thanks for providing a reality check.
        
	>> Quite to the contrary.  The simpler the standard is the
easier to support - complexity is not a good thing at all.
	>> This is my view as an implementer.  Complexity is often
introduced when trying to get good performance out of
	>> a spec that supports a wide variety of options.
        
        Now, to one of your questions. An MPI_ANY_SOURCE MPI_Recv in
multifabric environment means that a receive has to be posted somehow to
more than one fabric in the MPI device layer. Once one of them gets the
message, the posted receives should be cancelled on other fabrics. Now,
what if they've already matched and started to receive something? What
if they cannot cancel a posted receive? And so on. There are 3 to 5 ways
to deal with this situation, with and without actually posting a
receive, but none of them is good enough if you ask me. That's why there
are 3 to 5 of them, actually. And all of them complicate the progress
engine - the heart of an MPI implementation - at exactly the spot where
one wants things simple and fast.
        
	>> The any_source and multiple fabrics are two distinct issues.
Even if you do not support any_source and have
	>> multiple fabrics, you have the issue that to support mpi
ordering semantics, matching needs to be done
	>> in the context of all the nics - unless you decide to have
only one nic do the matching, including any on-host
	>> traffic.  What any_source forces is matching on the receive
side - unless one wants to set up a very complex
	>> and inefficient way to make sure that only one receive is
matched for each wild card receive.
        
        Rich
        
        This means that most of the time we fight these repercussions
and curse the MPI_ANY_SOURCE. Or, looping back to the beginning of this
message, we actually never stop blessing MPI_ANY_SOURCE. Fighting this
kind of trouble is what we are paid for. ;)
        
        Best regards.
        
        Alexander
        
        
________________________________

        From: mpi3-subsetting-bounces_at_[hidden]
[mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of
Richard Barrett
        Sent: Friday, February 29, 2008 2:50 PM
        To: mpi3-subsetting_at_[hidden]
        Subject: [Mpi3-subsetting] Some "stupid user" questions,
comments.
        
        Hi folks,
        
        I'm still sorting things out in my mind, so perhaps this note is
just me talking to myself. But should you feel so compelled to sort
through it, I would appreciate any feedback you might offer; and it will
make me a more informed participant. 
        
        I see two main perspectives: the user and the implementer. I
come from the user side, so I feel comfortable in positing that user
confusion over the size of the standard is really a function of
presentation. That is, most of us get our information regarding using
MPI directly from the standard. For me, this is the _only_ standard I've
ever actually read! Perhaps I am missing out on thousands of C and
Fortran capabilities, but sometimes ignorance is bliss. That speaks
highly to the MPI specification presentation; however it need not be the
case. An easy solution to the "too many routines" complaint is a
tutorial/book/chapter on the basics, with pointers to further
information. And in fact these books exist. That said, I hope that MPI-3
deprecates a meaningful volume of functionality.
        
        From the implementer perspective, there appear to be two goals.
First is to ease the burden with regard to the amount of functionality
that must be supported. (And we users don't want to hear of your
whining, esp. from a company the size of Intel :) Second, which overlaps
with user concerns, is performance. That is, by defining a small subset
of functionality, strong performance (in some sense, e.g. speed or
memory requirements) can be realized.
        
        At the risk of starting too detailed a discussion at this early
point (as well as exposing my ignorance:), I will throw out a few
situations for discussion.
        
        
        1.	What would such a subset imply with regard to
what I view as support functionality, such as user-defined datatypes,
topologies, etc? Ie could this  support be easily provided, say by
cutting-and-pasting from the full  implementation you will still
provide? (I now see Torsten recommends  excluding datatypes, but what of
other stuff?) 
        2.	Even  more broadly (and perhaps very ignorantly), can I
simply link in both  libraries, like -lmpi_subset -lmpi, getting the
good stuff from the former and  the excluded functionality from the
latter? In addition to the application  developers use of MPI, all large
application programs I've dealt with make  some use of externally
produced libraries (a "very good thing" imo), which  probably exceed the
functionality in a "subset" implementation.   
        3.	I  (basically) understand the adverse performance
effects of allowing promiscuous  receives (MPI_ANY_SOURCE). However,
this is a powerful capability for many  codes, and used only in
moderation, eg for setting up communication  requirements (such as
communication partners in unstructured, semi-structured,  and dynamic
mesh computations). In this case the sender knows its partner, but  the
receiver does not. A reduction(sum) is used to let each process know the
number of communication partners from which it will receive data, the
process  posts that many promiscuous receives, which when satisfied lets
it from then  on specify the sender. So would it be possible to include
this capability in a  separate function, say the blocking send/recv, but
not allow it in the  non-blocking version?   
        4.	Collectives: I can't name a code I've ever  worked with
that doesn't require MPI_Allreduce (though I wouldn't be surprised  to
hear of many), and this in a broad set of science areas. MPI_Bcast is
also  often used (but quite often only in the setup phase). I see
MPI_Reduce used  most often to collect timing information, so
MPI_Allreduce would probably be  fine as well. MPI_Gather is often quite
useful, as is MPI_Scatter, but again  often in setup. (Though often
"setup" occurs once per time step.) Non-constant  size versions are
often used. And others can also no doubt offer strong  opinions
regarding inclusion or exclusion. But from an implementation
perspective, what are the issues? In particular, is the basic
infrastructure  for these (and other collective operations) the same? A
driving premise for  supporting collectives is that the sort of
performance driven capability under  discussion is most needed by
applications running at very large scale, which  is where even very good
collective implementations run into problems.
        5.	Language bindings and perhaps other things:  With the
expectation/hope that full implementations continue to be available,  I
could use them for code development, thus making use of things like type
checking, etc. And does this latter use then imply the need for "stubs"
for  things like the (vaporous) Fortran bindings module, communicators
(if only  MPI_COMM_WORLD is supported), etc.? And presuming the answer
to #2 is "no",  could/should the full implementation "warn" me
(preferably at compile time)  when I'm using functionality that rules
out use of the subset?   
        6.	Will  the profile layer still be supported? Generating
usage can still be quantified  using a full implementation, but
performance would not be (at least in this  manner), which would rule
out an apples-to-apples comparison between a full  implementation and
the subset version with its advertised superior  performance. (Of course
an overall runtime could be compared, which is the  final word, but a
more detailed analysis is often preferred.)   
        7.	If  blocking and non-blocking are required of the
subset, aren't these blocking  semantics?
                

            MPI_Send: MPI_Isend ( ..., &req ); MPI_Wait ( &req, ... );
            -----
            MPI_Recv: MPI_Irecv ( ..., &req ); MPI_Wait ( &req, ... );
        
                - And speaking of this, are there performance issues
associated with variants of MPI_Wait, eg MPI_Waitany, MPI_Waitsome? 
        
        Finally, I'll officially register my concern with what I see as
an increasing complexity in this effort, esp wrt "multiple subsets". I
don't intend this comment to suppress ideas, but to help keep
beating the drum for simplicity, which I see as a key goal of this
effort. 
        
        If you read this far, thanks! My apologies if some of these
issues have been previously covered. And if I've simply exposed myself
as ignorant, I feel confident in stating that I am not alone - these
questions will persist from others. :)
        
        Richard
        




From rbarrett at [hidden]  Fri Feb 29 10:01:34 2008
From: rbarrett at [hidden] (Richard Barrett)
Date: Fri, 29 Feb 2008 11:01:34 -0500
Subject: [Mpi3-subsetting] MPI_ANY_SOURCE
In-Reply-To: <mailman.2828.1204298409.10290.mpi3-subsetting@lists.mpi-forum.org>
Message-ID: <C3ED968E.7D50%rbarrett@ornl.gov>


>> Now, to one of your questions. An MPI_ANY_SOURCE

Although I appreciate the discussion, my intent (uh-oh!) in bringing this up
is to let you know I "accept" the problem, yet ask for the capability anyway,
but in a manner that keeps it from presenting problems everywhere. Or maybe I'm
under-estimating what I was once told: the use of MPI_ANY_SOURCE anywhere
means it is a problem everywhere, ie in _every_ function involved in
transmitting data?

If that is the case, but I still _really_ wanted to use -lmpi_subset, I
could do this: suppose a pe knows it will receive data from m pes. It could
post numpes non-blocking receives, complete m, discover who they're from,
then cancel the rest. Now I'm thinking I've created a bigger problem when
running across numpes=100k cores while m is, say, 10. True?

Barring some sort of workaround, excluding codes that "need" MPI_ANY_SOURCE
seems to meaningfully reduce the number of codes that could use
-lmpi_subset.

> 3. I (basically) understand the adverse performance effects of allowing
promiscuous receives (MPI_ANY_SOURCE). However, this is a powerful capability
for many codes, and used only in moderation, eg for setting up communication
requirements (such as communication partners in unstructured, semi-structured,
and dynamic mesh computations). In this case the sender knows its partner, but
the receiver does not. A reduction(sum) is used to let each process know the
number of communication partners from which it will receive data, the process
posts that many promiscuous receives, which when satisfied lets it from then on
specify the sender. So would it be possible to include this capability in a
separate function, say the blocking send/recv, but not allow it in the
non-blocking version?
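
A minimal sketch of the counting step in that setup phase, written as plain
C with no MPI calls so it can be read on its own. The function name
count_incoming_partners and the dense 0/1 matrix are illustrative
assumptions; in a real code each process would hold only its own row, and
the column sums would come from a sum reduction such as MPI_Allreduce.

```c
#include <assert.h>
#include <stddef.h>

/* Simulates the reduction(sum) over per-process indicator vectors.
 * sends[r * nprocs + d] is 1 if process r will send to process d,
 * else 0. On return, incoming[d] is the number of partners that
 * process d will receive from, i.e. how many promiscuous receives
 * it would post. */
static void count_incoming_partners(const int *sends, size_t nprocs,
                                    int *incoming)
{
    assert(sends != NULL && incoming != NULL);

    for (size_t d = 0; d < nprocs; ++d)
        incoming[d] = 0;

    for (size_t r = 0; r < nprocs; ++r)
        for (size_t d = 0; d < nprocs; ++d)
            incoming[d] += sends[r * nprocs + d];
}
```

Once those receives complete, the receiver knows each sender's rank and can
name it explicitly from then on, which is why the wildcard is needed only
during setup.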

Richard

-- 
  Richard Barrett
  Future Technologies Group, Computer Science and Mathematics Division, and
  Scientific Computing Group, National Center for Computational Science
  Oak Ridge National Laboratory
  http://ft.ornl.gov/~rbarrett


From alexander.supalov at [hidden]  Fri Feb 29 10:30:25 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Fri, 29 Feb 2008 16:30:25 -0000
Subject: [Mpi3-subsetting] MPI_ANY_SOURCE
In-Reply-To: <C3ED968E.7D50%rbarrett@ornl.gov>
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201197514@swsmsx413.ger.corp.intel.com>


I see. Sorry for explaining the obvious. I guess the progress engine may
take a hit every time there is either an MPI_ANY_SOURCE Recv or (thanks
to Rich) multiple paths between the processes. Hence, all transfers are
potentially affected.

Cancellation is a sticky matter. Some fabrics won't let you do this, so
a cancel will always misfire.



From rlgraham at [hidden]  Fri Feb 29 10:59:27 2008
From: rlgraham at [hidden] (Richard Graham)
Date: Fri, 29 Feb 2008 11:59:27 -0500
Subject: [Mpi3-subsetting] MPI_ANY_SOURCE
In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201197514@swsmsx413.ger.corp.intel.com>
Message-ID: <C3EDA41F.17795%rlgraham@ornl.gov>


On 2/29/08 11:30 AM, "Supalov, Alexander" <alexander.supalov_at_[hidden]>
wrote:

> I see. Sorry for explaining the obvious. I guess the progress engine may
> take a hit every time there is either an MPI_ANY_SOURCE Recv or (thanks
> to Rich) multiple paths between the processes. Hence, all transfers are
> potentially affected.

With any_source the message can come from anyone, so the actual cost depends
on the MPI library's queuing strategy and is very implementation specific.
Whatever the cost is, it grows with the number of potential sources, and at
100k processes there are 100k potential sources.  The queuing could always
have the unexpected messages cached in a single queue, but then all matching
would be more expensive, vs. more of a hierarchical queue structure ....  For
expected messages there can also be an increase in matching costs, but again
this is implementation specific.

The other cost is that matching really has to be done at the destination -
just a practical need - try to cancel 100k posted receives, after one match
has been made, and make sure that only one proc has done the match.
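
To illustrate the single-queue case, here is a minimal sketch in plain C of
receive-side matching against a posted-receive queue that may contain
wildcard entries. It is not taken from any MPI implementation: ANY_SOURCE,
posted_recv and match_incoming are stand-in names, and the linear scan is
only one of the possible queuing strategies mentioned above.

```c
#include <assert.h>
#include <stddef.h>

#define ANY_SOURCE (-1)  /* stand-in for MPI_ANY_SOURCE; value is illustrative */

typedef struct {
    int source;   /* expected sender rank, or ANY_SOURCE */
    int tag;
    int matched;  /* nonzero once a message has claimed this receive */
} posted_recv;

/* Scans the queue in posting order and claims the oldest unmatched
 * receive that accepts (source, tag). Returns its index, or -1 if no
 * receive matches (the message would then be queued as unexpected).
 * *steps reports how many entries were examined. */
static int match_incoming(posted_recv *queue, size_t n,
                          int source, int tag, size_t *steps)
{
    assert(queue != NULL && steps != NULL);

    *steps = 0;
    for (size_t i = 0; i < n; ++i) {
        ++*steps;
        if (queue[i].matched || queue[i].tag != tag)
            continue;
        if (queue[i].source == ANY_SOURCE || queue[i].source == source) {
            queue[i].matched = 1;
            return (int)i;
        }
    }
    return -1;
}
```

Because a wildcard entry may be claimed by a message from any rank, the
receives cannot simply be split into independent per-source queues without
some extra ordering information, which is where the cost comes from.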

> 
> Cancellation is a sticky matter. Some fabrics won't let you do this, so
> a cancel will always misfire.

Is this the case on the receive side?  The cancellation that Richard is
mentioning is a receive side cancellation.  I don't remember a network with
this limitation, but I could very well be wrong on this one - I suppose it
can also depend on how you do the matching.

Rich



From alexander.supalov at [hidden]  Fri Feb 29 11:23:46 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Fri, 29 Feb 2008 17:23:46 -0000
Subject: [Mpi3-subsetting] MPI_ANY_SOURCE
In-Reply-To: <C3EDA41F.17795%rlgraham@ornl.gov>
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A20119756A@swsmsx413.ger.corp.intel.com>


I heard Myricom MX would not allow recv cancellation. This needs to be
checked. 



From jsquyres at [hidden]  Fri Feb 29 14:28:53 2008
From: jsquyres at [hidden] (Jeff Squyres)
Date: Fri, 29 Feb 2008 15:28:53 -0500
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201196FE1@swsmsx413.ger.corp.intel.com>
Message-ID: <89169D89-5686-4227-B983-267060C9C3ED@cisco.com>


On Feb 28, 2008, at 11:29 PM, Supalov, Alexander wrote:

> - Communicator & group management: better memory footprint.

Take this point to an extreme - it may be possible to say "this app  
only uses MPI_COMM_WORLD".  In this case, you can remove the  
communicator field from network packets for a small gain in latency,  
or perhaps reduce it from 4 to 2 bytes or 1 byte (e.g., if an app says  
"I'll only use 4 communicators").

> - Message tagging: better support for stable dataflow exchanges,  
> smaller
> packets.

Two points here:

- allow app to eliminate MPI_ANY_TAG
- just like with communicators, allow the app to say "I'll only use N  
tags", where N can save you space in network packets (e.g., if N==1,  
no need for tag on the wire; if N == 2, then you only need 1 byte for  
the tag, etc.).
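
A rough sketch of the arithmetic behind both points, in plain C. The
functions field_bytes and header_bytes are hypothetical, and the assumption
that each field is rounded up to whole bytes is mine; a real wire format
could pack fields at bit granularity or carry other header fields as well.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed on the wire to tell n_values distinct values apart,
 * rounded up to whole bytes. A single possible value needs no field. */
static size_t field_bytes(unsigned long n_values)
{
    size_t bits = 0;

    if (n_values <= 1)
        return 0;

    /* Count the bits needed to encode the largest code, n_values - 1. */
    for (unsigned long max_code = n_values - 1; max_code != 0; max_code >>= 1)
        ++bits;

    return (bits + 7) / 8;
}

/* Hypothetical header cost: one communicator field plus one tag field. */
static size_t header_bytes(unsigned long n_comms, unsigned long n_tags)
{
    assert(n_comms >= 1);  /* an app always has at least MPI_COMM_WORLD */
    return field_bytes(n_comms) + field_bytes(n_tags);
}
```

With these assumptions, an app that declares only MPI_COMM_WORLD and a
single tag needs neither field, and one that declares 4 communicators and
2 tags needs one byte for each.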

> - Non-blocking communication: easier ordering, simplified request
> handling.

If there is no non-blocking communication, enormous chunks of the  
progression engine can be optimized in terms of memory (i.e., remove  
lots of now-unnecessary code) and probably a little speed.

On the teleconf (sorry I missed it), was there discussion of how to  
specify these hints?  Perhaps a new function: MPI_INIT_INFO (pass an  
MPI_Info handle to MPI_INIT)?  Or is it something that needs to be  
specified at compile/link time?


-- 
Jeff Squyres
Cisco Systems


From alexander.supalov at [hidden]  Fri Feb 29 14:54:18 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Fri, 29 Feb 2008 20:54:18 -0000
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To: <89169D89-5686-4227-B983-267060C9C3ED@cisco.com>
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A2011975F4@swsmsx413.ger.corp.intel.com>


Hi,

Thanks. I was thinking privately about MPI_Init_subsets or so that would
use an Info object, too. I bet a comparable idea - or at least desire to
keep the number of subset related calls under strict control - was aired
at the telecon by someone, but we didn't go into much detail then.

One reservation I have about Info objects is that they are so flexible
as to be dangerous. They promote lots of optional, loosely controlled
features that can effectively blur the interface definition. On the
other hand, I don't see any viable alternative to that, at least if the
number of subsets is going to be substantial and ever growing.

Of course, it's a little too early to fix any implementation details I'm
afraid. Anyway, let's keep this idea in mind while we're settling the
scope.

Best regards.

Alexander

-----Original Message-----
From: mpi3-subsetting-bounces_at_[hidden]
[mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of Jeff
Squyres
Sent: Friday, February 29, 2008 9:29 PM
To: mpi3-subsetting_at_[hidden]
Subject: Re: [Mpi3-subsetting] agenda for subsetting kickoff telecon
ww09

On Feb 28, 2008, at 11:29 PM, Supalov, Alexander wrote:

> - Communicator & group management: better memory footprint.

Take this point to an extreme - it may be possible to say "this app  
only uses MPI_COMM_WORLD".  In this case, you can remove the  
communicator field from network packets for a small gain in latency,  
or perhaps reduce it from 4 to 2 bytes or 1 byte (e.g., if an app says  
"I'll only use 4 communicators").

> - Message tagging: better support for stable dataflow exchanges,  
> smaller
> packets.

Two points here:

- allow app to eliminate MPI_ANY_TAG
- just like with communicators, allow the app to say "I'll only use N  
tags", where N can save you space in network packets (e.g., if N==1,  
no need for tag on the wire; if N == 2, then you only need 1 byte for  
the tag, etc.).
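
The arithmetic behind the two points above can be sketched as follows.
This is only an illustration, assuming the wire field is rounded up to
whole bytes and that the application declares the number of distinct
tags or communicators up front. The names field_bytes and header_bytes
are hypothetical and are not part of any MPI interface.

```python
def field_bytes(distinct_values):
    """Whole bytes needed to tell `distinct_values` identifiers apart.

    Assumption: the field is byte-granular, so anything from 2 to 256
    distinct values costs one byte, and a single value costs nothing.
    """
    if distinct_values < 1:
        raise ValueError("need at least one distinct value")
    # Bits required to encode the largest index (0 bits if only one value).
    bits = (distinct_values - 1).bit_length()
    # Round up to whole bytes.
    return (bits + 7) // 8


def header_bytes(num_communicators, num_tags):
    """Combined size of the communicator and tag fields, in bytes."""
    return field_bytes(num_communicators) + field_bytes(num_tags)
```

With these assumptions, an application that only uses MPI_COMM_WORLD and
a single tag needs neither field on the wire, while one that declares 4
communicators and 2 tags would need one byte for each.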

> - Non-blocking communication: easier ordering, simplified request
> handling.

If there is no non-blocking communication, enormous chunks of the  
progression engine can be optimized in terms of memory (i.e., remove  
lots of now-unnecessary code) and probably a little speed.

On the teleconf (sorry I missed it), was there discussion of how to  
specify these hints?  Perhaps a new function: MPI_INIT_INFO (pass an  
MPI_Info handle to MPI_INIT)?  Or is it something that needs to be  
specified at compile/link time?
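
To make the Info-based option a little more concrete, here is a rough
sketch of how such hints might be checked against a closed set of keys
at initialization time. Everything in it is an assumption made for
illustration: a plain dictionary of strings stands in for an MPI_Info
object (whose keys and values are strings), and the key names
max_communicators, max_tags, no_any_tag and no_nonblocking are
hypothetical, not taken from the MPI standard or from any proposal.

```python
# Hypothetical hint keys and the type each value must parse to.
ALLOWED_HINTS = {
    "max_communicators": int,
    "max_tags": int,
    "no_any_tag": bool,
    "no_nonblocking": bool,
}


def parse_hints(info):
    """Validate string key/value hints and return typed values.

    Unknown keys are rejected rather than silently ignored, so the set
    of accepted hints stays explicit.
    """
    parsed = {}
    for key, raw in info.items():
        if key not in ALLOWED_HINTS:
            raise KeyError("unknown subset hint: " + key)
        if ALLOWED_HINTS[key] is bool:
            if raw not in ("true", "false"):
                raise ValueError(key + " must be 'true' or 'false'")
            parsed[key] = (raw == "true")
        else:
            value = int(raw)  # raises ValueError on non-numeric input
            if value < 1:
                raise ValueError(key + " must be at least 1")
            parsed[key] = value
    return parsed
```

For example, parse_hints({"max_tags": "1", "no_any_tag": "true"}) yields
{"max_tags": 1, "no_any_tag": True}, and a misspelled key raises an
error instead of being dropped.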


-- 
Jeff Squyres
Cisco Systems


From jsquyres at [hidden]  Fri Feb 29 15:28:41 2008
From: jsquyres at [hidden] (Jeff Squyres)
Date: Fri, 29 Feb 2008 16:28:41 -0500
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A2011975F4@swsmsx413.ger.corp.intel.com>
Message-ID: <11D1F118-5607-4E42-941F-9BE123C0F9B7@cisco.com>


One other issue is that we'd have to make [at least some of] the  
MPI_Info_* functions be able to be called before MPI_INIT.

On Feb 29, 2008, at 3:54 PM, Supalov, Alexander wrote:

> Hi,
>
> Thanks. I was thinking privately about MPI_Init_subsets or so that  
> would
> use an Info object, too. I bet a comparable idea - or at least  
> desire to
> keep the number of subset related calls under strict control - was  
> aired
> at the telecon by someone, but we didn't go into much detail then.
>
> One reservation I have about Info objects is that they are so flexible
> as to be dangerous. They promote lots of optional, loosely
> controlled
> features that can effectively blur the interface definition. On the
> other hand, I don't see any viable alternative to that, at least if  
> the
> number of subsets is going to be substantial and ever growing.
>
> Of course, it's a little too early to fix any implementation details,
> I'm
> afraid. Anyway, let's keep this idea in mind while we're settling the
> scope.
>
> Best regards.
>
> Alexander
>
> -----Original Message-----
> From: mpi3-subsetting-bounces_at_[hidden]
> [mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of Jeff
> Squyres
> Sent: Friday, February 29, 2008 9:29 PM
> To: mpi3-subsetting_at_[hidden]
> Subject: Re: [Mpi3-subsetting] agenda for subsetting kickoff telecon
> ww09
>
> On Feb 28, 2008, at 11:29 PM, Supalov, Alexander wrote:
>
>> - Communicator & group management: better memory footprint.
>
> Take this point to an extreme - it may be possible to say "this app
> only uses MPI_COMM_WORLD".  In this case, you can remove the
> communicator field from network packets for a small gain in latency,
> or perhaps reduce it from 4 to 2 bytes or 1 byte (e.g., if an app says
> "I'll only use 4 communicators").
>
>> - Message tagging: better support for stable dataflow exchanges,
>> smaller
>> packets.
>
> Two points here:
>
> - allow app to eliminate MPI_ANY_TAG
> - just like with communicators, allow the app to say "I'll only use N
> tags", where N can save you space in network packets (e.g., if N==1,
> no need for tag on the wire; if N == 2, then you only need 1 byte for
> the tag, etc.).
>
>> - Non-blocking communication: easier ordering, simplified request
>> handling.
>
>
> If there is no non-blocking communication, enormous chunks of the
> progression engine can be optimized in terms of memory (i.e., remove
> lots of now-unnecessary code) and probably a little speed.
>
> On the teleconf (sorry I missed it), was there discussion of how to
> specify these hints?  Perhaps a new function: MPI_INIT_INFO (pass an
> MPI_Info handle to MPI_INIT)?  Or is it something that needs to be
> specified at compile/link time?
>
> -- 
> Jeff Squyres
> Cisco Systems


-- 
Jeff Squyres
Cisco Systems