From alexander.supalov at [hidden] Fri Feb 22 01:00:44 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Fri, 22 Feb 2008 07:00:44 -0000
Subject: [Mpi3-subsetting] (no subject)
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A2011113FA@swsmsx413.ger.corp.intel.com>

--
Dr Alexander Supalov
Intel GmbH
Hermuelheimer Strasse 8a
50321 Bruehl, Germany
Phone: +49 2232 209034
Mobile: +49 173 511 8735
Fax: +49 2232 209029

---------------------------------------------------------------------
Intel GmbH
Dornacher Strasse 1
85622 Feldkirchen/Muenchen Germany
Sitz der Gesellschaft: Feldkirchen bei Muenchen
Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
Registergericht: Muenchen HRB 47456 Ust.-IdNr.
VAT Registration No.: DE129385895
Citibank Frankfurt (BLZ 502 109 00) 600119052

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

From alexander.supalov at [hidden] Fri Feb 22 07:48:35 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Fri, 22 Feb 2008 13:48:35 -0000
Subject: [Mpi3-subsetting] subsetting kickoff telecon ww09
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A2011117A2@swsmsx413.ger.corp.intel.com>

Hi everybody,

I suggest that we have a 1-hour kickoff telecon to get going on the
MPI-3 subsetting. Please reply to me directly
(alexander.supalov_at_[hidden]) and indicate which of the times below
suit you:

February 25, 2008 7:00 am PST/10:00 am EST/16:00 CET yes/no/plan B
February 25, 2008 8:00 am PST/11:00 am EST/17:00 CET yes/no/plan B
February 25, 2008 9:00 am PST/12:00 pm EST/18:00 CET yes/no/plan B
February 27, 2008 9:00 am PST/12:00 pm EST/18:00 CET yes/no/plan B
February 28, 2008 9:00 am PST/12:00 pm EST/18:00 CET yes/no/plan B
February 29, 2008 7:00 am PST/10:00 am EST/16:00 CET yes/no/plan B

I'll take care of the bridge and agenda once the time is settled.

Best regards.

Alexander

From leonidm at [hidden] Fri Feb 22 20:22:18 2008
From: leonidm at [hidden] (Leonid Meyerguz)
Date: Fri, 22 Feb 2008 18:22:18 -0800
Subject: [Mpi3-subsetting] subsetting kickoff telecon ww09
In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A2011117A2@swsmsx413.ger.corp.intel.com>
Message-ID: <43AEA9A9F7768B42A89F554F0EBF7ED8273B6DA0E8@NA-EXMSG-C102.redmond.corp.microsoft.com>

Hi Alexander,

I vote Feb 27th 9:00 AM PST as my first choice, and Feb 28th 9:00 AM
PST as plan B.

Regards,
Leonid.

From alexander.supalov at [hidden] Sat Feb 23 09:27:07 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Sat, 23 Feb 2008 15:27:07 -0000
Subject: [Mpi3-subsetting] subsetting kickoff telecon ww09
In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A2011117A2@swsmsx413.ger.corp.intel.com>
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A20113C46D@swsmsx413.ger.corp.intel.com>

Hi everybody,

Thanks a lot for your replies. I've got 6 so far.

Wednesday, February 27, 9:00 am PST/12:00 pm EST/18:00 CET emerges as
the time when all but one of us can meet. Thursday, February 28, same
time, is a firm plan B for all but 2 of us. Monday and Friday look
increasingly weak.

I'll therefore wait for more replies until Tuesday morning CET and then
announce the final time, connection details, and agenda of the kickoff
telecon.

Best regards.

Alexander

From spoole at [hidden] Sat Feb 23 09:40:18 2008
From: spoole at [hidden] (Stephen Poole)
Date: Sat, 23 Feb 2008 10:40:18 -0500
Subject: [Mpi3-subsetting] subsetting kickoff telecon ww09
In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A20113C46D@swsmsx413.ger.corp.intel.com>
Message-ID: <0227DF42-2AEA-46CD-B82E-AD3087AD2C06@ornl.gov>

I can meet on either Thursday or Friday. Wednesday would be OK, but not
at that time. I will be in the air.

Steve...

==================================================>
Steve Poole
Computer Science and Mathematics Division
Chief Scientist / Director of Special Programs
Computational Sciences and Engineering Division
National Center for Computational Sciences Division
Oak Ridge National Laboratory
865.574.9008 (Office)
865.574.6076 (Fax)

"Wisdom is not a product of schooling, but of the lifelong attempt to
acquire it"
Albert Einstein
====================================================>

From alexander.supalov at [hidden] Tue Feb 26 02:25:04 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Tue, 26 Feb 2008 08:25:04 -0000
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A2011117A2@swsmsx413.ger.corp.intel.com>
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A20113CCDA@swsmsx413.ger.corp.intel.com>

Hi everybody,

Let's meet on February 28, 2008, at 9:00 am PST/12:00 pm EST/18:00 CET.

+1-916-356-2663, Bridge: 4, Passcode: 5661281

- Opens & introductions
- Scope of the effort
- Next steps

If you cannot make the time, please send your notes to this list prior
to the meeting.

Best regards.

Alexander

From alexander.supalov at [hidden] Tue Feb 26 05:10:15 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Tue, 26 Feb 2008 11:10:15 -0000
Subject: [Mpi3-subsetting] Subsetting scope: a POV
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A20113CED8@swsmsx413.ger.corp.intel.com>

Hi everybody,

In the run-up to our kick-off, here's my POV on the subsetting and its
possible scope/role in MPI-3. Your comments and suggestions are most
welcome.

We start this activity because:

1) Certain industrial customers complain about MPI complexity and
   inadequacy
2) Complexity is going to grow in MPI-3
3) Growing complexity may have growing performance implications

As a result of the above, customers drift away from MPI to home-grown
libraries, usually based on sockets. This effectively eliminates fast
networks from their scope, unless they can profit from fast IP
emulation layers. Moreover, this customer drift, if continued, may make
MPI irrelevant in some HPC areas and lead to the creation of
alternative interfaces there.

The main purpose of the Forum, as well as of the subsetting WG, is thus
to react to customer demand and make MPI faster and easier to use,
especially in those areas that are subject to increasing customer drift
(think, e.g., massive master-slave computations).

Based on these premises, the subsetting, in my mind, should try to:

1) Make the MPI standard modular. This may include:
   a) Splitting the standard functionality into coherent groups that
      users will be able to select/deselect at init time
   b) Making implementation of some modules/functionality optional
      (think dynamic process support), as they effectively are already
   c) Addressing not only functional groups but also certain aspects of
      the standard that may not be needed in certain use cases (think
      communicator management, message tagging, derived datatypes,
      MPI_ANY_SOURCE support, non-blocking communication, etc.)

2) As part of the modularization, optionally identify the minimum
   functional MPI subset. This may be:
   a) Those 6 calls (Init, Rank, Size, Send, Recv, Finalize), possibly
      w/o communicator management and derived datatypes
   b) A more flexible combination of modules actually needed by the
      user

3) Connect subsetting with other MPI-3 activities (FT, ABI,
   collectives, etc.)

As a result of modularization, we should strive to achieve:

1) Simplification of the standard for newcomers
2) Performance advantages for reasonable module combinations
3) Influence upon the overall shape of the MPI-3 standard

There are certainly quite a few concerns here:

1) We may end up complicating the standard and its implementation even
   further
2) We may facilitate a split of the standard into several mutually
   incompatible implementation "families"
3) We may cause some valid MPI-3 applications to break if they use
   optional modules not available in the implementation involved
4) We may get carried away by academic considerations and miss the
   actual customer demands in the process
5) We may be fighting a losing battle because the MPI standard is way
   too rigid by design/purpose to be simplified

To guard against all this, we need to work closely with other WGs and
the Forum as a whole, define our goals as early as possible, and
solicit extensive Forum and customer feedback.
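
Illustrative note on point 1a above (selecting/deselecting functional groups at init time): the following is only a minimal sketch of what such a handshake could look like, modeled in plain Python rather than as an MPI binding. The module names, the `IMPLEMENTED` set, and the functions `init_modules` and `unavailable` are all hypothetical and appear in no MPI standard. The request/provided pattern is borrowed from the way `MPI_Init_thread` takes a `required` thread level and returns a `provided` one.

```python
from enum import Flag, auto


class Module(Flag):
    """Hypothetical functional groups; the names are illustrative only."""
    CORE = auto()           # stand-in for Init, Rank, Size, Send, Recv, Finalize
    COMM_MGMT = auto()      # communicator and group management
    DATATYPES = auto()      # derived datatypes
    ANY_SOURCE = auto()     # wildcard receives
    NONBLOCKING = auto()    # non-blocking communication
    DYNAMIC_PROCS = auto()  # dynamic process support


# Assumption: what this imaginary implementation was built with.
IMPLEMENTED = (Module.CORE | Module.COMM_MGMT
               | Module.DATATYPES | Module.NONBLOCKING)


def init_modules(requested: Module) -> Module:
    """Return the modules actually enabled for this run.

    CORE is always switched on; every other module is enabled only if
    the caller asked for it and the implementation has it.
    """
    return (requested | Module.CORE) & IMPLEMENTED


def unavailable(requested: Module, provided: Module) -> list:
    """List the requested modules that were not provided."""
    return [m for m in Module if m in requested and m not in provided]


if __name__ == "__main__":
    wanted = Module.NONBLOCKING | Module.DYNAMIC_PROCS
    got = init_modules(wanted)
    print("provided:", got)
    print("missing: ", unavailable(wanted, got))
```

In this sketch the application learns at startup, not in the middle of a run, that dynamic process support is missing, which is one way to address concern 3 in the list above.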

From all this, by the time of the Forum meeting in March, we should
have at least a couple of slides reflecting our intentions and plans.

Best regards.

Alexander

From bronis at [hidden] Tue Feb 26 06:54:10 2008
From: bronis at [hidden] (Bronis R. de Supinski)
Date: Tue, 26 Feb 2008 04:54:10 -0800 (PST)
Subject: [Mpi3-subsetting] Subsetting scope: a POV
In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A20113CED8@swsmsx413.ger.corp.intel.com>
Message-ID:

Alexander:

Re:
> We start this activity because:
>
> 1) Certain industrial customers complain about MPI complexity and
> inadequacy

While this is a legitimate concern, it cannot be allowed to cause the
standard to devolve into a fractured morass that removes the
portability of MPI programs, which has been a key aspect in the success
of MPI.

> 2) Complexity is going to grow in MPI-3

True. And it is not clear that many of the proposed extensions are
required for portable programming. In fact, they could be designed with
the idea in mind that they are optional.

> 3) Growing complexity may have growing performance implications
>
> As a result of the above, customers drift away from the MPI to
> home-grown libraries, usually based on sockets. This effectively
> eliminates fast networks from their scope, unless they can profit from
> fast IP emulation layers. Moreover, this customer drift, if continued,
> may make MPI irrelevant in some HPC areas and lead to creation of
> alternative interfaces there.
>
> The main purpose of the Forum, as well as the subsetting WG, is thus to
> react to customer demand and make MPI faster and easier to use,
> especially in those areas that are subjected to the increasing customer
> drift (think, e.g., massive master-slave computations).
>
> Basing on these premises, the subsetting, in my mind, should try to:
>
> 1) Make MPI standard modular. This may include:
> a) Splitting the standard functionality into coherent groups that
> users will be able to select/deselect at init time
> b) Making implementation of some modules/functionality optional
> (think dynamic process support) as they are anyway now

The key issue will be defining what is optional. Clearly, dynamic
process support is a good candidate since it already effectively is.
However, most of the functions from MPI-1 are not. There may be some
concepts/features that can be (perhaps wildcards and topologies), but
the main communication functions, particularly the full set of
collectives, are not; nor is the profiling interface. In fact, one
could argue that the profiling interface could be subsetted in
correspondence with the user-level subsetting; it's not clear anything
else makes sense.

> c) Addressing not only functional groups but also certain aspects of
> the standard that may not be needed in certain use cases (think
> communicator management, message tagging, derived datatypes,
> MPI_ANY_SOURCE support, non-blocking communication, etc.)
>
> 2) As part of the modularization, optionally identify the minimum
> functional MPI subset. This may be:
> a) Those 6 calls (Init, Rank, Size, Send, Recv, Finalize), possibly
> w/o communicator management and derived datatypes.

This may be a reasonable set. One thing that needs to be stated is that
the minimal subset should not be the only non-optional one. If you do
that, then portability is lost.

> b) A more flexible combination of modules actually needed by the
> user
>
> 3) Connect subsetting with other MPI-3 activities (FT, ABI, collectives,
> etc.)
>
> As a result of modularization, we should strive to achieve
>
> 1) Simplification of the standard for the newcomers
> 2) Performance advantages for reasonable module combinations
> 3) Influence upon the overall shape of the MPI-3 standard
>
> There are certainly quite a few concerns here:
>
> 1) We may end up complicating the standard and its implementation even
> further
> 2) We may facilitate a split of the standard into several mutually
> incompatible implementation "families"

Yes, this is probably the most significant concern. You can certainly
subset the standard without making it more complicated, but the obvious
way to do it easily results in this problem.

> 3) We may cause some valid MPI-3 applications break if they use optional
> modules not available in the implementation involved

A publish/subscribe query interface is clearly a minimal part of what
needs to be provided.

Bronis
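
Illustrative note on the portability argument above: a small sketch, in plain Python, of why making only the minimal subset mandatory costs portability, and of the kind of answer a query interface could give. The feature labels (`"bcast"`, `"spawn"`, and so on) and the function names are hypothetical stand-ins chosen for this example, not MPI identifiers, and the split into mandatory and optional sets is an assumption made only to show the reasoning.

```python
# Hypothetical feature labels; they merely stand in for groups of MPI calls.
MINIMAL = frozenset({"init", "rank", "size", "send", "recv", "finalize"})
COLLECTIVES = frozenset({"bcast", "reduce", "barrier"})


def is_portable(app_needs, mandatory):
    """An application runs everywhere only if all it needs is mandatory."""
    return set(app_needs) <= set(mandatory)


def missing_features(app_needs, available):
    """What a query interface could report: needed but not available."""
    return sorted(set(app_needs) - set(available))


if __name__ == "__main__":
    app = MINIMAL | {"bcast"}

    # Only the six calls are mandatory: the application is not portable.
    print(is_portable(app, MINIMAL))

    # Collectives are mandatory too: the same application is portable.
    print(is_portable(app, MINIMAL | COLLECTIVES))

    # An application using an optional feature can at least detect its
    # absence up front instead of breaking later.
    print(missing_features(app | {"spawn"}, MINIMAL | COLLECTIVES))
```

The point of the sketch is only the set relation: the larger the mandatory set, the more applications are portable by construction, and whatever remains optional needs a way to be queried.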

From alexander.supalov at [hidden] Thu Feb 28 12:04:32 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Thu, 28 Feb 2008 18:04:32 -0000
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A2011117A2@swsmsx413.ger.corp.intel.com>
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A20116EFEB@swsmsx413.ger.corp.intel.com>

Hi everybody,

Thank you for your time today. It was a very good discussion. Here's
what I captured (please add/modify what I may have missed):

Present: Leonid Meyerguz (Microsoft), Rich Graham (ORNL), Richard
Barrett (ORNL), Torsten Hoefler (ISU), Alexander Supalov (Intel)

- Opens & introductions
- Scope of the effort
  - Rich
    - Minimum subset consistent with the rest of MPI, for
      performance/memory footprint optimization
    - Danger of splitting MPI, hence against optional features in the
      standard
    - Both blocking & nonblocking belong to the core
  - Torsten
    - Some collectives may go into selectable subsets
    - MPI_ANY_SOURCE considered harmful
  - Leonid
    - Flexible support for optional features, means for choosing and
      advertising level of compliance/set of features
  - See enclosed email for Alexander's POV
  - General discussion snapshots
    - Support of subsets: some or all? If some, possible linkage
      problems in static apps (or dead calls). If all, where's the
      gain?
    - Optional: really optional (may not be present) or selectable
      (present but may be unused)?
    - Performance penalty for unused subsets: implementation matter or
      standard choice?
    - Portability may be limited to a certain class of applications
      (think FT, master-slave runs)
    - All we design needs to be implementable; complexity needs to be
      controlled
    - The ability to use a certain set of subsets should not preclude
      pulling in other modules if necessary
    - Whatever we do, it should not conflict with the ABI efforts
    - Need to stay nice and be nicer with respect to the libraries
      (think threading) and keep things simple
    - The simplification argument, if put first, may not be liked by
      some
- Next steps
  - Please comment on these minutes, and add/modify what I may have
    missed
  - I'll prepare a couple of slides by next week summarizing our
    discussion so far; again, your feedback will be most welcome
  - At the meeting, it would be great to meet F2F briefly and discuss
    any remaining loose ends before the presentation at the Forum;
    I'll see to this

Best regards.
Alexander -- Dr Alexander Supalov Intel GmbH Hermuelheimer Strasse 8a 50321 Bruehl, Germany Phone: +49 2232 209034 Mobile: +49 173 511 8735 Fax: +49 2232 209029 --------------------------------------------------------------------- Intel GmbH Dornacher Strasse 1 85622 Feldkirchen/Muenchen Germany Sitz der Gesellschaft: Feldkirchen bei Muenchen Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer Registergericht: Muenchen HRB 47456 Ust.-IdNr. VAT Registration No.: DE129385895 Citibank Frankfurt (BLZ 502 109 00) 600119052 This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. attached mail follows: Hi everybody, In the run-up to our kick-off, here's my POV on the subsetting and its possible scope/role in the MPI-3. Your comments and suggestions are most welcome. We start this activity because: 1) Certain industrial customers complain about MPI complexity and inadequacy 2) Complexity is going to grow in MPI-3 3) Growing complexity may have growing performance implications As a result of the above, customers drift away from the MPI to home-grown libraries, usually based on sockets. This effectively eliminates fast networks from their scope, unless they can profit from fast IP emulation layers. Moreover, this customer drift, if continued, may make MPI irrelevant in some HPC areas and lead to creation of alternative interfaces there. The main purpose of the Forum, as well as the subsetting WG, is thus to react to customer demand and make MPI faster and easier to use, especially in those areas that are subjected to the increasing customer drift (think, e.g., massive master-slave computations). Basing on these premises, the subsetting, in my mind, should try to: 1) Make MPI standard modular. 
This may include: a) Splitting the standard functionality into coherent groups that users will be able to select/deselect at init time b) Making implementation of some modules/functionality optional (think dynamic process support) as they are anyway now c) Addressing not only functional groups but also certain aspects of the standard that may not be needed in certain use cases (think communicator management, message tagging, derived datatypes, MPI_ANY_SOURCE support, non-blocking communication, etc.) 2) As part of the modularization, optionally identify the minimum functional MPI subset. This may be: a) Those 6 calls (Init, Rank, Size, Send, Recv, Finalize), possibly w/o communicator management and derived datatypes. b) A more flexible combination of modules actually needed by the user 3) Connect subsetting with other MPI-3 activities (FT, ABI, collectives, etc.) As a result of modularization, we should strive to achieve 1) Simplification of the standard for the newcomers 2) Performance advantages for reasonable module combinations 3) Influence upon the overall shape of the MPI-3 standard There are certainly quite a few concerns here: 1) We may end up complicating the standard and its implementation even further 2) We may facilitate a split of the standard into several mutually incompatible implementation "families" 3) We may cause some valid MPI-3 applications break if they use optional modules not available in the implementation involved 4) We may get carried away by academic considerations and miss the actual customer demands in the process 5) We may be engaging into a lost battle because the MPI standard is way too rigid by design/purpose to be simplified To guard against all this, we need to work closely with other WGs and the Forum as a whole, define our goals as early as possible, and solicit extensive Forum and customer feedback. 
>From all this, by the time of the Forum meeting in March, we should have at least a couple of slides reflecting our intentions and plans. Best regards. Alexander -- Dr Alexander Supalov Intel GmbH Hermuelheimer Strasse 8a 50321 Bruehl, Germany Phone: +49 2232 209034 Mobile: +49 173 511 8735 Fax: +49 2232 209029 * -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From htor at [hidden] Thu Feb 28 22:07:51 2008 From: htor at [hidden] (Torsten Hoefler) Date: Thu, 28 Feb 2008 23:07:51 -0500 Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09 In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A20116EFEB@swsmsx413.ger.corp.intel.com> Message-ID: <20080229040751.GB16623@benten.cs.indiana.edu> Hi, > Present: Leonid Meyerguz (Microsoft), Rich Graham (ORNL), Richard Barrett > (ORNL), Torsten Hoefler (ISU), Alexander Supalov (Intel) just for the record, it's "IU" not "ISU" :-) > - Scope of the effort > - Rich > - Minimum subset consistent with the rest of MPI, for > performance/memory footprint optimization > - Danger of splitting MPI, hence against optional features in the > standard I back that (danger of optional features for portability). I'd propose to split the current standard into mostly self-contained subsets that have clearly defined interfaces to the rest of the standard. Note: this only defines logical interfaces, that does *not* define how those things are to be implemented. This makes it easier to understand the standard and have separate (portable) libraries for the subsets, it does not influence optimization possibilities by implementing everything in a monolithic block (i.e., central progress). > - Both blocking & nonblocking belong to the core > - Torsten > - Some collectives may go into selectable subsets I see three subsets: blocking colls, non-blocking colls and topological colls (maybe also blocking / non-blocking). 
> - MPI_ANY_SOURCE considered harmful
I'd like to add datatypes and heterogeneity to this list (with regard to performance). Alexander mentioned the dynamics. I think we should have a list of items ready that could influence optimization possibilities significantly if the user had to announce them before using them. That would give another strong argument for the subsetting.

Best,
Torsten

--
bash$ :(){ :|:&};: --------------------- http://www.unixer.de/ -----
Indiana University | http://www.indiana.edu
Open Systems Lab | http://osl.iu.edu/
150 S. Woodlawn Ave. | Bloomington, IN, 474045-7104 | USA
Lindley Hall Room 135 | +01 (812) 855-3608

From alexander.supalov at [hidden] Thu Feb 28 22:29:01 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Fri, 29 Feb 2008 04:29:01 -0000
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To: <20080229040751.GB16623@benten.cs.indiana.edu>
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201196FE1@swsmsx413.ger.corp.intel.com>

Hi,

Thanks. What subsets inside the current standard would you propose? What interfaces between them would you envision?

Good idea about the optimization opportunities. Here's an initial combined list, with the main benefits as I see them. Please comment/extend.

- Dynamic process support: less overhead in the progress engine, easier global rank handling.
- Heterogeneity: better memory footprint, easier data handling.
- Derived datatypes (especially those with holes): better memory footprint.
- MPI_ANY_SOURCE: faster, more simple multifabric progress.
- File I/O: smaller requests, easier wait/test functions.
- One-sided ops: no passive target w/o MPI calls - no extra progress thread.
- Communicator & group management: better memory footprint.
- Message tagging: better support for stable dataflow exchanges, smaller packets.
- Non-blocking communication: easier ordering, simplified request handling.

Best regards.
Alexander

From htor at [hidden] Thu Feb 28 22:44:17 2008
From: htor at [hidden] (Torsten Hoefler)
Date: Thu, 28 Feb 2008 23:44:17 -0500
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201196FE1@swsmsx413.ger.corp.intel.com>
Message-ID: <20080229044417.GI16623@benten.cs.indiana.edu>

Hi Alexander,

> Thanks. What subsets inside the current standard would you propose?
> What interfaces between them would you envision?
that is a long discussion, I guess. So just to put something up for discussion: One subset could be collective communication, and it would use Send/Recv from the MPI-core interface. Same for non-blocking colls (using nonblocking send/recv). Again, this is a logical design; it enables us to easily implement a portable library that only uses this interface and offers the standardized features.
This library can be imported by vendors who do not want to optimize the subset that is supported by the lib. However, the MPI implementor is free to ignore the interface and do the collectives inside the library in a monolithic way (for performance).

Other subsets could be:
- topology functions
- language bindings (certainly needs discussion)
- data-type handling
- groups/communicator handling (interface definition would be complex)
- profiling interface (similar to language bindings)
- parallel I/O
- process management
- one-sided (if this is not in core)
- grequests

> Good idea about the optimization opportunities. Here's an initial
> combined list, with the main benefits as I see them. Please
> comment/extend.
>
> - Dynamic process support: less overhead in the progress engine, easier
> global rank handling.
ack

> - Heterogeneity: better memory footprint, easier data handling.
easier equals faster in this case

> - Derived datatypes (especially those with holes): better memory
> footprint.
hmm, I don't get the memory footprint argument? But I'd say that it simplifies the critical path (one "if" less) and many applications just don't need datatypes. This is necessary if we want to broaden our scope (cf. the sockets interface has no datatypes and works well).

> - MPI_ANY_SOURCE: faster, more simple multifabric progress.
ack + receiver-based protocols (I wrote about this in "Optimizing non-blocking Collective Operations for InfiniBand", which will be presented at the CAC workshop at IPDPS'07).

> - File I/O: smaller requests, easier wait/test functions.
yes

> - One-sided ops: no passive target w/o MPI calls - no extra progress
> thread.
> - Communicator & group management: better memory footprint.
> - Message tagging: better support for stable dataflow exchanges, smaller
> packets.
ack

> - Non-blocking communication: easier ordering, simplified request
> handling.
I am not sure about this since only the local matching differs (slightly) here, i.e., packets match a waiting recv (potentially dozens of them in different threads) vs. packets match a non-blocking request. This is pretty much the same overhead. How does that influence MPI's ordering constraints?

Best,
Torsten

From bronis at [hidden] Thu Feb 28 22:53:24 2008
From: bronis at [hidden] (Bronis R. de Supinski)
Date: Thu, 28 Feb 2008 20:53:24 -0800 (PST)
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201196FE1@swsmsx413.ger.corp.intel.com>
Message-ID:

All:

OK, I have to respond to the notion that derived datatypes limit performance. It is just not a reasonable position. Sure, if you can send contiguous locations, you will get higher performance. The problem is that codes do not only need to send contiguous data, so that is not an adequate reason to say derived datatypes limit performance.

So, what is left? That there is some more efficient way to send non-contiguous data? How? As multiple messages, each of which sends contiguous data? If so, then the implementation could do this under the covers and the datatypes are just a convenience for the user not to have to specify the individual sends.

OK, suppose that's not the reason. Perhaps the user can do the copying into a contiguous buffer and get better performance? While I have seen this hold with some implementations, it is absurd. There is no reason that I can discern as to why the user should be able to deduce a better copying mechanism than the MPI implementer. So, again, at worst, the datatypes should be a convenience. Do you have an alternative reason or a refutation of these opinions?
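The copy-into-a-contiguous-buffer case above can be written down as a small sketch. It assumes a layout descriptor modeled loosely on the parameters of MPI_Type_vector (count, blocklength, stride, all in elements); the names `Vector`, `pack` and `unpack` are hypothetical and plain Python lists stand in for message buffers. It shows that the loop a user would write by hand is fully determined by the descriptor, so a library holding the same descriptor could run the same loop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vector:
    """Hypothetical layout descriptor: `count` blocks of `blocklength` elements,
    with block starts `stride` elements apart."""
    count: int
    blocklength: int
    stride: int


def pack(buf, layout, offset=0):
    """Copy the elements selected by `layout` out of `buf` into a contiguous list."""
    out = []
    for i in range(layout.count):
        start = offset + i * layout.stride
        out.extend(buf[start:start + layout.blocklength])
    return out


def unpack(packed, buf, layout, offset=0):
    """Scatter a contiguous list back into `buf` at the positions given by `layout`."""
    for i in range(layout.count):
        start = offset + i * layout.stride
        block = packed[i * layout.blocklength:(i + 1) * layout.blocklength]
        buf[start:start + layout.blocklength] = block
    return buf


# Column 1 of a 3x4 row-major matrix: 3 blocks of 1 element, 4 elements apart.
matrix = list(range(12))
column = Vector(count=3, blocklength=1, stride=4)
packed_column = pack(matrix, column, offset=1)
```

Here `packed_column` holds the three column elements in order, and `unpack` restores them into a buffer of the same shape. Whether the copy is done by the user or inside the library, the work is the same; the sketch says nothing about which side can do it faster on real hardware.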
What is more important, it is certainly possible to build scatter/gather support into a NIC and achieve better performance with datatypes than without. While there are issues to be resolved for that (primarily the issue of pinning memory), they are solvable with the right hardware mechanism. Just because it does not yet exist is not an adequate reason to say "Get rid of datatypes". OK, you are not saying that, but you are saying to deprecate them in a sense. And saying you could send contiguous data more efficiently is a bad argument here. How do datatypes cause inefficiency for that? How much is that cost really? At what point do you hit where the answer is "It would be faster not to compute anything"?

Bronis

From alexander.supalov at [hidden] Thu Feb 28 22:58:13 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Fri, 29 Feb 2008 04:58:13 -0000
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To: <20080229044417.GI16623@benten.cs.indiana.edu>
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201196FE2@swsmsx413.ger.corp.intel.com>

Hi,

Thanks. As soon as there are a couple of non-blocking recvs out there, waiting for them in reverse order requires tracking of the moment when the receives were posted. In some cases this leads to extra fields and data exchanges.

The footprint argument generally says that the library will be smaller.
This may be a minor matter for general purpose computers, but as soon as you go to Petascale, you need every byte on the compute nodes for user data, especially if dynamic libraries are not supported.

As for the collectives, many are implemented using SendRecv, and that blocking call in turn often uses non-blocking communication. The classic Alltoallv algorithm uses nonblocking calls, too. So, I'm not sure that even unoptimized blocking collectives will always use only blocking pt2pt.

Best regards.

Alexander

From alexander.supalov at [hidden] Thu Feb 28 23:10:39 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Fri, 29 Feb 2008 05:10:39 -0000
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To:
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201196FE3@swsmsx413.ger.corp.intel.com>

Hi,

Thanks. I think the main thrust here is the library footprint (no pack/unpack, etc.) and complexity of the user side of the datatype interface, rather than performance. Many applications just don't need any of this, and never will. Why not translate this application non-requirement into a minimum MPI subset? Same with communicator/group management, etc.

Moreover, homogeneous installations that dominate HPC now don't actually need any datatype support at all. They send chunks of bytes. This may change in the future, though.

A minor performance implication is that without holes, which are only possible with derived datatypes, one does not need to track them, split the critical path, and make special provisions inside the MPI device layer to handle iov or such.
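The split in the critical path mentioned above can be illustrated with a short sketch. It assumes a datatype has already been reduced to a list of (offset, length) byte blocks in typemap order; `flatten`, `choose_path` and `gather` are hypothetical helper names, not part of any MPI device layer. Adjacent blocks are merged, and only a layout that still has holes after merging needs the iov-style path.

```python
def flatten(blocks):
    """Merge consecutive (offset, length) blocks that touch each other.

    The order of the blocks is preserved; only a block that starts exactly
    where the previous one ends is merged into it.
    """
    iov = []
    for offset, length in blocks:
        if length == 0:
            continue  # empty blocks contribute no data
        if iov and iov[-1][0] + iov[-1][1] == offset:
            iov[-1] = (iov[-1][0], iov[-1][1] + length)
        else:
            iov.append((offset, length))
    return iov


def choose_path(blocks):
    """Return ("contiguous", iov) when a single segment suffices, else ("iov", iov)."""
    iov = flatten(blocks)
    if len(iov) <= 1:
        return "contiguous", iov
    return "iov", iov


def gather(memory, iov):
    """Collect the bytes named by `iov` from `memory` into one contiguous bytes object."""
    return b"".join(memory[offset:offset + length] for offset, length in iov)
```

For example, `choose_path([(0, 8), (8, 8)])` collapses to one 16-byte segment and takes the contiguous path, while `choose_path([(0, 4), (8, 4)])` keeps two segments and takes the iov path. The branch in `choose_path` is the extra decision a subset without derived datatypes would never need to make; how much that branch costs in a real implementation is not something this sketch can show.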
The NIC capability argument is interesting, but it turns the discussion on its head: we're not after motivating network vendors to provide scatter/gather in hardware here, are we? Please clarify. Best regards. Alexander -----Original Message----- From: mpi3-subsetting-bounces_at_[hidden] [mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of Bronis R. de Supinski Sent: Friday, February 29, 2008 5:53 AM To: mpi3-subsetting_at_[hidden] Subject: Re: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09 All: OK, I have to respond to the notion that derived datatypes limit performance. It is just not a reasonable position. Sure, if you can send contiguous locations, you will get higher performance. The problem is that codes do not only need to send contiguous data so that is not an adequate reason to say derived datatypes limit performance. So, what is left? That there is some more efficient way to send non-contiguous data? How? As multiple messages, each of which send contiguous data? If so, then the implementation could do this under the covers and the datatypes are just a convenience for the user not to have to specify the individual sends. OK, suppose that's not the reason. Perhaps the user can do the copying into a contiguous buffer and get better performance? While I have seen this hold with some implementations, it is absurd. There is no reason that I can discern as to why the user should be able to deduce a better copying mechanism than the MPI implementer. So, again, at worst, the datatypes should be a convenience. Do you have an alternative reason or a refutation of these opinions? What is more important, it is certainly possible to build scatter/gather support into a NIC and achieve better performance with datatypes than without. While there are issues to be resolved for that (primarily the issue of pinning memory), they are solvable with the right hardware mechanism. 
Just because it does not yet exist is not an adequate reason to say "Get rid of datatypes". OK, you are not saying that but you are saying to deprecate them in a sense. And saying you could send contiguous sends more efficiently is a bad argument here. How do datatypes cause inefficiency for that? How much is that cost really? At what point do you hit where the answer is "It would be faster not to compute anything"? Bronis On Fri, 29 Feb 2008, Supalov, Alexander wrote: > Hi, > > Thanks. What subsets inside the current standard would you propose? What > interfaces between them would you envision? > > Good idea about the optimization opportunities. Here's an initial > combined list, with the main benefits as I see them. Please > comment/extend. > > - Dynamic process support: less overhead in the progress engine, easier > global rank handling. > - Heterogeneity: better memory footprint, easier data handling. > - Derived datatypes (especially those with holes): better memory > footprint. > - MPI_ANY_SOURCE: faster, more simple multifabric progress. > - File I/O: smaller requests, easier wait/test functions. > - One-sided ops: no passive target w/o MPI calls - no extra progress > thread. > - Communicator & group management: better memory footprint. > - Message tagging: better support for stable dataflow exchanges, smaller > packets. > - Non-blocking communication: easier ordering, simplified request > handling. > > Best regards. 
> > Alexander > > -----Original Message----- > From: mpi3-subsetting-bounces_at_[hidden] > [mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of > Torsten Hoefler > Sent: Friday, February 29, 2008 5:08 AM > To: mpi3-subsetting_at_[hidden] > Subject: Re: [Mpi3-subsetting] agenda for subsetting kickoff telecon > ww09 > > Hi, > > Present: Leonid Meyerguz (Microsoft), Rich Graham (ORNL), Richard > Barrett > > (ORNL), Torsten Hoefler (ISU), Alexander Supalov (Intel) > just for the record, it's "IU" not "ISU" :-) > > > - Scope of the effort > > - Rich > > - Minimum subset consistent with the rest of MPI, for > > performance/memory footprint optimization > > - Danger of splitting MPI, hence against optional features in > the > > standard > I back that (danger of optional features for portability). I'd propose > to split the current standard into mostly self-contained subsets that > have clearly defined interfaces to the rest of the standard. Note: this > only defines logical interfaces, that does *not* define how those things > are to be implemented. This makes it easier to understand the standard > and have separate (portable) libraries for the subsets, it does not > influence optimization possibilities by implementing everything in a > monolithic block (i.e., central progress). > > > - Both blocking & nonblocking belong to the core > > - Torsten > > - Some collectives may go into selectable subsets > I see three subsets: blocking colls, non-blocking colls and topological > colls (maybe also blocking / non-blocking). > > > - MPI_ANY_SOURCE considered harmful > I'd like to add datatypes and heterogeneity to this list (with regards > to performance). Alexander mentioned the dynamics. I think we should > have a lit of items ready that could influence optimization > possibilities significanty if they were to be announced by the user > before he can use them. That would give another strong argument for the > subsetting. 
> > Best, > Torsten > > -- > bash$ :(){ :|:&};: --------------------- http://www.unixer.de/ ----- > Indiana University | http://www.indiana.edu > Open Systems Lab | http://osl.iu.edu/ > 150 S. Woodlawn Ave. | Bloomington, IN, 474045-7104 | USA > Lindley Hall Room 135 | +01 (812) 855-3608 > _______________________________________________ > Mpi3-subsetting mailing list > Mpi3-subsetting_at_[hidden] > http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-subsetting > --------------------------------------------------------------------- > Intel GmbH > Dornacher Strasse 1 > 85622 Feldkirchen/Muenchen Germany > Sitz der Gesellschaft: Feldkirchen bei Muenchen > Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer > Registergericht: Muenchen HRB 47456 Ust.-IdNr. > VAT Registration No.: DE129385895 > Citibank Frankfurt (BLZ 502 109 00) 600119052 > > This e-mail and any attachments may contain confidential material for > the sole use of the intended recipient(s). Any review or distribution > by others is strictly prohibited. If you are not the intended > recipient, please contact the sender and delete all copies. > > > _______________________________________________ > Mpi3-subsetting mailing list > Mpi3-subsetting_at_[hidden] > http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-subsetting > _______________________________________________ Mpi3-subsetting mailing list Mpi3-subsetting_at_[hidden] http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-subsetting --------------------------------------------------------------------- Intel GmbH Dornacher Strasse 1 85622 Feldkirchen/Muenchen Germany Sitz der Gesellschaft: Feldkirchen bei Muenchen Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer Registergericht: Muenchen HRB 47456 Ust.-IdNr. VAT Registration No.: DE129385895 Citibank Frankfurt (BLZ 502 109 00) 600119052 This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). 
Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. From alexander.supalov at [hidden] Thu Feb 28 23:17:13 2008 From: alexander.supalov at [hidden] (Supalov, Alexander) Date: Fri, 29 Feb 2008 05:17:13 -0000 Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09 In-Reply-To: Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201196FE4@swsmsx413.ger.corp.intel.com> Woops... Chunks of bytes may have holes. Discard that argument. Homogeneous installations don't need data transformation, but this is a different matter. -----Original Message----- From: Supalov, Alexander Sent: Friday, February 29, 2008 6:11 AM To: 'Bronis R. de Supinski'; 'mpi3-subsetting_at_[hidden]' Subject: RE: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09 Hi, Thanks. I think the main thrust here is the library footprint (no pack/unpack, etc.) and complexity of the user side of the datatype interface, rather than performance. Many applications just don't need any of this, and never will. Why not translating this application non-requirement into a minimum MPI subset? Same with communicator/group management, etc. Moreover, homogeneous installations that dominate HPC now don't actually need any datatype support at all. They send chunks of bytes. This may change in the future, though. A minor performance implication is that without holes that are only possible with derived datatypes, one does not need to track this, split the critical path, and make special provisions inside the MPI device layer to handle iov or such. The NIC capability argument is interesting, but it turns the discussion on its head: we're not after motivating network vendors to provide scatter/gather in hardware here, are we? Please clarify. Best regards. Alexander -----Original Message----- From: mpi3-subsetting-bounces_at_[hidden] [mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of Bronis R. 
de Supinski Sent: Friday, February 29, 2008 5:53 AM To: mpi3-subsetting_at_[hidden] Subject: Re: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09 All: OK, I have to respond to the notion that derived datatypes limit performance. It is just not a reasonable position. Sure, if you can send contiguous locations, you will get higher performance. The problem is that codes do not only need to send contiguous data so that is not an adequate reason to say derived datatypes limit performance. So, what is left? That there is some more efficient way to send non-contiguous data? How? As multiple messages, each of which send contiguous data? If so, then the implementation could do this under the covers and the datatypes are just a convenience for the user not to have to specify the individual sends. OK, suppose that's not the reason. Perhaps the user can do the copying into a contiguous buffer and get better performance? While I have seen this hold with some implementations, it is absurd. There is no reason that I can discern as to why the user should be able to deduce a better copying mechanism than the MPI implementer. So, again, at worst, the datatypes should be a convenience. Do you have an alternative reason or a refutation of these opinions? What is more important, it is certainly possible to build scatter/gather support into a NIC and achieve better performance with datatypes than without. While there are issues to be resolved for that (primarily the issue of pinning memory), they are solvable with the right hardware mechanism. Just because it does not yet exist is not an adequate reason to say "Get rid of datatypes". OK, you are not saying that but you are saying to deprecate them in a sense. And saying you could send contiguous sends more efficiently is a bad argument here. How do datatypes cause inefficiency for that? How much is that cost really? At what point do you hit where the answer is "It would be faster not to compute anything"? 
Bronis

On Fri, 29 Feb 2008, Supalov, Alexander wrote:

> Hi,
>
> Thanks. What subsets inside the current standard would you propose? What
> interfaces between them would you envision?
>
> Good idea about the optimization opportunities. Here's an initial
> combined list, with the main benefits as I see them. Please
> comment/extend.
>
> - Dynamic process support: less overhead in the progress engine, easier
>   global rank handling.
> - Heterogeneity: better memory footprint, easier data handling.
> - Derived datatypes (especially those with holes): better memory
>   footprint.
> - MPI_ANY_SOURCE: faster, more simple multifabric progress.
> - File I/O: smaller requests, easier wait/test functions.
> - One-sided ops: no passive target w/o MPI calls - no extra progress
>   thread.
> - Communicator & group management: better memory footprint.
> - Message tagging: better support for stable dataflow exchanges, smaller
>   packets.
> - Non-blocking communication: easier ordering, simplified request
>   handling.
>
> Best regards.
>
> Alexander
>
> -----Original Message-----
> From: mpi3-subsetting-bounces_at_[hidden]
> [mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of
> Torsten Hoefler
> Sent: Friday, February 29, 2008 5:08 AM
> To: mpi3-subsetting_at_[hidden]
> Subject: Re: [Mpi3-subsetting] agenda for subsetting kickoff telecon
> ww09
>
> Hi,
> > Present: Leonid Meyerguz (Microsoft), Rich Graham (ORNL), Richard Barrett
> > (ORNL), Torsten Hoefler (ISU), Alexander Supalov (Intel)
> just for the record, it's "IU" not "ISU" :-)
>
> > - Scope of the effort
> > - Rich
> > - Minimum subset consistent with the rest of MPI, for
> > performance/memory footprint optimization
> > - Danger of splitting MPI, hence against optional features in the
> > standard
> I back that (danger of optional features for portability). I'd propose
> to split the current standard into mostly self-contained subsets that
> have clearly defined interfaces to the rest of the standard.
> Note: this only defines logical interfaces, that does *not* define how
> those things are to be implemented. This makes it easier to understand the
> standard and have separate (portable) libraries for the subsets, it does not
> influence optimization possibilities by implementing everything in a
> monolithic block (i.e., central progress).
>
> > - Both blocking & nonblocking belong to the core
> > - Torsten
> > - Some collectives may go into selectable subsets
> I see three subsets: blocking colls, non-blocking colls and topological
> colls (maybe also blocking / non-blocking).
>
> > - MPI_ANY_SOURCE considered harmful
> I'd like to add datatypes and heterogeneity to this list (with regards
> to performance). Alexander mentioned the dynamics. I think we should
> have a list of items ready that could influence optimization
> possibilities significantly if they were to be announced by the user
> before he can use them. That would give another strong argument for the
> subsetting.
>
> Best,
> Torsten
>
> --
> bash$ :(){ :|:&};: --------------------- http://www.unixer.de/ -----
> Indiana University | http://www.indiana.edu
> Open Systems Lab | http://osl.iu.edu/
> 150 S. Woodlawn Ave. | Bloomington, IN, 474045-7104 | USA
> Lindley Hall Room 135 | +01 (812) 855-3608
> _______________________________________________
> Mpi3-subsetting mailing list
> Mpi3-subsetting_at_[hidden]
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-subsetting

From bronis at [hidden] Thu Feb 28 23:19:43 2008
From: bronis at [hidden] (Bronis R. de Supinski)
Date: Thu, 28 Feb 2008 21:19:43 -0800 (PST)
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201196FE3@swsmsx413.ger.corp.intel.com>
Message-ID:

Alexander:

Most real applications need to send non-contiguous data. If they do not use datatypes then they are doing the equivalent of either the packing/unpacking or smaller messages at the user level. This should be discouraged, not encouraged. A small savings in library object size is not ample reason to go against that. And, yes, we are after encouraging hardware vendors to provide the right hardware.

Bronis

From alexander.supalov at [hidden] Thu Feb 28 23:38:17 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Fri, 29 Feb 2008 05:38:17 -0000
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To:
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201196FE5@swsmsx413.ger.corp.intel.com>

Hi,

Thanks. I understand your motivation. When you say "most real applications" - what applications do you mean? At least, in what area?

For the NIC part, the stress was on "here". In my opinion, subsetting is not about making things more complicated, more challenging to the implementors, or to the underlying hardware. It's about making things simple, easy to use, and easy to implement - including implementation of only those features your users actually need. That the implementation may be faster due to this is an added bonus, not the primary goal.

Still, regarding user side copying.
Yes, when people do this one wonders why. There's a reason, apart from them: 1) not caring about datatypes and their complexity and 2) not trusting their performance. A modern compiler can optimize a loop with a constant stride rather well, and may have difficulty with an unknown stride. This is why explicit loops are sometimes indeed faster (much faster) in the resulting code than any generic implementation.

Best regards.

Alexander

From bronis at [hidden] Fri Feb 29 03:11:27 2008
From: bronis at [hidden] (Bronis R. de Supinski)
Date: Fri, 29 Feb 2008 01:11:27 -0800 (PST)
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201196FE5@swsmsx413.ger.corp.intel.com>
Message-ID:

Alexander:

Re:
> Thanks. I understand your motivation. When you say "most real
> applications" - what applications do you mean? At least, in what area?

? Scientific computing...

> For the NIC part, the stress was on "here". In my opinion, subsetting is
> not about making things more complicated, more challenging to the
> implementors, or to the underlying hardware. It's about making things
> simple, easy to use, and easy to implement - including implementation of
> only those features your users actually need. That the implementation
> may be faster due to this is an added bonus, not the primary goal.
The emphasis here should not be on creating a disincentive for vendors to do the right thing...

> Still, regarding user side copying. Yes, when people do this one wonders
> why. There's a reason, apart from them: 1) not caring about datatypes
> and their complexity and 2) not trusting their performance. A modern
> compiler can rather well optimize a loop with a constant stride, and may
> have difficulty with an unknown stride. This is why explicit loops are
> sometimes indeed faster (much faster) in the resulting code than any
> generic implementation.

Huh? What makes you think the user copying code is in terms of constant stride? Generally, it varies with the input. We are not talking about a simple situation to optimize at the user level...

Bronis

>
> Best regards.
>
> Alexander
> > > > > > > > > _______________________________________________ > > > Mpi3-subsetting mailing list > > > Mpi3-subsetting_at_[hidden] > > > http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-subsetting > > > > > _______________________________________________ > > Mpi3-subsetting mailing list > > Mpi3-subsetting_at_[hidden] > > http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-subsetting > > --------------------------------------------------------------------- > > Intel GmbH > > Dornacher Strasse 1 > > 85622 Feldkirchen/Muenchen Germany > > Sitz der Gesellschaft: Feldkirchen bei Muenchen > > Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer > > Registergericht: Muenchen HRB 47456 Ust.-IdNr. > > VAT Registration No.: DE129385895 > > Citibank Frankfurt (BLZ 502 109 00) 600119052 > > > > This e-mail and any attachments may contain confidential material for > > the sole use of the intended recipient(s). Any review or distribution > > by others is strictly prohibited. If you are not the intended > > recipient, please contact the sender and delete all copies. > > > > > --------------------------------------------------------------------- > Intel GmbH > Dornacher Strasse 1 > 85622 Feldkirchen/Muenchen Germany > Sitz der Gesellschaft: Feldkirchen bei Muenchen > Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer > Registergericht: Muenchen HRB 47456 Ust.-IdNr. > VAT Registration No.: DE129385895 > Citibank Frankfurt (BLZ 502 109 00) 600119052 > > This e-mail and any attachments may contain confidential material for > the sole use of the intended recipient(s). Any review or distribution > by others is strictly prohibited. If you are not the intended > recipient, please contact the sender and delete all copies. 
From alexander.supalov at [hidden] Fri Feb 29 05:39:40 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Fri, 29 Feb 2008 11:39:40 -0000
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To:
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A20119726C@swsmsx413.ger.corp.intel.com>

Dear Bronis,

Thanks. What scientific computing codes do you mean here - chemistry, structural mechanics, fluid dynamics, genomics, something else? Or do you speak generally of any code that needs sparse data structures? If so, what's your estimate of the relative number of such codes compared to those that do not need sparse datatypes? In what domain?

The right dose of vendor motivation not to do wrong things is a good point, I'll consider it.

Finally, the constant stride copying was but an example of when inlining may help users achieve higher performance. There may be other examples known in the scientific computing area. However, since performance is not the primary goal for datatypes, I suggest we let this matter rest for a while.

Best regards.

Alexander
> > --------------------------------------------------------------------- Intel GmbH Dornacher Strasse 1 85622 Feldkirchen/Muenchen Germany Sitz der Gesellschaft: Feldkirchen bei Muenchen Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer Registergericht: Muenchen HRB 47456 Ust.-IdNr. VAT Registration No.: DE129385895 Citibank Frankfurt (BLZ 502 109 00) 600119052 This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. From bronis at [hidden] Fri Feb 29 07:17:19 2008 From: bronis at [hidden] (Bronis R. de Supinski) Date: Fri, 29 Feb 2008 05:17:19 -0800 (PST) Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09 In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A20119726C@swsmsx413.ger.corp.intel.com> Message-ID: Alexander: It is the vast majority of scientific applications. It is not just ones that need sparse data structures. A stencil application that uses dense matrices has strided non-contiguous data transfers for half (2D) or more (3D or more complex stencils) of its communication. Non-contiguous communication is the reality of distributed memory computing... I am fine with letting this rest but my point is that an emphasis on performance by implementers should be the case for datatypes... Bronis On Fri, 29 Feb 2008, Supalov, Alexander wrote: > Dear Bronis, > > Thanks. What scientific computing codes do you mean here - chemistry, > structural mechanics, fluid dynamics, genomics, something else? Or do > you speak generally of any code that needs sparse data structures? If > so, what's your estimate of the relative number of such codes compared > to those that do not need sparse datatypes? In what domain? > > The right doze of vendor motivation not to do wrong things is a good > point, I'll consider it. 
> > Finally, the constant stride copying was but an example when inlining > may help to users achieve higher performance. There may be other > examples known in the scientific computing area. However, since > performance is not primary goal for datatypes, I suggest we let this > matter rest for a while. > > Best regards. > > Alexander > > -----Original Message----- > From: Bronis R. de Supinski [mailto:bronis_at_[hidden]] > Sent: Friday, February 29, 2008 10:11 AM > To: Supalov, Alexander > Cc: mpi3-subsetting_at_[hidden] > Subject: RE: [Mpi3-subsetting] agenda for subsetting kickoff telecon > ww09 > > > Alexander: > > Re: > > Thanks. I understand your motivation. When you say "most real > > applications" - what applications do you mean? At least, in what area? > > ? Scientific computing... > > > For the NIC part, the stress was on "here". In my opinion, subsetting > is > > not about making things more complicated, more challenging to the > > implementors, or to the underlying hardware. It's about making things > > simple, easy to use, and easy to implement - including implementation > of > > only those features your users actually need. That the implementation > > may be faster due to this is an added bonus, not the primary goal. > > The emphasis here should not be on creating a disincentive > for vendors to do the right thing... > > > Still, regarding user side copying. Yes, when people do this one > wonders > > why. There's a reason, apart from them: 1) not caring about datatypes > > and their complexity and 2) not trusting their performance. A modern > > compiler can rather well optimize a loop with a constant stride, and > may > > have difficulty with an unknown stride. This is why explicit loops are > > sometimes indeed faster (much faster) in the resulting code than any > > generic implementation. > > Huh? What makes you think the user copying code is > in terms of constant stride? Generally, it varies > with the input. 
We are not talking about a simple > situation to optimize at the user level... > > Bronis > > > > > > Best regards. > > > > Alexander > > > > -----Original Message----- > > From: Bronis R. de Supinski [mailto:bronis_at_[hidden]] > > Sent: Friday, February 29, 2008 6:20 AM > > To: Supalov, Alexander > > Cc: mpi3-subsetting_at_[hidden] > > Subject: RE: [Mpi3-subsetting] agenda for subsetting kickoff telecon > > ww09 > > > > > > Alexander: > > > > Most real applications need to send non-contiguous > > data. If they do not use datatypes then they are > > doing the equivalent of either the packing/unpacking > > or smaller messages at the user level. This s hould > > be discouraged, not encouraged. A small savings > > in library object size is not ample reason to go > > against that. And, yes, we are after encouraging > > hardware vendors to provide the right hardware. > > > > Bronis > > > > > > On Fri, 29 Feb 2008, Supalov, Alexander wrote: > > > > > Hi, > > > > > > Thanks. I think the main thrust here is the library footprint (no > > > pack/unpack, etc.) and complexity of the user side of the datatype > > > interface, rather than performance. Many applications just don't > need > > > any of this, and never will. Why not translating this application > > > non-requirement into a minimum MPI subset? Same with > > communicator/group > > > management, etc. > > > > > > Moreover, homogeneous installations that dominate HPC now don't > > actually > > > need any datatype support at all. They send chunks of bytes. This > may > > > change in the future, though. > > > > > > A minor performance implication is that without holes that are only > > > possible with derived datatypes, one does not need to track this, > > split > > > the critical path, and make special provisions inside the MPI device > > > layer to handle iov or such. 
> > > > > > The NIC capability argument is interesting, but it turns the > > discussion > > > on its head: we're not after motivating network vendors to provide > > > scatter/gather in hardware here, are we? Please clarify. > > > > > > Best regards. > > > > > > Alexander > > > > > > -----Original Message----- > > > From: mpi3-subsetting-bounces_at_[hidden] > > > [mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of > > Bronis > > > R. de Supinski > > > Sent: Friday, February 29, 2008 5:53 AM > > > To: mpi3-subsetting_at_[hidden] > > > Subject: Re: [Mpi3-subsetting] agenda for subsetting kickoff telecon > > > ww09 > > > > > > > > > All: > > > > > > OK, I have to respond to the notion that derived datatypes > > > limit performance. It is just not a reasonable position. > > > > > > Sure, if you can send contiguous locations, you will get > > > higher performance. The problem is that codes do not only > > > need to send contiguous data so that is not an adequate > > > reason to say derived datatypes limit performance. > > > > > > So, what is left? That there is some more efficient way > > > to send non-contiguous data? How? As multiple messages, > > > each of which send contiguous data? If so, then the > > > implementation could do this under the covers and the > > > datatypes are just a convenience for the user not to > > > have to specify the individual sends. OK, suppose that's > > > not the reason. Perhaps the user can do the copying into > > > a contiguous buffer and get better performance? While > > > I have seen this hold with some implementations, it is > > > absurd. There is no reason that I can discern as to why > > > the user should be able to deduce a better copying > > > mechanism than the MPI implementer. So, again, at worst, > > > the datatypes should be a convenience. Do you have an > > > alternative reason or a refutation of these opinions? 
> > > > > > What is more important, it is certainly possible to build > > > scatter/gather support into a NIC and achieve better > > > performance with datatypes than without. While there are > > > issues to be resolved for that (primarily the issue of > > > pinning memory), they are solvable with the right hardware > > > mechanism. Just because it does not yet exist is not > > > an adequate reason to say "Get rid of datatypes". OK, > > > you are not saying that but you are saying to deprecate > > > them in a sense. And saying you could send contiguous > > > sends more efficiently is a bad argument here. How do > > > datatypes cause inefficiency for that? How much is > > > that cost really? At what point do you hit where the > > > answer is "It would be faster not to compute anything"? > > > > > > Bronis > > > > > > > > > On Fri, 29 Feb 2008, Supalov, Alexander wrote: > > > > > > > Hi, > > > > > > > > Thanks. What subsets inside the current standard would you > propose? > > > What > > > > interfaces between them would you envision? > > > > > > > > Good idea about the optimization opportunities. Here's an initial > > > > combined list, with the main benefits as I see them. Please > > > > comment/extend. > > > > > > > > - Dynamic process support: less overhead in the progress engine, > > > easier > > > > global rank handling. > > > > - Heterogeneity: better memory footprint, easier data handling. > > > > - Derived datatypes (especially those with holes): better memory > > > > footprint. > > > > - MPI_ANY_SOURCE: faster, more simple multifabric progress. > > > > - File I/O: smaller requests, easier wait/test functions. > > > > - One-sided ops: no passive target w/o MPI calls - no extra > progress > > > > thread. > > > > - Communicator & group management: better memory footprint. > > > > - Message tagging: better support for stable dataflow exchanges, > > > smaller > > > > packets. 
> > > > - Non-blocking communication: easier ordering, simplified request > > > > handling. > > > > > > > > Best regards. > > > > > > > > Alexander > > > > > > > > -----Original Message----- > > > > From: mpi3-subsetting-bounces_at_[hidden] > > > > [mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of > > > > Torsten Hoefler > > > > Sent: Friday, February 29, 2008 5:08 AM > > > > To: mpi3-subsetting_at_[hidden] > > > > Subject: Re: [Mpi3-subsetting] agenda for subsetting kickoff > telecon > > > > ww09 > > > > > > > > Hi, > > > > > Present: Leonid Meyerguz (Microsoft), Rich Graham (ORNL), > > Richard > > > > Barrett > > > > > (ORNL), Torsten Hoefler (ISU), Alexander Supalov (Intel) > > > > just for the record, it's "IU" not "ISU" :-) > > > > > > > > > - Scope of the effort > > > > > - Rich > > > > > - Minimum subset consistent with the rest of MPI, for > > > > > performance/memory footprint optimization > > > > > - Danger of splitting MPI, hence against optional > features > > in > > > > the > > > > > standard > > > > I back that (danger of optional features for portability). I'd > > propose > > > > to split the current standard into mostly self-contained subsets > > that > > > > have clearly defined interfaces to the rest of the standard. Note: > > > this > > > > only defines logical interfaces, that does *not* define how those > > > things > > > > are to be implemented. This makes it easier to understand the > > standard > > > > and have separate (portable) libraries for the subsets, it does > not > > > > influence optimization possibilities by implementing everything in > a > > > > monolithic block (i.e., central progress). > > > > > > > > > - Both blocking & nonblocking belong to the core > > > > > - Torsten > > > > > - Some collectives may go into selectable subsets > > > > I see three subsets: blocking colls, non-blocking colls and > > > topological > > > > colls (maybe also blocking / non-blocking). 
> > > > > > > > > - MPI_ANY_SOURCE considered harmful > > > > I'd like to add datatypes and heterogeneity to this list (with > > regards > > > > to performance). Alexander mentioned the dynamics. I think we > should > > > > have a lit of items ready that could influence optimization > > > > possibilities significanty if they were to be announced by the > user > > > > before he can use them. That would give another strong argument > for > > > the > > > > subsetting. > > > > > > > > Best, > > > > Torsten > > > > > > > > -- > > > > bash$ :(){ :|:&};: --------------------- http://www.unixer.de/ > > ----- > > > > Indiana University | http://www.indiana.edu > > > > Open Systems Lab | http://osl.iu.edu/ > > > > 150 S. Woodlawn Ave. | Bloomington, IN, 474045-7104 | USA > > > > Lindley Hall Room 135 | +01 (812) 855-3608 > > > > _______________________________________________ > > > > Mpi3-subsetting mailing list > > > > Mpi3-subsetting_at_[hidden] > > > > http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-subsetting > > > > > > --------------------------------------------------------------------- > > > > Intel GmbH > > > > Dornacher Strasse 1 > > > > 85622 Feldkirchen/Muenchen Germany > > > > Sitz der Gesellschaft: Feldkirchen bei Muenchen > > > > Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes > Schwaderer > > > > Registergericht: Muenchen HRB 47456 Ust.-IdNr. > > > > VAT Registration No.: DE129385895 > > > > Citibank Frankfurt (BLZ 502 109 00) 600119052 > > > > > > > > This e-mail and any attachments may contain confidential material > > for > > > > the sole use of the intended recipient(s). Any review or > > distribution > > > > by others is strictly prohibited. If you are not the intended > > > > recipient, please contact the sender and delete all copies. 
From alexander.supalov at [hidden] Fri Feb 29 07:38:48 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Fri, 29 Feb 2008 13:38:48 -0000
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To:
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A20119737F@swsmsx413.ger.corp.intel.com>

OK, thanks.

-----Original Message-----
From: Bronis R. de Supinski [mailto:bronis_at_[hidden]]
Sent: Friday, February 29, 2008 2:17 PM
To: Supalov, Alexander
Cc: mpi3-subsetting_at_[hidden]
Subject: RE: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09

Alexander:

It is the vast majority of scientific applications. It is not
just ones that need sparse data structures. A stencil application
that uses dense matrices has strided non-contiguous data transfers
for half (2D) or more (3D or more complex stencils) of its
communication. Non-contiguous communication is the reality of
distributed memory computing...

I am fine with letting this rest but my point is that an emphasis
on performance by implementers should be the case for datatypes...

Bronis

On Fri, 29 Feb 2008, Supalov, Alexander wrote:

> Dear Bronis,
>
> Thanks. What scientific computing codes do you mean here - chemistry,
> structural mechanics, fluid dynamics, genomics, something else?
> Or do you speak generally of any code that needs sparse data structures? If
> so, what's your estimate of the relative number of such codes compared
> to those that do not need sparse datatypes? In what domain?
>
> The right dose of vendor motivation not to do wrong things is a good
> point, I'll consider it.
>
> Finally, the constant stride copying was but an example of when inlining
> may help users achieve higher performance. There may be other
> examples known in the scientific computing area. However, since
> performance is not the primary goal for datatypes, I suggest we let this
> matter rest for a while.
>
> Best regards.
>
> Alexander
>
> -----Original Message-----
> From: Bronis R. de Supinski [mailto:bronis_at_[hidden]]
> Sent: Friday, February 29, 2008 10:11 AM
> To: Supalov, Alexander
> Cc: mpi3-subsetting_at_[hidden]
> Subject: RE: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
>
>
> Alexander:
>
> Re:
> > Thanks. I understand your motivation. When you say "most real
> > applications" - what applications do you mean? At least, in what area?
>
> Scientific computing...
>
> > For the NIC part, the stress was on "here". In my opinion, subsetting is
> > not about making things more complicated, more challenging to the
> > implementors, or to the underlying hardware. It's about making things
> > simple, easy to use, and easy to implement - including implementation of
> > only those features your users actually need. That the implementation
> > may be faster due to this is an added bonus, not the primary goal.
>
> The emphasis here should not be on creating a disincentive
> for vendors to do the right thing...
>
> > Still, regarding user side copying. Yes, when people do this one wonders
> > why. There's a reason, apart from them 1) not caring about datatypes
> > and their complexity and 2) not trusting their performance.
> > A modern compiler can rather well optimize a loop with a constant stride,
> > and may have difficulty with an unknown stride. This is why explicit loops
> > are sometimes indeed faster (much faster) in the resulting code than any
> > generic implementation.
>
> Huh? What makes you think the user copying code is
> in terms of constant stride? Generally, it varies
> with the input. We are not talking about a simple
> situation to optimize at the user level...
>
> Bronis
>
>
> >
> > Best regards.
> >
> > Alexander
> >
> > -----Original Message-----
> > From: Bronis R. de Supinski [mailto:bronis_at_[hidden]]
> > Sent: Friday, February 29, 2008 6:20 AM
> > To: Supalov, Alexander
> > Cc: mpi3-subsetting_at_[hidden]
> > Subject: RE: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
> >
> >
> > Alexander:
> >
> > Most real applications need to send non-contiguous
> > data. If they do not use datatypes then they are
> > doing the equivalent of either the packing/unpacking
> > or smaller messages at the user level. This should
> > be discouraged, not encouraged. A small savings
> > in library object size is not ample reason to go
> > against that. And, yes, we are after encouraging
> > hardware vendors to provide the right hardware.
> >
> > Bronis
> >
> >
> > On Fri, 29 Feb 2008, Supalov, Alexander wrote:
> >
> > > Hi,
> > >
> > > Thanks. I think the main thrust here is the library footprint (no
> > > pack/unpack, etc.) and complexity of the user side of the datatype
> > > interface, rather than performance. Many applications just don't need
> > > any of this, and never will. Why not translate this application
> > > non-requirement into a minimum MPI subset? Same with communicator/group
> > > management, etc.
> > >
> > > Moreover, homogeneous installations that dominate HPC now don't actually
> > > need any datatype support at all. They send chunks of bytes. This may
> > > change in the future, though.
> > > > > > A minor performance implication is that without holes that are only > > > possible with derived datatypes, one does not need to track this, > > split > > > the critical path, and make special provisions inside the MPI device > > > layer to handle iov or such. > > > > > > The NIC capability argument is interesting, but it turns the > > discussion > > > on its head: we're not after motivating network vendors to provide > > > scatter/gather in hardware here, are we? Please clarify. > > > > > > Best regards. > > > > > > Alexander > > > > > > -----Original Message----- > > > From: mpi3-subsetting-bounces_at_[hidden] > > > [mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of > > Bronis > > > R. de Supinski > > > Sent: Friday, February 29, 2008 5:53 AM > > > To: mpi3-subsetting_at_[hidden] > > > Subject: Re: [Mpi3-subsetting] agenda for subsetting kickoff telecon > > > ww09 > > > > > > > > > All: > > > > > > OK, I have to respond to the notion that derived datatypes > > > limit performance. It is just not a reasonable position. > > > > > > Sure, if you can send contiguous locations, you will get > > > higher performance. The problem is that codes do not only > > > need to send contiguous data so that is not an adequate > > > reason to say derived datatypes limit performance. > > > > > > So, what is left? That there is some more efficient way > > > to send non-contiguous data? How? As multiple messages, > > > each of which send contiguous data? If so, then the > > > implementation could do this under the covers and the > > > datatypes are just a convenience for the user not to > > > have to specify the individual sends. OK, suppose that's > > > not the reason. Perhaps the user can do the copying into > > > a contiguous buffer and get better performance? While > > > I have seen this hold with some implementations, it is > > > absurd. 
There is no reason that I can discern as to why > > > the user should be able to deduce a better copying > > > mechanism than the MPI implementer. So, again, at worst, > > > the datatypes should be a convenience. Do you have an > > > alternative reason or a refutation of these opinions? > > > > > > What is more important, it is certainly possible to build > > > scatter/gather support into a NIC and achieve better > > > performance with datatypes than without. While there are > > > issues to be resolved for that (primarily the issue of > > > pinning memory), they are solvable with the right hardware > > > mechanism. Just because it does not yet exist is not > > > an adequate reason to say "Get rid of datatypes". OK, > > > you are not saying that but you are saying to deprecate > > > them in a sense. And saying you could send contiguous > > > sends more efficiently is a bad argument here. How do > > > datatypes cause inefficiency for that? How much is > > > that cost really? At what point do you hit where the > > > answer is "It would be faster not to compute anything"? > > > > > > Bronis > > > > > > > > > On Fri, 29 Feb 2008, Supalov, Alexander wrote: > > > > > > > Hi, > > > > > > > > Thanks. What subsets inside the current standard would you > propose? > > > What > > > > interfaces between them would you envision? > > > > > > > > Good idea about the optimization opportunities. Here's an initial > > > > combined list, with the main benefits as I see them. Please > > > > comment/extend. > > > > > > > > - Dynamic process support: less overhead in the progress engine, > > > easier > > > > global rank handling. > > > > - Heterogeneity: better memory footprint, easier data handling. > > > > - Derived datatypes (especially those with holes): better memory > > > > footprint. > > > > - MPI_ANY_SOURCE: faster, more simple multifabric progress. > > > > - File I/O: smaller requests, easier wait/test functions. 
> > > > - One-sided ops: no passive target w/o MPI calls - no extra > progress > > > > thread. > > > > - Communicator & group management: better memory footprint. > > > > - Message tagging: better support for stable dataflow exchanges, > > > smaller > > > > packets. > > > > - Non-blocking communication: easier ordering, simplified request > > > > handling. > > > > > > > > Best regards. > > > > > > > > Alexander > > > > > > > > -----Original Message----- > > > > From: mpi3-subsetting-bounces_at_[hidden] > > > > [mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of > > > > Torsten Hoefler > > > > Sent: Friday, February 29, 2008 5:08 AM > > > > To: mpi3-subsetting_at_[hidden] > > > > Subject: Re: [Mpi3-subsetting] agenda for subsetting kickoff > telecon > > > > ww09 > > > > > > > > Hi, > > > > > Present: Leonid Meyerguz (Microsoft), Rich Graham (ORNL), > > Richard > > > > Barrett > > > > > (ORNL), Torsten Hoefler (ISU), Alexander Supalov (Intel) > > > > just for the record, it's "IU" not "ISU" :-) > > > > > > > > > - Scope of the effort > > > > > - Rich > > > > > - Minimum subset consistent with the rest of MPI, for > > > > > performance/memory footprint optimization > > > > > - Danger of splitting MPI, hence against optional > features > > in > > > > the > > > > > standard > > > > I back that (danger of optional features for portability). I'd > > propose > > > > to split the current standard into mostly self-contained subsets > > that > > > > have clearly defined interfaces to the rest of the standard. Note: > > > this > > > > only defines logical interfaces, that does *not* define how those > > > things > > > > are to be implemented. This makes it easier to understand the > > standard > > > > and have separate (portable) libraries for the subsets, it does > not > > > > influence optimization possibilities by implementing everything in > a > > > > monolithic block (i.e., central progress). 
> > > >
> > > > > - Both blocking & nonblocking belong to the core
> > > > > - Torsten
> > > > > - Some collectives may go into selectable subsets
> > > > I see three subsets: blocking colls, non-blocking colls and topological
> > > > colls (maybe also blocking / non-blocking).
> > > >
> > > > > - MPI_ANY_SOURCE considered harmful
> > > > I'd like to add datatypes and heterogeneity to this list (with regards
> > > > to performance). Alexander mentioned the dynamics. I think we should
> > > > have a list of items ready that could influence optimization
> > > > possibilities significantly if they were to be announced by the user
> > > > before he can use them. That would give another strong argument for the
> > > > subsetting.
> > > >
> > > > Best,
> > > > Torsten
> > > >
> > > > --
> > > > bash$ :(){ :|:&};: --------------------- http://www.unixer.de/ -----
> > > > Indiana University | http://www.indiana.edu
> > > > Open Systems Lab | http://osl.iu.edu/
> > > > 150 S. Woodlawn Ave. | Bloomington, IN, 474045-7104 | USA
> > > > Lindley Hall Room 135 | +01 (812) 855-3608
From rbarrett at [hidden] Fri Feb 29 07:50:03 2008
From: rbarrett at [hidden] (Richard Barrett)
Date: Fri, 29 Feb 2008 08:50:03 -0500
Subject: [Mpi3-subsetting] Some "stupid user" questions, comments.
In-Reply-To:
Message-ID:

Hi folks,

I'm still sorting things out in my mind, so perhaps this note is just me talking to myself. But should you feel so compelled to sort through it, I would appreciate any feedback you might offer; and it will make me a more informed participant.

I see two main perspectives: the user and the implementer.
I come from the user side, so I feel comfortable in positing that user confusion over the size of the standard is really a function of presentation. That is, most of us get our information regarding using MPI directly from the standard. For me, this is the _only_ standard I've ever actually read! Perhaps I am missing out on thousands of C and Fortran capabilities, but sometimes ignorance is bliss. That speaks highly of the MPI specification presentation; however it need not be the case. An easy solution to the "too many routines" complaint is a tutorial/book/chapter on the basics, with pointers to further information. And in fact these books exist. That said, I hope that MPI-3 deprecates a meaningful volume of functionality.

From the implementer perspective, there appear to be two goals. First is to ease the burden with regard to the amount of functionality that must be supported. (And we users don't want to hear of your whining, esp. from a company the size of Intel :) Second, which overlaps with user concerns, is performance. That is, by defining a small subset of functionality, strong performance (in some sense, e.g. speed or memory requirements) can be realized.

At the risk of starting too detailed a discussion at this early point (as well as exposing my ignorance:), I will throw out a few situations for discussion.

1. What would such a subset imply with regard to what I view as support functionality, such as user-defined datatypes, topologies, etc.? I.e., could this support be easily provided, say by cutting-and-pasting from the full implementation you will still provide? (I now see Torsten recommends excluding datatypes, but what of other stuff?)

2. Even more broadly (and perhaps very ignorantly), can I simply link in both libraries, like -lmpi_subset -lmpi, getting the good stuff from the former and the excluded functionality from the latter?
In addition to the application developer's use of MPI, all large application programs I've dealt with make some use of externally produced libraries (a "very good thing" imo), which probably exceed the functionality in a "subset" implementation.

3. I (basically) understand the adverse performance effects of allowing promiscuous receives (MPI_ANY_SOURCE). However, this is a powerful capability for many codes, and used only in moderation, e.g. for setting up communication requirements (such as communication partners in unstructured, semi-structured, and dynamic mesh computations). In this case the sender knows its partner, but the receiver does not. A reduction (sum) is used to let each process know the number of communication partners from which it will receive data; the process then posts that many promiscuous receives, which, once satisfied, let it specify the sender from then on. So would it be possible to include this capability in a separate function, say the blocking send/recv, but not allow it in the non-blocking version?

4. Collectives: I can't name a code I've ever worked with that doesn't require MPI_Allreduce (though I wouldn't be surprised to hear of many), and this in a broad set of science areas. MPI_Bcast is also often used (but quite often only in the setup phase). I see MPI_Reduce used most often to collect timing information, so MPI_Allreduce would probably be fine as well. MPI_Gather is often quite useful, as is MPI_Scatter, but again often in setup. (Though often "setup" occurs once per time step.) Non-constant size versions are often used. And others can also no doubt offer strong opinions regarding inclusion or exclusion. But from an implementation perspective, what are the issues? In particular, is the basic infrastructure for these (and other collective operations) the same?
A driving premise for supporting collectives is that the sort of performance-driven capability under discussion is most needed by applications running at very large scale, which is where even very good collective implementations run into problems.

5. Language bindings and perhaps other things: With the expectation/hope that full implementations continue to be available, I could use them for code development, thus making use of things like type checking, etc. And does this latter use then imply the need for "stubs" for things like the (vaporous) Fortran bindings module, communicators (if only MPI_COMM_WORLD is supported), etc.? And presuming the answer to #2 is "no", could/should the full implementation "warn" me (preferably at compile time) when I'm using functionality that rules out use of the subset?

6. Will the profile layer still be supported? Usage can still be quantified using a full implementation, but performance would not be (at least in this manner), which would rule out an apples-to-apples comparison between a full implementation and the subset version with its advertised superior performance. (Of course an overall runtime could be compared, which is the final word, but a more detailed analysis is often preferred.)

7. If blocking and non-blocking are required of the subset, aren't these blocking semantics?

   MPI_Send:
      MPI_Isend ( ..., &req );
      MPI_Wait ( &req, &status );
   -----
   MPI_Recv:
      MPI_Irecv ( ..., &req );
      MPI_Wait ( &req, &status );

   - And speaking of this, are there performance issues associated with variants of MPI_Wait, e.g. MPI_Waitany, MPI_Waitsome?

Finally, I'll officially register my concern with what I see as an increasing complexity in this effort, esp. wrt "multiple subsets". I don't intend this comment to suppress ideas, but to help keep beating the drum for simplicity, which I see as a key goal of this effort.

If you read this far, thanks! My apologies if some of these issues have been previously covered.
And if I've simply exposed myself as ignorant, I feel confident in stating that I am not alone - these questions will persist from others. :)

Richard

--
Richard Barrett
Future Technologies Group, Computer Science and Mathematics Division, and
Scientific Computing Group, National Center for Computational Science
Oak Ridge National Laboratory

http://ft.ornl.gov/~rbarrett

On 2/28/08 1:04 PM, "mpi3-subsetting-request_at_[hidden]" wrote:

> Thank you for your time today. It was a very good discussion. Here's
> what I captured (please add/modify what I may have missed):
>
> Present: Leonid Meyerguz (Microsoft), Rich Graham (ORNL), Richard
> Barrett (ORNL), Torsten Hoefler (ISU), Alexander Supalov (Intel)
>
> - Opens & introductions
>
> - Scope of the effort
>   - Rich
>     - Minimum subset consistent with the rest of MPI, for
>       performance/memory footprint optimization
>     - Danger of splitting MPI, hence against optional features in the
>       standard
>     - Both blocking & nonblocking belong to the core
>   - Torsten
>     - Some collectives may go into selectable subsets
>     - MPI_ANY_SOURCE considered harmful
>   - Leonid
>     - Flexible support for optional features, means for choosing and
>       advertising level of compliance/set of features
>   - See enclosed email for Alexander's POV
>
> - General discussion snapshots
>   - Support of subsets: some or all? If some, possible linkage problems
>     in static apps (or dead calls). If all, where's the gain?
>   - Optional: really optional (may be not present) or selectable (are
>     present but may be unused)?
>   - Performance penalty for unused subsets: implementation matter or
>     standard choice?
>   - Portability may be limited to a certain class of applications (think
>     FT, master-slave runs)
>   - All we design needs to be implementable, complexity needs to be
>     controlled
>   - The ability to use a certain set of subsets should not preclude pulling
>     in other modules if necessary
>   - Whatever we do, it should not conflict with the ABI efforts
>   - Need to stay nice and be nicer wrt the libraries (think
>     threading) and keep things simple
>   - The simplification argument, if put first, may not be liked by some
>
> - Next steps
>   - Please comment on these minutes, and add/modify what I may have
>     missed
>   - I'll prepare a couple of slides by next week summarizing our
>     discussion so far; again, your feedback will be most welcome
>   - At the meeting, it may be great to meet F2F briefly and discuss any
>     eventual loose ends before the presentation at the Forum; I'll see to
>     this
>
> Best regards.
>
> Alexander
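
[Editorial sketch] The setup pattern in question 3 of Richard's message above (a sum reduction tells each process how many partners will send to it, and that many wildcard receives then reveal who they are) can be illustrated roughly as follows. This is a serial stand-in with no MPI calls, so it builds without an MPI installation. The function names count_incoming and learn_senders and the 0/1 matrix layout are hypothetical and do not come from the thread. In a real MPI code the column sum would presumably be an MPI_Allreduce with MPI_SUM over each rank's flag vector, and each sender would be read from status.MPI_SOURCE after a receive posted with MPI_ANY_SOURCE.

```c
#include <stddef.h>

/* Serial stand-in for the sum reduction (assumption: no real MPI here).
 * sends_to is an nprocs x nprocs row-major 0/1 matrix in which
 * sends_to[s * nprocs + d] != 0 means rank s will send to rank d.
 * On return, incoming[d] is the number of wildcard receives that
 * rank d would have to post. */
void count_incoming(size_t nprocs, const int *sends_to, int *incoming)
{
    for (size_t d = 0; d < nprocs; ++d) {
        incoming[d] = 0;
    }
    for (size_t s = 0; s < nprocs; ++s) {
        for (size_t d = 0; d < nprocs; ++d) {
            if (sends_to[s * nprocs + d] != 0) {
                incoming[d] += 1;
            }
        }
    }
}

/* Stand-in for satisfying the wildcard receives of rank `me`: records
 * each sender so that later receives can name the source explicitly.
 * `senders` must have room for nprocs entries. Returns the number of
 * senders written. */
int learn_senders(size_t nprocs, const int *sends_to, size_t me, int *senders)
{
    int n = 0;
    for (size_t s = 0; s < nprocs; ++s) {
        if (sends_to[s * nprocs + me] != 0) {
            senders[n++] = (int)s;
        }
    }
    return n;
}
```

With three ranks where rank 0 sends to ranks 1 and 2 and rank 1 sends to rank 2, count_incoming reports 0, 1 and 2 incoming partners, and learn_senders for rank 2 yields ranks 0 and 1. The wildcard is needed only during this setup step; afterwards every receive can name its source.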
From alexander.supalov at [hidden] Fri Feb 29 08:26:10 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Fri, 29 Feb 2008 14:26:10 -0000
Subject: [Mpi3-subsetting] Some "stupid user" questions, comments.
In-Reply-To:
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201197401@swsmsx413.ger.corp.intel.com>

Dear Richard,

Thanks. The more complicated the standard gets, the happier are the implementors. However, now we try to think like MPI users for a change, so, thanks for providing a reality check.

Now, to one of your questions. An MPI_ANY_SOURCE MPI_Recv in a multifabric environment means that a receive has to be posted somehow to more than one fabric in the MPI device layer. Once one of them gets the message, the posted receives should be cancelled on the other fabrics. Now, what if they've already matched and started to receive something? What if they cannot cancel a posted receive? And so on.

There are 3 to 5 ways to deal with this situation, with and without actually posting a receive, but none of them is good enough if you ask me. That's why there are 3 to 5 of them, actually. And all of them complicate the progress engine - the heart of an MPI implementation - at exactly the spot where one wants things simple and fast.

This means that most of the time we fight these repercussions and curse the MPI_ANY_SOURCE.
Or, looping back to the beginning of this message, we actually never stop blessing MPI_ANY_SOURCE. Fighting this kind of trouble is what we are paid for. ;)

Best regards.

Alexander

________________________________

From: mpi3-subsetting-bounces_at_[hidden] [mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of Richard Barrett
Sent: Friday, February 29, 2008 2:50 PM
To: mpi3-subsetting_at_[hidden]
Subject: [Mpi3-subsetting] Some "stupid user" questions, comments.

Hi folks,

I'm still sorting things out in my mind, so perhaps this note is just me talking to myself. But should you feel so compelled to sort through it, I would appreciate any feedback you might offer; and it will make me a more informed participant.

I see two main perspectives: the user and the implementer. I come from the user side, so I feel comfortable in positing that user confusion over the size of the standard is really a function of presentation. That is, most of us get our information regarding using MPI directly from the standard. For me, this is the _only_ standard I've ever actually read! Perhaps I am missing out on thousands of C and Fortran capabilities, but sometimes ignorance is bliss. That speaks highly of the MPI specification presentation; however it need not be the case. An easy solution to the "too many routines" complaint is a tutorial/book/chapter on the basics, with pointers to further information. And in fact these books exist. That said, I hope that MPI-3 deprecates a meaningful volume of functionality.

From the implementer perspective, there appear to be two goals. First is to ease the burden with regard to the amount of functionality that must be supported. (And we users don't want to hear of your whining, esp. from a company the size of Intel :) Second, which overlaps with user concerns, is performance. That is, by defining a small subset of functionality, strong performance (in some sense, e.g. speed or memory requirements) can be realized.
At the risk of starting too detailed a discussion at this early point (as well as exposing my ignorance :), I will throw out a few situations for discussion.

1. What would such a subset imply with regard to what I view as support functionality, such as user-defined datatypes, topologies, etc.? I.e., could this support be easily provided, say by cutting-and-pasting from the full implementation you will still provide? (I now see Torsten recommends excluding datatypes, but what of other stuff?)

2. Even more broadly (and perhaps very ignorantly), can I simply link in both libraries, like -lmpi_subset -lmpi, getting the good stuff from the former and the excluded functionality from the latter? In addition to the application developer's use of MPI, all large application programs I've dealt with make some use of externally produced libraries (a "very good thing" imo), which probably exceed the functionality in a "subset" implementation.

3. I (basically) understand the adverse performance effects of allowing promiscuous receives (MPI_ANY_SOURCE). However, this is a powerful capability for many codes, and used only in moderation, e.g. for setting up communication requirements (such as communication partners in unstructured, semi-structured, and dynamic mesh computations). In this case the sender knows its partner, but the receiver does not. A reduction (sum) is used to let each process know the number of communication partners from which it will receive data; the process then posts that many promiscuous receives, which, once satisfied, let it specify the sender from then on. So would it be possible to include this capability in a separate function, say the blocking send/recv, but not allow it in the non-blocking version?

4. Collectives: I can't name a code I've ever worked with that doesn't require MPI_Allreduce (though I wouldn't be surprised to hear of many), and this in a broad set of science areas. MPI_Bcast is also often used (but quite often only in the setup phase). I see MPI_Reduce used most often to collect timing information, so MPI_Allreduce would probably be fine as well. MPI_Gather is often quite useful, as is MPI_Scatter, but again often in setup. (Though often "setup" occurs once per time step.) Non-constant size versions are often used. And others can also no doubt offer strong opinions regarding inclusion or exclusion. But from an implementation perspective, what are the issues? In particular, is the basic infrastructure for these (and other collective operations) the same? A driving premise for supporting collectives is that the sort of performance-driven capability under discussion is most needed by applications running at very large scale, which is where even very good collective implementations run into problems.

5. Language bindings and perhaps other things: With the expectation/hope that full implementations continue to be available, I could use them for code development, thus making use of things like type checking, etc. And does this latter use then imply the need for "stubs" for things like the (vaporous) Fortran bindings module, communicators (if only MPI_COMM_WORLD is supported), etc.? And presuming the answer to #2 is "no", could/should the full implementation "warn" me (preferably at compile time) when I'm using functionality that rules out use of the subset?

6. Will the profile layer still be supported? Generating usage can still be quantified using a full implementation, but performance would not be (at least in this manner), which would rule out an apples-to-apples comparison between a full implementation and the subset version with its advertised superior performance. (Of course an overall runtime could be compared, which is the final word, but a more detailed analysis is often preferred.)

7. If blocking and non-blocking are required of the subset, aren't these blocking semantics?
   MPI_Send:  MPI_Isend ( ..., &req );  MPI_Wait ( &req, &status );
   -----
   MPI_Recv:  MPI_Irecv ( ..., &req );  MPI_Wait ( &req, &status );

   - And speaking of this, are there performance issues associated with variants of MPI_Wait, e.g. MPI_Waitany, MPI_Waitsome?

Finally, I'll officially register my concern with what I see as an increasing complexity in this effort, esp wrt "multiple subsets". I don't intend this comment to suppress ideas, but to help keep beating the drum for simplicity, which I see as a key goal of this effort.

If you read this far, thanks! My apologies if some of these issues have been previously covered. And if I've simply exposed myself as ignorant, I feel confident in stating that I am not alone - these questions will persist from others. :)

Richard

--
Richard Barrett
Future Technologies Group, Computer Science and Mathematics Division, and
Scientific Computing Group, National Center for Computational Science
Oak Ridge National Laboratory
http://ft.ornl.gov/~rbarrett

On 2/28/08 1:04 PM, "mpi3-subsetting-request_at_[hidden]" wrote:

> Thank you for your time today. It was a very good discussion. Here's what I captured (please add/modify what I may have missed):
>
> Present: Leonid Meyerguz (Microsoft), Rich Graham (ORNL), Richard Barrett (ORNL), Torsten Hoefler (ISU), Alexander Supalov (Intel)
>
> - Opens & introductions
>
> - Scope of the effort
>   - Rich
>     - Minimum subset consistent with the rest of MPI, for performance/memory footprint optimization
>     - Danger of splitting MPI, hence against optional features in the standard
>     - Both blocking & nonblocking belong to the core
>   - Torsten
>     - Some collectives may go into selectable subsets
>     - MPI_ANY_SOURCE considered harmful
>   - Leonid
>     - Flexible support for optional features, means for choosing and advertising level of compliance/set of features
>   - See enclosed email for Alexander's POV
>
> - General discussion snapshots
>   - Support of subsets: some or all?
>     If some, possible linkage problems in static apps (or dead calls). If all, where's the gain?
>   - Optional: really optional (may be not present) or selectable (are present but may be unused)?
>   - Performance penalty for unused subsets: implementation matter or standard choice?
>   - Portability may be limited to certain class of applications (think FT, master-slave runs)
>   - All we design needs to be implementable, complexity needs to be controlled
>   - An ability to use certain set of subsets should not preclude pulling in other modules if necessary
>   - Whatever we do, it should not conflict with the ABI efforts
>   - Need to stay nice and be nicer wrt to the libraries (think threading) and keep things simple
>   - The simplification argument, if put first, may not be liked by some
>
> - Next steps
>   - Please comment on these minutes, and add/modify what I may have missed
>   - I'll prepare a couple of slides by next week summarizing our discussion so far; again, your feedback will be most welcome
>   - At the meeting, it may be great to meet F2F briefly and discuss any eventual loose ends before the presentation at the Forum; I'll see to this
>
> Best regards.
>
> Alexander

From htor at [hidden] Fri Feb 29 08:57:26 2008
From: htor at [hidden] (Torsten Hoefler)
Date: Fri, 29 Feb 2008 09:57:26 -0500
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201196FE2@swsmsx413.ger.corp.intel.com>
Message-ID: <20080229145726.GJ16623@benten.cs.indiana.edu>

Hi,

> Thanks.
> As soon as there's a couple of non-blocking recvs out there, waiting for them in reverse order requires tracking of the moment when the receives were posted. In some cases this leads to extra fields and data exchanges.

How's that different from multiple Recvs in multiple threads? Hmm, I guess the threaded case is just undefined and the implementation is allowed to ignore ordering, right?

> The footprint argument generally says that the library will be smaller. This may be a minor matter for general purpose computers, but as soon as you go to Petascale, you need every byte on the compute nodes for user data, especially if dynamic libraries are not supported.

ack

> As for the collectives, many are implemented using SendRecv, and that blocking call in turn often uses non-blocking communication. Classic Alltoallv algorithm uses nonblocking calls, too. So, I'm not sure that even unoptimized blocking collectives will always use only blocking pt2pt.

Yes, of course - again, the proposed "interface" is more of a logical nature. You can implement all blocking collectives with blocking p2p (it'll be slower). But ok, pulling all p2p functions into this interface is also unproblematic. So I don't really have a strong opinion here.

Best,
Torsten

--
 bash$ :(){ :|:&};: --------------------- http://www.unixer.de/ -----
Indiana University    | http://www.indiana.edu
Open Systems Lab      | http://osl.iu.edu/
150 S. Woodlawn Ave.  | Bloomington, IN, 474045-7104 | USA
Lindley Hall Room 135 | +01 (812) 855-3608

From htor at [hidden] Fri Feb 29 09:02:45 2008
From: htor at [hidden] (Torsten Hoefler)
Date: Fri, 29 Feb 2008 10:02:45 -0500
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To:
Message-ID: <20080229150245.GK16623@benten.cs.indiana.edu>

Bronis,
for the record: I do *not* advocate getting rid of datatypes! I think datatypes are a great thing for some parallel applications and they certainly should be used as a high-level abstraction.
I've implemented scatter/gather list-based optimizations for modern NICs (IB).

But on the other hand, there are many codes out there that just do not use datatypes. Codes that are only supposed to run in heterogeneous environments. Codes that use sockets instead of MPI. If we want to aim at this market, we need to simplify here. A simplification could be to use MPI_BYTE by default ;) but it would be better to get rid of the code and control-path overhead.

Just to clarify my opinion,
Torsten

From rlgraham at [hidden] Fri Feb 29 09:08:22 2008
From: rlgraham at [hidden] (Richard Graham)
Date: Fri, 29 Feb 2008 10:08:22 -0500
Subject: [Mpi3-subsetting] Where is archive?
In-Reply-To:
Message-ID:

The mailing lists at uiuc are no longer active, and at this stage just forward mail to lists.mpi-forum.org. This too will be turned off in about 2 weeks. Each working group has wiki space for such things; some use it more than others. This wg just started its work yesterday, so very little has been done, and we are at the stage of trying to define what we mean by subsetting. The wiki pages can be accessed from the meetings web page, meetings.mpi-forum.org, by following the MPI 3.0 link, and then going to whatever working group you are interested in. I have not looked at the subsetting wiki site to see if anything has been put up on it yet.

Rich

On 2/29/08 9:27 AM, "Richard Treumann" wrote:

> FYI - the mailing list web page: http://lists.cs.uiuc.edu/mailman/listinfo has links to most or all of the email lists I know of except this one.
>
> Is there an archive?
>
> Also - is there an overview proposal somewhere?
>
> Thanks
>
> Dick Treumann - MPI Team/TCEM
> IBM Systems & Technology Group
> Dept 0lva / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
> Tele (845) 433-7846 Fax (845) 433-8363

From alexander.supalov at [hidden] Fri Feb 29 09:17:36 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Fri, 29 Feb 2008 15:17:36 -0000
Subject: [Mpi3-subsetting] Where is archive?
In-Reply-To:
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201197473@swsmsx413.ger.corp.intel.com>

Hi,

Our WG is so young that we have not put anything up yet. There must be quite a few emails in the archive by now, however, including minutes of yesterday's meeting. Summary slides capturing the state of discussion so far will follow next week.

Best regards.

Alexander

From rlgraham at [hidden] Fri Feb 29 09:19:56 2008
From: rlgraham at [hidden] (Richard Graham)
Date: Fri, 29 Feb 2008 10:19:56 -0500
Subject: [Mpi3-subsetting] Some "stupid user" questions, comments.
In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201197401@swsmsx413.ger.corp.intel.com>
Message-ID:

On 2/29/08 9:26 AM, "Supalov, Alexander" wrote:

> Dear RIchard,
>
> Thanks. The more complicated the standard gets, the happier are the implementors. However, now we try to think like MPI users for a change, so, thanks for providing a reality check.

>> Quite to the contrary. The simpler the standard is the easier to support - complexity is not a good thing at all. This is my view as an implementer. Complexity is often introduced when trying to get good performance out of a spec that supports a wide variety of options.

> Now, to one of your questions. An MPI_ANY_SOURCE MPI_Recv in multifabric environment means that a receive has to be posted somehow to more than one fabric in the MPI device layer.
> Once one of them gets the message, the posted receives should be cancelled on other fabrics. Now, what if they've already matched and started to receive something? What if they cannot cancel a posted receive? And so on. There are 3 to 5 ways to deal with this situation, with and without actually posting a receive, but none of them is good enough if you ask me. That's why there are 3 to 5 of them, actually. And all of them complicate the progress engine - the heart of an MPI implementation - at exactly the spot where one wants things simple and fast.

>> The any_source and multiple fabrics are two distinct issues. Even if you do not support any_source and have multiple fabrics, you have the issue that to support mpi ordering semantics, matching needs to be done in the context of all the nics - unless you decide to have only one nic do the matching, including any on-host traffic. What any_source forces is matching on the receive side - unless one wants to set up a very complex and inefficient way to make sure that only one receive is matched for each wild card receive.

Rich

> This means that most of the time we fight these repercussions and curse the MPI_ANY_SOURCE. Or, looping back to the beginning of this message, we actually never stop blessing MPI_ANY_SOURCE. Fighting this kind of trouble is what we are paid for. ;)
>
> Best regards.
>
> Alexander

From rlgraham at [hidden] Fri Feb 29 09:32:49 2008
From: rlgraham at [hidden] (Richard Graham)
Date: Fri, 29 Feb 2008 10:32:49 -0500
Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09
In-Reply-To: <20080229150245.GK16623@benten.cs.indiana.edu>
Message-ID:

Getting rid of the data types is not an option, in my opinion. I would be ok if we decided on a subset that includes something that includes basic data types and some sort of regular patterns based on these - which I believe represents a very large fraction of the application uses. I am NOT advocating going away from the general support we have for data types in MPI, just providing a way for implementers to know that under some use case scenarios (which I think are by far the common case) simpler and more efficient data type support can be provided. This also allows for implementations, if they choose to take advantage of h/w gather/scatter capabilities. At this stage the notion of subsetting is just that - a notion - and I don't think that as a group we have thought through all the implications.

Rich

On 2/29/08 10:02 AM, "Torsten Hoefler" wrote:

> Bronis,
> for the record: I do *not* advocate to get rid of datatypes! I think datatypes are a great thing for some parallel applications and they certainly should be used as a high-level abstraction. I've implemented scatter/gather list-based optimizations for modern NICs (IB).
>
> But on the other hand, there are many codes out there that just do not use datatypes. Codes that are only supposed to run in heterogeneous environments. Codes that use sockets instead of MPI. If we want to aim at this market, we need to simplify here. A simplification could be to use MPI_BYTE by default ;) but it would be better to get rid of the code and control-path overhead.
>
> Just to clarify my opinion,
> Torsten

From alexander.supalov at [hidden] Fri Feb 29 09:45:47 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Fri, 29 Feb 2008 15:45:47 -0000
Subject: [Mpi3-subsetting] Some "stupid user" questions, comments.
In-Reply-To:
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A2011974B8@swsmsx413.ger.corp.intel.com>

Thanks. You are right - if there's more than one route between two processes, there's a matching issue, too. As for my special implementor's point of view, I was kidding.

________________________________

From: mpi3-subsetting-bounces_at_[hidden] [mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of Richard Graham
Sent: Friday, February 29, 2008 4:20 PM
To: mpi3-subsetting_at_[hidden]
Subject: Re: [Mpi3-subsetting] Some "stupid user" questions, comments.

On 2/29/08 9:26 AM, "Supalov, Alexander" wrote:

Dear Richard,

Thanks. The more complicated the standard gets, the happier are the implementors. However, now we try to think like MPI users for a change, so, thanks for providing a reality check.

>> Quite to the contrary. The simpler the standard is the easier to support - complexity is not a good thing at all.
>> This is my view as an implementer. Complexity is often introduced when trying to get good performance out of
>> a spec that supports a wide variety of options.

Now, to one of your questions. An MPI_ANY_SOURCE MPI_Recv in a multifabric environment means that a receive has to be posted somehow to more than one fabric in the MPI device layer. Once one of them gets the message, the posted receives should be cancelled on other fabrics. Now, what if they've already matched and started to receive something? What if they cannot cancel a posted receive? And so on. There are 3 to 5 ways to deal with this situation, with and without actually posting a receive, but none of them is good enough if you ask me. That's why there are 3 to 5 of them, actually. And all of them complicate the progress engine - the heart of an MPI implementation - at exactly the spot where one wants things simple and fast.

>> The any_source and multiple fabrics are two distinct issues. Even if you do not support any_source and have
>> multiple fabrics, you have the issue that to support mpi ordering semantics, matching needs to be done
>> in the context of all the nics - unless you decide to have only one nic do the matching, including any on-host
>> traffic. What any_source forces is matching on the receive side - unless one wants to set up a very complex
>> and inefficient way to make sure that only one receive is matched for each wild card receive.

Rich

This means that most of the time we fight these repercussions and curse the MPI_ANY_SOURCE. Or, looping back to the beginning of this message, we actually never stop blessing MPI_ANY_SOURCE. Fighting this kind of trouble is what we are paid for. ;)

Best regards.

Alexander

________________________________

From: mpi3-subsetting-bounces_at_[hidden] [mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of Richard Barrett
Sent: Friday, February 29, 2008 2:50 PM
To: mpi3-subsetting_at_[hidden]
Subject: [Mpi3-subsetting] Some "stupid user" questions, comments.

Hi folks,

I'm still sorting things out in my mind, so perhaps this note is just me talking to myself. But should you feel so compelled to sort through it, I would appreciate any feedback you might offer; and it will make me a more informed participant.

I see two main perspectives: the user and the implementer.
I come from the user side, so I feel comfortable in positing that user confusion over the size of the standard is really a function of presentation. That is, most of us get our information regarding using MPI directly from the standard. For me, this is the _only_ standard I've ever actually read! Perhaps I am missing out on thousands of C and Fortran capabilities, but sometimes ignorance is bliss. That speaks highly to the MPI specification presentation; however it need not be the case. An easy solution to the "too many routines" complaint is a tutorial/book/chapter on the basics, with pointers to further information. And in fact these books exist. That said, I hope that MPI-3 deprecates a meaningful volume of functionality.

From the implementer perspective, there appear to be two goals. First is to ease the burden with regard to the amount of functionality that must be supported. (And we users don't want to hear of your whining, esp. from a company the size of Intel :) Second, which overlaps with user concerns, is performance. That is, by defining a small subset of functionality, strong performance (in some sense, e.g. speed or memory requirements) can be realized.

At the risk of starting too detailed a discussion at this early point (as well as exposing my ignorance:), I will throw out a few situations for discussion.

1. What would such a subset imply with regard to what I view as support functionality, such as user-defined datatypes, topologies, etc.? I.e., could this support be easily provided, say by cutting-and-pasting from the full implementation you will still provide? (I now see Torsten recommends excluding datatypes, but what of other stuff?)

2. Even more broadly (and perhaps very ignorantly), can I simply link in both libraries, like -lmpi_subset -lmpi, getting the good stuff from the former and the excluded functionality from the latter? In addition to the application developer's use of MPI, all large application programs I've dealt with make some use of externally produced libraries (a "very good thing" imo), which probably exceed the functionality in a "subset" implementation.

3. I (basically) understand the adverse performance effects of allowing promiscuous receives (MPI_ANY_SOURCE). However, this is a powerful capability for many codes, and used only in moderation, e.g. for setting up communication requirements (such as communication partners in unstructured, semi-structured, and dynamic mesh computations). In this case the sender knows its partner, but the receiver does not. A reduction (sum) lets each process know the number of communication partners from which it will receive data; the process then posts that many promiscuous receives, and once these are satisfied it can specify the sender from then on. So would it be possible to include this capability in a separate function, say the blocking send/recv, but not allow it in the non-blocking version?

4. Collectives: I can't name a code I've ever worked with that doesn't require MPI_Allreduce (though I wouldn't be surprised to hear of many), and this in a broad set of science areas. MPI_Bcast is also often used (but quite often only in the setup phase). I see MPI_Reduce used most often to collect timing information, so MPI_Allreduce would probably be fine as well. MPI_Gather is often quite useful, as is MPI_Scatter, but again often in setup. (Though often "setup" occurs once per time step.) Non-constant size versions are often used. And others can also no doubt offer strong opinions regarding inclusion or exclusion. But from an implementation perspective, what are the issues? In particular, is the basic infrastructure for these (and other collective operations) the same? A driving premise for supporting collectives is that the sort of performance driven capability under discussion is most needed by applications running at very large scale, which is where even very good collective implementations run into problems.

5. Language bindings and perhaps other things: With the expectation/hope that full implementations continue to be available, I could use them for code development, thus making use of things like type checking, etc. And does this latter use then imply the need for "stubs" for things like the (vaporous) Fortran bindings module, communicators (if only MPI_COMM_WORLD is supported), etc.? And presuming the answer to #2 is "no", could/should the full implementation "warn" me (preferably at compile time) when I'm using functionality that rules out use of the subset?

6. Will the profile layer still be supported? Usage can still be quantified using a full implementation, but performance would not be (at least in this manner), which would rule out an apples-to-apples comparison between a full implementation and the subset version with its advertised superior performance. (Of course an overall runtime could be compared, which is the final word, but a more detailed analysis is often preferred.)

7. If blocking and non-blocking are required of the subset, aren't these blocking semantics?

   MPI_Send:  MPI_Isend ( ..., &req ); MPI_Wait ( &req, &status );
   MPI_Recv:  MPI_Irecv ( ..., &req ); MPI_Wait ( &req, &status );

   And speaking of this, are there performance issues associated with variants of MPI_Wait, e.g. MPI_Waitany, MPI_Waitsome?

Finally, I'll officially register my concern with what I see as an increasing complexity in this effort, esp. wrt "multiple subsets". I don't intend this comment to suppress ideas, but to help keep beating the drum for simplicity, which I see as a key goal of this effort.

If you read this far, thanks! My apologies if some of these issues have been previously covered.
And if I've simply exposed myself as ignorant, I feel confident in stating that I am not alone - these questions will persist from others. :) Richard

From rbarrett at [hidden] Fri Feb 29 10:01:34 2008 From: rbarrett at [hidden] (Richard Barrett) Date: Fri, 29 Feb 2008 11:01:34 -0500 Subject: [Mpi3-subsetting] MPI_ANY_SOURCE In-Reply-To: Message-ID:

>> Now, to one of your questions. An MPI_ANY_SOURCE

Although I appreciate the discussion, my intent (uh-oh!) in bringing this up was to let you know I "accept" the problem, yet ask for the capability anyway, but in a manner that keeps it from presenting problems everywhere. Or maybe I'm under-estimating what I was once told: the use of MPI_ANY_SOURCE anywhere means it is a problem everywhere, i.e. in _every_ function involved in transmitting data?

If that is the case, but I still _really_ wanted to use -lmpi_subset, I could do this: suppose a pe knows it will receive data from m pes. It could post numpe non-blocking receives, complete m, discover who they're from, then cancel the rest. Now I'm thinking I've created a bigger problem: when running across numpes=100k cores, but m is say 10. True?
Barring some sort of workaround, excluding codes that "need" MPI_ANY_SOURCE seems to meaningfully reduce the number of codes that could use -lmpi_subset. > 3. I (basically) understand the adverse performance effects of allowing promiscuous receives (MPI_ANY_SOURCE). However, this is a powerful capability for many codes, and used only in moderation, eg for setting up communication requirements (such as communication partners in unstructured, semi-structured, and dynamic mesh computations). In this case the sender knows its partner, but the receiver does not. A reduction(sum) is used to let each process know the number of communication partners from which it will receive data, the process posts that many promiscuous receives, which when satisfied lets it from then on specify the sender. So would it be possible to include this capability in a separate function, say the blocking send/recv, but not allow it in the non-blocking version? Richard -- Richard Barrett Future Technologies Group, Computer Science and Mathematics Division, and Scientific Computing Group, National Center for Computational Science Oak Ridge National Laboratory http://ft.ornl.gov/~rbarrett From alexander.supalov at [hidden] Fri Feb 29 10:30:25 2008 From: alexander.supalov at [hidden] (Supalov, Alexander) Date: Fri, 29 Feb 2008 16:30:25 -0000 Subject: [Mpi3-subsetting] MPI_ANY_SOURCE In-Reply-To: Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201197514@swsmsx413.ger.corp.intel.com> I see. Sorry for explaining the obvious. I guess the progress engine may take a hit every time there are either an MPI_ANY_SOURCE Recv or (thanks to Rich) multiple paths between the processes. Hence, all transfers are potentially affected. Cancellation is a sticky matter. Some fabrics won't let you do this, so a cancel will always misfire. 
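For the setup pattern in item 3 (a sum reduction tells each process how many promiscuous receives to post), the counting step can be sketched in plain C. This is a serial stand-in, not MPI code: the `sends` matrix and the name `count_incoming_partners` are made up for illustration, and in a real program the column sums would presumably come from a collective such as MPI_Allreduce.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical serial stand-in for the reduction(sum) step.
 * sends[p * nprocs + q] is 1 when process p will send to process q, else 0.
 * On return, recv_count[q] is the number of senders targeting process q,
 * i.e. how many wildcard receives q would post during setup. */
void count_incoming_partners(const int *sends, size_t nprocs, int *recv_count)
{
    assert(sends != NULL && recv_count != NULL);
    for (size_t q = 0; q < nprocs; q++) {
        recv_count[q] = 0;
        /* Sum column q: one contribution per potential sender p. */
        for (size_t p = 0; p < nprocs; p++)
            recv_count[q] += sends[p * nprocs + q];
    }
}
```

Each process only learns how many partners it has, not who they are, which is why the receives that follow have to be wildcards until the first round completes.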
_______________________________________________ Mpi3-subsetting mailing list Mpi3-subsetting_at_[hidden] http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-subsetting

From rlgraham at [hidden] Fri Feb 29 10:59:27 2008 From: rlgraham at [hidden] (Richard Graham) Date: Fri, 29 Feb 2008 11:59:27 -0500 Subject: [Mpi3-subsetting] MPI_ANY_SOURCE In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201197514@swsmsx413.ger.corp.intel.com> Message-ID:

On 2/29/08 11:30 AM, "Supalov, Alexander" wrote:
> I see. Sorry for explaining the obvious. I guess the progress engine may
> take a hit every time there are either an MPI_ANY_SOURCE Recv or (thanks
> to Rich) multiple paths between the processes. Hence, all transfers are
> potentially affected.

With any_source the message can come from anyone, so the cost really depends on the mpi's queuing strategy, so the actual cost is very implementation specific. What ever the cost is, there are more potential sources, so at 100k there are 100k potential sources.
The queuing could always have the unexpected messages cached in a single queue, but then all matching would be more expensive, vs. more of a hierarchical queue structure .... For expected messages there can also be an increase in matching costs, but again this is implementation specific. The other cost is that matching really has to be done at the destination - just a practical need - try to cancel 100k posted receives, after one match has been made, and make sure that only one proc has done the match.

> Cancellation is a sticky matter. Some fabrics won't let you do this, so
> a cancel will always misfire.

Is this the case on the receive side ? The cancellation that Richard is mentioning is a receive side cancellation. I don't remember a network with this limitation, but I could very well be wrong on this one - I suppose it can also depend on how you do the matching. Rich

From alexander.supalov at [hidden] Fri Feb 29 11:23:46 2008 From: alexander.supalov at [hidden] (Supalov, Alexander) Date: Fri, 29 Feb 2008 17:23:46 -0000 Subject: [Mpi3-subsetting] MPI_ANY_SOURCE In-Reply-To: Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A20119756A@swsmsx413.ger.corp.intel.com>

I heard Myricom MX would not allow recv cancellation. This needs to be checked.

From jsquyres at [hidden] Fri Feb 29 14:28:53 2008 From: jsquyres at [hidden] (Jeff Squyres) Date: Fri, 29 Feb 2008 15:28:53 -0500 Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09 In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201196FE1@swsmsx413.ger.corp.intel.com> Message-ID: <89169D89-5686-4227-B983-267060C9C3ED@cisco.com>

On Feb 28, 2008, at 11:29 PM, Supalov, Alexander wrote:
> - Communicator & group management: better memory footprint.

Take this point to an extreme - it may be possible to say "this app only uses MPI_COMM_WORLD". In this case, you can remove the communicator field from network packets for a small gain in latency, or perhaps reduce it from 4 to 2 bytes or 1 byte (e.g., if an app says "I'll only use 4 communicators").

> - Message tagging: better support for stable dataflow exchanges,
> smaller
> packets.

Two points here:
- allow app to eliminate MPI_ANY_TAG
- just like with communicators, allow the app to say "I'll only use N tags", where N can save you space in network packets (e.g., if N==1, no need for tag on the wire; if N == 2, then you only need 1 byte for the tag, etc.).

> - Non-blocking communication: easier ordering, simplified request
> handling.

If there is no non-blocking communication, enormous chunks of the progression engine can be optimized in terms of memory (i.e., remove lots of now-unnecessary code) and probably a little speed.

On the teleconf (sorry I missed it), was there discussion of how to specify these hints? Perhaps a new function: MPI_INIT_INFO (pass an MPI_Info handle to MPI_INIT)? Or is it something that needs to be specified at compile/link time?
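To make the packet-size argument concrete, here is a rough sketch of how many bytes a tag or communicator field would need if the application promised to use at most N distinct values. The helper name `id_field_bytes` and the whole-byte encoding are assumptions for illustration only; no existing MPI implementation is claimed to work this way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: smallest whole number of bytes that can encode
 * n_values distinct identifiers (tags or communicator ids), numbered
 * 0 .. n_values-1. A single promised value needs no field at all. */
size_t id_field_bytes(unsigned long n_values)
{
    assert(n_values >= 1);
    size_t bytes = 0;
    unsigned long max_id = n_values - 1;  /* largest id that must fit */
    while (max_id > 0) {
        bytes++;        /* one more byte covers 8 more bits */
        max_id >>= 8;
    }
    return bytes;
}
```

Under this encoding a single promised tag needs no field on the wire, and up to 256 tags or communicators fit in one byte, which matches the N==1 and N==2 cases above.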
-- Jeff Squyres Cisco Systems

From alexander.supalov at [hidden] Fri Feb 29 14:54:18 2008 From: alexander.supalov at [hidden] (Supalov, Alexander) Date: Fri, 29 Feb 2008 20:54:18 -0000 Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09 In-Reply-To: <89169D89-5686-4227-B983-267060C9C3ED@cisco.com> Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A2011975F4@swsmsx413.ger.corp.intel.com>

Hi, Thanks. I was thinking privately about MPI_Init_subsets or so that would use an Info object, too. I bet a comparable idea - or at least desire to keep the number of subset related calls under strict control - was aired at the telecon by someone, but we didn't go into much detail then.

One reservation I have about Info objects is that they are so flexible as to be dangerous. They promoting lots of optional, loosely controlled features that can effectively blur the interface definition. On the other hand, I don't see any viable alternative to that, at least if the number of subsets is going to be substantial and ever growing.

Of course, it's a little too early to fix any implementation details I'm afraid. Anyway, let's keep this idea in mind while we're settling the scope.

Best regards. Alexander
From jsquyres at [hidden] Fri Feb 29 15:28:41 2008 From: jsquyres at [hidden] (Jeff Squyres) Date: Fri, 29 Feb 2008 16:28:41 -0500 Subject: [Mpi3-subsetting] agenda for subsetting kickoff telecon ww09 In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A2011975F4@swsmsx413.ger.corp.intel.com> Message-ID: <11D1F118-5607-4E42-941F-9BE123C0F9B7@cisco.com>

One other issue is that we'd have to make [at least some of] the MPI_Info_* functions be able to be called before MPI_INIT.

-- Jeff Squyres Cisco Systems