From treumann at [hidden] Mon Apr 7 09:54:00 2008
From: treumann at [hidden] (Richard Treumann)
Date: Mon, 7 Apr 2008 10:54:00 -0400
Subject: [Mpi-22] 2.1 cleanup or MPI 2.2?
Message-ID: 

In the description of MPI_COMM_FREE we presently give the following advice to implementors.

A reference-count mechanism may be used: the reference count is incremented by each call to \func{MPI\_COMM\_DUP}, and decremented by each call to \func{MPI\_COMM\_FREE}. The object is ultimately deallocated when the count reaches zero.

I do not think it can ever be valid to implement MPI_COMM_DUP by simply returning a new handle for an existing communicator object while bumping its reference count, because the output communicator must have a different context than the original. Assuming I have not missed something, it seems this advice is nonsense.

Is removing this the kind of change that should go on the MPI 2.2 list? I will be surprised if anyone offers a rationale for keeping the advice, but I am also not quite comfortable that it fits within the "clean up" rules for MPI 2.1 at this late stage.

Thoughts?

Dick

Dick Treumann - MPI Team/TCEM
IBM Systems & Technology Group
Dept 0lva / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363
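The distinction being drawn in the message above is easier to see in code. The following is a minimal sketch with purely illustrative internal names (hypo_comm, hypo_group and hypo_new_context_id are not taken from any real MPI implementation): a duplicate must carry a fresh context, so the reference count the quoted advice talks about can only sensibly apply to constituents that really are shared, such as the group. A real implementation would also have to agree on the new context id collectively across all members of the communicator; that step is elided here.

    #include <stdlib.h>

    /* Illustrative internals only; these names are not from any real MPI library. */
    struct hypo_group { int refcount; /* ranks, etc. elided */ };
    struct hypo_comm  { int context_id; struct hypo_group *group; };

    static int next_context_id = 1;
    static int hypo_new_context_id(void) { return next_context_id++; }

    /* A dup that only bumped a refcount on the input communicator would hand back
       the same context, so traffic on the "new" communicator could match traffic
       on the old one.  A fresh context is mandatory; what can be shared, and hence
       reference counted, is the group (and similar constituents). */
    int hypo_comm_dup(struct hypo_comm *in, struct hypo_comm **out)
    {
        struct hypo_comm *c = malloc(sizeof *c);
        if (c == NULL)
            return 1;
        c->context_id = hypo_new_context_id();  /* must differ from in->context_id */
        c->group = in->group;
        c->group->refcount++;                    /* share, do not copy, the group */
        *out = c;
        return 0;
    }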
From jsquyres at [hidden] Mon Apr 7 10:40:33 2008
From: jsquyres at [hidden] (Jeff Squyres)
Date: Mon, 7 Apr 2008 08:40:33 -0700
Subject: [Mpi-22] MPI 2.2 comments on 1 April document
Message-ID: 

While reviewing Rolf's April document, I came up with a list of MPI 2.2 issues that I thought I'd bring up:

Sections:
- Miscellaneous
- IO (long itemization)
- Language bindings

---------------------------------------
Based on 1 April 2008 document

Miscellaneous
=============

- We need to update the Fortran references throughout the document (F90 -> F?03?).

IO chapter
==========

fh parameter should be IN (not INOUT)
- p381.5 - p381.44 - p384.31 - p387.6 - p394.6 - p394.31 - p395.31 - p396.22 - p397.27 - p398.2 - p398.25 - p399.2 - p400.2 - p400.22 - p402.18 - p402.40 - p403.15 - p403.36 - p404.33 - p405.8 - p405.38 - p409.2 - p409.22 - p409.38 - p410.9 - p410.25 - p410.43 - p411.10 - p411.28 - p412.2 - p412.20 - p423.39 - p424.37

C++ bindings functions should be const
- p381.14 - p382.4 - p384.39 - p387.24 - p393.19,21 - p393.44,46 - p394.23,25 - p394.48 - p395.2 - p395.25 - p395.48 - p396.36,38 - p397.42,45 - p398.17,19 - p398.39,42 - p399.17 - p400.17 - p400.32 - p401.11 - p401.35 - p402.32,35 - p403.7,9 - p403.29 - p404.3 - p404.48 - p405.2 - p405.23,25 - p405.48 - p408.22 - p408.37,38 - p409.18 - p409.32,33 - p410.4 - p410.19,20 - p410.38 - p411.4,5 - p411.23 - p411.38,39 - p412.15 - p412.30,31 - p424.1 - p424.44

Language bindings chapter
=========================

- p441.31-33: Replace entire paragraph with: "Constants Constants are singleton objects and are declared const. The only exception is MPI::BOTTOM, which cannot be const because it can be passed as a receive buffer argument, which is not const."

>>> Need to fix various C++ binding methods to be const (e.g., Set_name, Set_errhandler, etc.)
>>> Same arguments I've raised for a while: all MPI predefined C++ handles should be const except BOTTOM.

Short argument:
- have to be able to use MPI::COMM_WORLD for initialization before MPI::Init, so they *are* const because they're initialized before main()
- the "const" refers to the C++ handle, not the back-end MPI object - the handle does not change (just like MPI_SEND where "comm" argument is IN and the method is const)

--
Jeff Squyres
Cisco Systems

From treumann at [hidden] Mon Apr 7 11:05:48 2008
From: treumann at [hidden] (Richard Treumann)
Date: Mon, 7 Apr 2008 12:05:48 -0400
Subject: [Mpi-22] catalog of issues
Message-ID: 

Is there a catalog of issues that will be considered as part of MPI 2.2? The WIKI has something about send buffer access and about C bindings const correctness, but I am not aware of a place where I can see whether some issue has been listed as an MPI 2.2 topic.

There are assorted suggestions from MPI 2.1 that were deemed too controversial or complex and got moved to MPI 2.2, but I do not have a list. There are topics like MPI_ALLTOALLX that I assume are to be considered. There are others we have discussed in the past, like what it means with MPI_ERRORS_ARE_FATAL if MPI_ALLOC_MEM cannot provide the requested space. (Do we add MPI_TRY_ALLOC_MEM(size, info, baseptr, av_flag) and deprecate MPI_ALLOC_MEM? Memory not available would still return MPI_SUCCESS and the app would test av_flag to be sure the memory was actually provided.) There are some that may not have been mentioned before (or maybe they have and I do not recall). For example, should there be an MPI_GROUP_DUP?

There are more I could think of and many I would not think of offhand. It would be helpful to be able to check somewhere and see if an issue that does cross my mind has already been recognized as an MPI 2.2 topic.

Dick

Dick Treumann - MPI Team/TCEM
IBM Systems & Technology Group
Dept 0lva / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363
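One way to read the MPI_TRY_ALLOC_MEM idea in the message above is as a thin wrapper over today's API. The sketch below is purely illustrative: the routine name, its av_flag argument, and the behavior shown are hypothetical and are not defined by any MPI standard or implementation.

    #include <mpi.h>

    /* Hypothetical routine from the message above: never fatal on "out of memory";
       availability is reported through av_flag instead. */
    int MPI_Try_alloc_mem(MPI_Aint size, MPI_Info info, void *baseptr, int *av_flag)
    {
        /* Ask MPI for return codes instead of the default fatal behavior.  A fuller
           sketch would save and restore the error handler that was previously set. */
        MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
        int rc = MPI_Alloc_mem(size, info, baseptr);
        *av_flag = (rc == MPI_SUCCESS);
        return MPI_SUCCESS;   /* "not available" is an answer, not an error */
    }

The calling pattern the message describes would then be, for example, MPI_Try_alloc_mem(nbytes, MPI_INFO_NULL, &ptr, &available), followed by a fall-back to ordinary malloc when available is false.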
From rabenseifner at [hidden] Mon Apr 7 13:18:04 2008
From: rabenseifner at [hidden] (Rolf Rabenseifner)
Date: Mon, 07 Apr 2008 20:18:04 +0200
Subject: [Mpi-22] 2.1 cleanup or MPI 2.2?
In-Reply-To: 
Message-ID: 

Dick,

if I'm right, then 23.r is not similar.

23.r OK p183, lines 5-8. This advice to implementors on reference counts for groups should include MPI_COMM_GROUP as a routine that increments the reference count.

I've put yours as 29.a in http://www.hlrs.de/mpi/mpi21/doc/MPI-2.1draft-2008-02-23-review.txt

23.r I made an OK, but for your 29.a I would recommend to go to MPI-2.2.

Best regards
Rolf

On Mon, 7 Apr 2008 10:54:00 -0400 Richard Treumann wrote:
> [...]

Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner_at_[hidden]
High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
Nobelstr. 19, D-70550 Stuttgart, Germany . (Office: Allmandring 30)

From rabenseifner at [hidden] Mon Apr 7 14:36:10 2008
From: rabenseifner at [hidden] (Rolf Rabenseifner)
Date: Mon, 07 Apr 2008 21:36:10 +0200
Subject: [Mpi-22] Revisiting C++ Language binding sections in MPI-standard
In-Reply-To: 
Message-ID: 

Moved to MPI-2.2 mailing list - reasons, see below.

Jeff and Bronis,

yes, I agree that the language bindings should stay at the end. I put PMPI behind, because Chap. 1-13 describe all MPI routines and Chap. 14 (Profiling) translates all into MPI+PMPI.

I agree with Jeff, that
> If stuff is wrong/outdated in Chapter 2, then it should be fixed.
This was mainly the job of Ballots 1-4 in MPI-2.1 and will be the job of Ballot 5 in MPI-2.2.

I do not agree to:
> If stuff in Chapter 2 really belongs in a Language Bindings chapter, > then it should be moved.
because there are, e.g., 20 lines on C-binding - the only 20 lines! They should not be moved to the end of the standard.

I would really like to move this discussion to MPI-2.2 because it transforms a consistent MPI standard into another consistent MPI standard, with one exception: if there are bugs in 2.6.4, then they should be fixed. But those bugs are not caused by the merge, because Sect. 2.6.4 and Chap. 13 are all MPI-2 texts. Therefore it would have been a Ballot 4 discussion. And because Ballot 4 is finished, this is an MPI-2.2 discussion. Therefore I changed the mailing-list to MPI-2.2.

I also believe it will be an MPI-2.2 decision whether to keep all C++ binding stuff together in Chap. 13.1 or whether to move all the things into diverse locations in the other chapters. These locations are currently not defined, because there is no special wording on C and Fortran. Fortran is mainly mentioned in datatype stuff and the caching callback functions.

Best regards
Rolf

On Mon, 7 Apr 2008 11:22:47 -0700 (PDT) "Bronis R. de Supinski" wrote: > > Jeff: > > I agree with most of your proposal. > > If stuff is wrong/outdated in Chapter 2, then it should be fixed. > > If stuff in Chapter 2 really belongs in a Language Bindings chapter, > then it should be moved. > > However, I see no reason to make the language bindings chapter so > early in the standard. In fact, I agree with Rolf's suggestion > that it would be most appropriate as the last chapter, right > before the appendix that lists the actual bindings. As Rolf > suggested, we should make it chapter 14 and make the profiling > interface chapter 13. Making it chapter 3 does not make sense. > > Bronis > > > > On Mon, 7 Apr 2008, Jeff Squyres wrote: > > > The problem is that the text about language bindings is fairly > > disjoint between chapters 2 and 13. Indeed, chapter 13 is redundant > > and out of order / inconsistent with regards to MPI-1 text in some > > places *because* MPI-2 was a separate document. 
> > > > What about a slightly different proposal: > > > > 1. Move some of the existing Chapter 13/Language Bindings text into > > the relevant parts in the rest of the 2.1 doc (e.g., move the C++ > > communicators discussion to the Right place in Chapter 5/Groups, > > Contexts, Comms). > > > > 2. Make a new chapter 3: Language Bindings. Put in it: > > - All C/Fortran language bindings text from Chapter 2/Terms&Conv > > - All remaining text from Chapter 13/Language bindings > > > > 3. Remove the [now empty] Chapter 13 > > > > > > On Apr 4, 2008, at 7:48 AM, Rolf Rabenseifner wrote: > > > About Chap. 13, especially C++. > > > > > > I'm proposing (referencec to MPI-2.1 Draft Apr.1, 2008): > > > > > > - The MPI-2 Forum decided to put only small overview stuff into > > > Chap. 2 Terms. > > > (I want to recall, that in MPI-2 the Terms are rewritten for whole > > > MPI, > > > i.e., still valid in MPI-2.1) > > > - The MPI-2 Forum decided to put all deeper information into > > > extra sections of an additionally last chapter on Bindings. > > > - The MPI-2 Forum already decided that normal C++ bindings > > > should be after the Fortran bindings. > > > > > > - Terms, page 18, lines 36-39 clearly expresses, that all constants > > > are > > > given only in MPI_ notation and that C++ names (with MPI::) > > > are given in Annex A. > > > I.e., MPI_COMM_WORLD, MPI_FLOAT, MPI_PROC_NULL, ... should not > > > to be translated everywhere in the chapters. > > > Same for Table 3.2 on page 27. > > > > > > - There are important things were C++ clearly differs from C, > > > e.g. the handling of the Status. > > > I have already added the Status handling, see page 31 lines 23-32. > > > (By the way, this information was missing in Chap. 13.1 and only > > > available in the Annex A.) > > > > > > - I'm not aware, whether there are more such stuff, that is explained > > > for C and Fortran and should be also explained for C++. > > > Do you see an additional stuff like status? > > > > > > - I do not expect that it would be a good idee to move all the ugly > > > Fortran problems (17 pages) to the beginning of thee book into > > > Chap. 2 Terms. > > > I would recommend same rule for C++ (12 pages). > > > Chap.2 terms has only 16 pages - with 2 pages dedicated to Fortran, > > > 1/2 page to C, and 3 pages to C++. > > > > > > Best regards > > > Rolf > > > > > > On Thu, 3 Apr 2008 15:13:10 -0400 > > > Jeff Squyres wrote: > > >> On Apr 3, 2008, at 12:09 PM, Rolf Rabenseifner wrote: > > > ... > > >>> For me, the answer may have implications on how separate or > > >>> integrated additional bindings should be integrated into the > > >>> language independent text of the MPI standard. > > >> > > >> I don't quite understand. All officially-supported language bindings > > >> should be listed consistently in the standard. In MPI-2.1, for > > >> example, that means alongside the language neutral bindings in the > > >> text and in Annex A. > > > > > > > > > ------------- > > > > > > On Thu, 3 Apr 2008 15:27:08 -0400 > > > Jeff Squyres wrote: > > >> What about the C++/Fortran language bindings text? Should the > > >> majority of chapter 13 be merged into Terms and Conventions (and > > >> elsewhere)? > > >> > > >> It's not really a "problem", per se -- but it is a little awkward. > > >> There are sections in chapter 13 that could definitely fit in > > >> existing > > >> text elsewhere. Some of it is redundant, too. 
> > >> > > >> > > >> > > >> On Apr 3, 2008, at 3:19 PM, George Bosilca wrote: > > >>> Bronis, > > >>> > > >>> If the data-type section get moved into the chapter 3 it make sense > > >>> to merge the leftover of the chapter 11 with chapter 7, as long as > > >>> we choose a right name. "MPI Environmental Management" is not the > > >>> right chapter for "Generalized Requests". But of course these are > > >>> just details. > > >>> > > >>> I'll get in touch with you asap to see how we can coordinate. > > >>> > > >>> Thanks, > > >>> george. > > >>> > > >>> On Apr 3, 2008, at 12:58 PM, Bronis R. de Supinski wrote: > > >>>> > > >>>> Rolf: > > >>>> > > >>>> Re: > > >>>>> my general statements do not answer you initial question: > > >>>> > > >>>> My opinion is that leaving obvious problems unfixed based > > >>>> on an expected future version is a bad idea. However, I > > >>>> don't want to argue over this since I think the best > > >>>> approach is just to remove them now and then we don't > > >>>> have to worry about them. Others have more concerns over > > >>>> the time that they can devote to this (not that I have an > > >>>> abundance) and might want to delay in any event in order > > >>>> to get it right (at least mostly). > > >>>> > > >>>>> If you decide to move parts from Chap.11 to Chap.7, > > >>>>> then you both mus discuss this. You both are responsible > > >>>>> for these chapters. > > >>>>> And you should first convince your reviewers: > > >>>>> - Chap. 7: Rich, Jesper, Steve, Kannan, David, Bill > > >>>>> - Chap.11: Bill and Rainer > > >>>>> My recommendation: > > >>>>> Express clearly which parts should be moved exactly to wich line > > >>>>> (all based on page/line numbers as **printed** in Draft Apr. 1, > > >>>>> 2008). > > >>>> > > >>>> I have discussed moving the datatype decoding stuff > > >>>> with Rich and Bill. I will move those sections as I > > >>>> suggested, with an initial pass for the current review. > > >>>> This works well for Rich since he does not have time > > >>>> to do this for another couple of weeks. I hope to get > > >>>> that done today. > > >>>> > > >>>> For the remainder, I will look over the two chapters > > >>>> (7 & 11) and propose an initial merge strategy. George > > >>>> can react to that; I don't know how long it will take > > >>>> me to get that done... > > >>>> > > >>>> Bronis > > >>>> > > >>>> _______________________________________________ > > >>>> mpi-21 mailing list > > >>>> mpi-21_at_[hidden] > > >>>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-21 > > >>> > > >>> _______________________________________________ > > >>> mpi-21 mailing list > > >>> mpi-21_at_[hidden] > > >>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-21 > > >> > > >> > > >> -- > > >> Jeff Squyres > > >> Cisco Systems > > >> > > >> _______________________________________________ > > >> mpi-21 mailing list > > >> mpi-21_at_[hidden] > > >> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-21 > > > > > > > > > > > > Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner_at_[hidden] > > > High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530 > > > University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832 > > > Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner > > > Nobelstr. 19, D-70550 Stuttgart, Germany . 
(Office: Allmandring 30) > > > _______________________________________________ > > > mpi-21 mailing list > > > mpi-21_at_[hidden] > > > http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-21 > > > > > > -- > > Jeff Squyres > > Cisco Systems > > > > _______________________________________________ > > mpi-21 mailing list > > mpi-21_at_[hidden] > > http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-21 > > > _______________________________________________ > mpi-21 mailing list > mpi-21_at_[hidden] > http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-21 Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner_at_[hidden] High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530 University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832 Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner Nobelstr. 19, D-70550 Stuttgart, Germany . (Office: Allmandring 30) From wgropp at [hidden] Thu Apr 10 10:19:36 2008 From: wgropp at [hidden] (William Gropp) Date: Thu, 10 Apr 2008 10:19:36 -0500 Subject: [Mpi-22] Call for agenda items for MPI 2.2 at the next MPI Forum Meeting Message-ID: <6EBBF54A-5316-43EE-BFEB-599E910B63A2@uiuc.edu> Please let me know if you have agenda items for MPI 2.2. To date, we have Remove Send Buffer Access Restriction http://svn.mpi-forum.org/trac/mpi-forum-web/wiki/SendBufferAccess Adding const Correctness to the C Bindings http://svn.mpi-forum.org/trac/mpi-forum-web/wiki/ConstCorrectness Miscellaneous items moved to 2.2 from the 2.1 discussions. I've also received a heads up about issues with globalization and making MPI APIs secure (this latter is probably an MPI-3 item). If those are ready, they can be added. Bill William Gropp Paul and Cynthia Saylor Professor of Computer Science University of Illinois Urbana-Champaign From jsquyres at [hidden] Thu Apr 10 12:57:32 2008 From: jsquyres at [hidden] (Jeff Squyres) Date: Thu, 10 Apr 2008 10:57:32 -0700 Subject: [Mpi-22] Call for agenda items for MPI 2.2 at the next MPI Forum Meeting In-Reply-To: <6EBBF54A-5316-43EE-BFEB-599E910B63A2@uiuc.edu> Message-ID: On Apr 10, 2008, at 8:19 AM, William Gropp wrote: > Miscellaneous items moved to 2.2 from the 2.1 discussions. Just to make sure, does "miscellaneous items" include: - make all MPI C++ predefined handles be const - a bunch of missing "const"s for parameters and methods in C++ bindings (the list I sent around recently) - IN / OUT / INOUT definitions and usage -- Jeff Squyres Cisco Systems From wgropp at [hidden] Fri Apr 11 09:38:38 2008 From: wgropp at [hidden] (William Gropp) Date: Fri, 11 Apr 2008 09:38:38 -0500 Subject: [Mpi-22] Previous messages Message-ID: <374DA599-BDA6-42F8-9385-5F78CFB327E5@uiuc.edu> For some reason, I wasn't a member of the mpi-22 list, and the archives are empty. If you sent a message to the mpi-22 list, please send me a copy. Thanks! Bill William Gropp Paul and Cynthia Saylor Professor of Computer Science University of Illinois Urbana-Champaign * -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsquyres at [hidden] Fri Apr 11 09:59:20 2008 From: jsquyres at [hidden] (Jeff Squyres) Date: Fri, 11 Apr 2008 07:59:20 -0700 Subject: [Mpi-22] Previous messages In-Reply-To: <374DA599-BDA6-42F8-9385-5F78CFB327E5@uiuc.edu> Message-ID: <7D76988A-2E5D-465A-B8B5-8B96B703C213@cisco.com> I'm sorry, the reason that the web-ified archives are empty is my fault. All the messages *are* being collected (in an mbox file), but the web archives are not being populated yet. 
IU's sysadmins are waiting for some information from me; I have not had time yet to provide it to them. I can have them send you the mbox. On Apr 11, 2008, at 7:38 AM, William Gropp wrote: > For some reason, I wasn't a member of the mpi-22 list, and the > archives are empty. If you sent a message to the mpi-22 list, > please send me a copy. Thanks! > > Bill > > William Gropp > Paul and Cynthia Saylor Professor of Computer Science > University of Illinois Urbana-Champaign > > > > _______________________________________________ > mpi-22 mailing list > mpi-22_at_[hidden] > http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-22 -- Jeff Squyres Cisco Systems From jsquyres at [hidden] Tue Apr 15 09:49:23 2008 From: jsquyres at [hidden] (Jeff Squyres) Date: Tue, 15 Apr 2008 10:49:23 -0400 Subject: [Mpi-22] Previous messages In-Reply-To: <7D76988A-2E5D-465A-B8B5-8B96B703C213@cisco.com> Message-ID: <366D7A27-4E0E-4C93-886C-FCAC7F933942@cisco.com> Bill -- Did you get the mbox that was sent (off list)? The web archives for mpi-21 and mpi-22 are now up on a trial basis (see http://lists.mpi-forum.org/) . Once we get these right, we'll put the rest of the lists up as well. On Apr 11, 2008, at 10:59 AM, Jeff Squyres wrote: > I'm sorry, the reason that the web-ified archives are empty is my > fault. All the messages *are* being collected (in an mbox file), but > the web archives are not being populated yet. IU's sysadmins are > waiting for some information from me; I have not had time yet to > provide it to them. > > I can have them send you the mbox. > > > On Apr 11, 2008, at 7:38 AM, William Gropp wrote: >> For some reason, I wasn't a member of the mpi-22 list, and the >> archives are empty. If you sent a message to the mpi-22 list, >> please send me a copy. Thanks! >> >> Bill >> >> William Gropp >> Paul and Cynthia Saylor Professor of Computer Science >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> mpi-22 mailing list >> mpi-22_at_[hidden] >> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-22 > > > -- > Jeff Squyres > Cisco Systems > > _______________________________________________ > mpi-22 mailing list > mpi-22_at_[hidden] > http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-22 -- Jeff Squyres Cisco Systems From treumann at [hidden] Tue Apr 15 10:17:07 2008 From: treumann at [hidden] (Richard Treumann) Date: Tue, 15 Apr 2008 11:17:07 -0400 Subject: [Mpi-22] Previous messages In-Reply-To: <366D7A27-4E0E-4C93-886C-FCAC7F933942@cisco.com> Message-ID: Hi Jeff Is it possible to present the archive in a way that is not hidden under opaque day by day sub directories? It would be better if a thread could be followed across its life more easily or found even if I do not remember which day it began. (It seems like I never remember exactly what date anything happened.) Dick Dick Treumann - MPI Team/TCEM IBM Systems & Technology Group Dept 0lva / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601 Tele (845) 433-7846 Fax (845) 433-8363 mpi-22-bounces_at_[hidden] wrote on 04/15/2008 10:49:23 AM: > Bill -- > > Did you get the mbox that was sent (off list)? The web archives for > mpi-21 and mpi-22 are now up on a trial basis (see http://lists.mpi-forum.org/ > ) > . Once we get these right, we'll put the rest of the lists up as well. > > > > On Apr 11, 2008, at 10:59 AM, Jeff Squyres wrote: > > I'm sorry, the reason that the web-ified archives are empty is my > > fault. 
All the messages *are* being collected (in an mbox file), but > > the web archives are not being populated yet. IU's sysadmins are > > waiting for some information from me; I have not had time yet to > > provide it to them. > > > > I can have them send you the mbox. > > > > > > On Apr 11, 2008, at 7:38 AM, William Gropp wrote: > >> For some reason, I wasn't a member of the mpi-22 list, and the > >> archives are empty. If you sent a message to the mpi-22 list, > >> please send me a copy. Thanks! > >> > >> Bill > >> > >> William Gropp > >> Paul and Cynthia Saylor Professor of Computer Science > >> University of Illinois Urbana-Champaign > >> > >> > >> > >> _______________________________________________ > >> mpi-22 mailing list > >> mpi-22_at_[hidden] > >> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-22 > > > > > > -- > > Jeff Squyres > > Cisco Systems > > > > _______________________________________________ > > mpi-22 mailing list > > mpi-22_at_[hidden] > > http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-22 > > > -- > Jeff Squyres > Cisco Systems > > _______________________________________________ > mpi-22 mailing list > mpi-22_at_[hidden] > http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-22 * -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsquyres at [hidden] Tue Apr 15 10:27:13 2008 From: jsquyres at [hidden] (Jeff Squyres) Date: Tue, 15 Apr 2008 11:27:13 -0400 Subject: [Mpi-22] Previous messages In-Reply-To: Message-ID: <59EB8C5D-74EE-47DA-B17E-6DDE5E9E2DF9@cisco.com> Is this view helpful ("thread" view for mpi-21 and mpi-22 lists): http://lists.mpi-forum.org/mpi-21/2008/04/index.php http://lists.mpi-forum.org/mpi-22/2008/04/index.php It's the "thread" view. It does separate by month, though. On Apr 15, 2008, at 11:17 AM, Richard Treumann wrote: > Hi Jeff > > Is it possible to present the archive in a way that is not hidden > under opaque day by day sub directories? It would be better if a > thread could be followed across its life more easily or found even > if I do not remember which day it began. (It seems like I never > remember exactly what date anything happened.) > > Dick > > Dick Treumann - MPI Team/TCEM > IBM Systems & Technology Group > Dept 0lva / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601 > Tele (845) 433-7846 Fax (845) 433-8363 > > > mpi-22-bounces_at_[hidden] wrote on 04/15/2008 10:49:23 AM: > > > Bill -- > > > > Did you get the mbox that was sent (off list)? The web archives for > > mpi-21 and mpi-22 are now up on a trial basis (see http://lists.mpi-forum.org/ > > ) > > . Once we get these right, we'll put the rest of the lists up as > well. > > > > > > > > On Apr 11, 2008, at 10:59 AM, Jeff Squyres wrote: > > > I'm sorry, the reason that the web-ified archives are empty is my > > > fault. All the messages *are* being collected (in an mbox > file), but > > > the web archives are not being populated yet. IU's sysadmins are > > > waiting for some information from me; I have not had time yet to > > > provide it to them. > > > > > > I can have them send you the mbox. > > > > > > > > > On Apr 11, 2008, at 7:38 AM, William Gropp wrote: > > >> For some reason, I wasn't a member of the mpi-22 list, and the > > >> archives are empty. If you sent a message to the mpi-22 list, > > >> please send me a copy. Thanks! 
> > >> > > >> Bill > > >> > > >> William Gropp > > >> Paul and Cynthia Saylor Professor of Computer Science > > >> University of Illinois Urbana-Champaign > > >> > > >> > > >> > > >> _______________________________________________ > > >> mpi-22 mailing list > > >> mpi-22_at_[hidden] > > >> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-22 > > > > > > > > > -- > > > Jeff Squyres > > > Cisco Systems > > > > > > _______________________________________________ > > > mpi-22 mailing list > > > mpi-22_at_[hidden] > > > http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-22 > > > > > > -- > > Jeff Squyres > > Cisco Systems > > > > _______________________________________________ > > mpi-22 mailing list > > mpi-22_at_[hidden] > > http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-22 > _______________________________________________ > mpi-22 mailing list > mpi-22_at_[hidden] > http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-22 -- Jeff Squyres Cisco Systems From treumann at [hidden] Tue Apr 15 11:40:48 2008 From: treumann at [hidden] (Richard Treumann) Date: Tue, 15 Apr 2008 12:40:48 -0400 Subject: [Mpi-22] Previous messages In-Reply-To: <59EB8C5D-74EE-47DA-B17E-6DDE5E9E2DF9@cisco.com> Message-ID: Thanks Jeff I appreciate you being willing to do this work at all so I am reluctant to gripe. When I looked at MPI 2.2 I did not notice that the opaque directories are each a month, not a day. My mistake. I guess there is a trade off. If a list goes on for thousands of messages it is good to be able to filter by month. It does seem that finding the start of a thread can be hidden by this month by month structure. For example in Feb (http://lists.mpi-forum.org/mpi-22/2008/02/index.php) there is a one item thread with title "Re: [mpi-22] FW: [mpi-21] Proposal: MPI_OFFSET built-in type ". Most of this thread was in the prior month but that is not visible if you come across it in Feb. If you find the thread in the prior month (Jan), the forward links are there and the Feb comment will be found but if you first find it in the 02 folder you do not see that there is prior discussion. My real concern is about being able to find and follow a complete thread across its lifetime even if that lifetime is long or has big time gaps. It looks like following forward is there but finding the beginning or knowing you are not at the beginning can be more difficult. Is there a way to provide back links into prior months when using the thread view? Once we are doing MPI 3 and have threads that date back a year or more it could be more of a problem finding the whole thing. Dick Dick Treumann - MPI Team/TCEM IBM Systems & Technology Group Dept 0lva / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601 Tele (845) 433-7846 Fax (845) 433-8363 mpi-22-bounces_at_[hidden] wrote on 04/15/2008 11:27:13 AM: > Is this view helpful ("thread" view for mpi-21 and mpi-22 lists): > > http://lists.mpi-forum.org/mpi-21/2008/04/index.php > http://lists.mpi-forum.org/mpi-22/2008/04/index.php > > It's the "thread" view. It does separate by month, though. > > * -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsquyres at [hidden] Tue Apr 15 12:27:37 2008 From: jsquyres at [hidden] (Jeff Squyres) Date: Tue, 15 Apr 2008 13:27:37 -0400 Subject: [Mpi-22] Previous messages In-Reply-To: Message-ID: <449A628A-C694-4F22-9877-18F418AC8574@cisco.com> On Apr 15, 2008, at 12:40 PM, Richard Treumann wrote: > I appreciate you being willing to do this work at all so I am > reluctant to gripe. 
> No worries; it took me forever to get these up on the web (much longer than I anticipated). It's best when this is a useful mechanism for all, so comments are appreciated. > When I looked at MPI 2.2 I did not notice that the opaque > directories are each a month, not a day. My mistake. I guess there > is a trade off. If a list goes on for thousands of messages it is > good to be able to filter by month. > > It does seem that finding the start of a thread can be hidden by > this month by month structure. For example in Feb (http://lists.mpi-forum.org/mpi-22/2008/02/index.php > ) there is a one item thread with title "Re: [mpi-22] FW: [mpi-21] > Proposal: MPI_OFFSET built-in type ". Most of this thread was in the > prior month but that is not visible if you come across it in Feb. If > you find the thread in the prior month (Jan), the forward links are > there and the Feb comment will be found but if you first find it in > the 02 folder you do not see that there is prior discussion. > Yes, this is a definite trade-off. The hope is that the advanced search capabilities will make up for this tradeoff. E.g., you can search for "mpi_offset" in the subject, body, etc. The advanced search is on the front page of every list. The simple search box in the top right will do a simple search and then take you to the advanced search box. > My real concern is about being able to find and follow a complete > thread across its lifetime even if that lifetime is long or has big > time gaps. It looks like following forward is there but finding the > beginning or knowing you are not at the beginning can be more > difficult. Is there a way to provide back links into prior months > when using the thread view? Once we are doing MPI 3 and have threads > that date back a year or more it could be more of a problem finding > the whole thing. > FWIW, the archiver (software known as "hypermail") does do the back links from individual messages. For example, this message on the Open MPI devel list: http://www.open-mpi.org/community/lists/devel/2008/04/3598.php Was on April 1. The "In reply to" link to the previous message was on March 31: http://www.open-mpi.org/community/lists/devel/2008/03/3590.php It's not quite as seamless as a uniform thread view that spans months, but it does let you go backwards in time. Do you have a better suggestion for a UI? I'm assuming that the hypermail designers felt that this was a good tradeoff between a potentially-infinite thread / month view and being able to go backwards in time. -- Jeff Squyres Cisco Systems From wgropp at [hidden] Wed Apr 16 10:12:52 2008 From: wgropp at [hidden] (William Gropp) Date: Wed, 16 Apr 2008 10:12:52 -0500 Subject: [Mpi-22] Previous messages In-Reply-To: <366D7A27-4E0E-4C93-886C-FCAC7F933942@cisco.com> Message-ID: <378D7F22-2E2F-4B66-88DC-827FCF8FFD91@uiuc.edu> I did get it, but I hadn't had a chance to unpack it into a usable form. The web archives are a big help, thanks! Bill On Apr 15, 2008, at 9:49 AM, Jeff Squyres wrote: > Bill -- > > Did you get the mbox that was sent (off list)? The web archives for > mpi-21 and mpi-22 are now up on a trial basis (see http://lists.mpi- > forum.org/) > . Once we get these right, we'll put the rest of the lists up as > well. > William Gropp Paul and Cynthia Saylor Professor of Computer Science University of Illinois Urbana-Champaign * -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From treumann at [hidden] Tue Apr 22 12:38:03 2008 From: treumann at [hidden] (Richard Treumann) Date: Tue, 22 Apr 2008 13:38:03 -0400 Subject: [Mpi-22] Another pre-preposal for MPI 2.2 or 3.0 Message-ID: I have a proposal for providing information to the MPI implementation at MPI_INIT time to allow certain optimizations within the run. This is not a "hints" mechanism because it does change the semantic rules for MPI in the job run. A correct "vanilla" MPI application could give different results or fail if faulty information is provided. I am interested in what the Forum members think about this idea before I try to formalize it. I will state up front that I am a skeptic about most of the MPI Subset goals I hear described. However, I think this is a form of subsetting I would support. I say "I think" because it is possible we will find serious complexities that would make me back away.. If this looks as straightforward as I expect, perhaps we could look at it for MPI 2.2. The most basic valid implementation of this is a small amount of work for an implementer. (Well within the scope of MPI 2.2 effort / policy) ========================================================================================== The MPI standard has a number of thorny semantic requirements that a typical program does not depend on and that an MPI implementation may pay a performance penalty by guaranteeing. A standards defined mechanism which allows the application to explicitly let libmpi off the hook at MPI_Init time on the ones it does not depend on may allow better performance in some cases. This would be an "assert" rather than a "hints" mechanism because it would be valid for an MPI implementation to fail a job that depends on an MPI feature but lets libmpi off the hook on it at the MPI_Init call In most, but not all, of these cases the MPI implementation could easily give an error message if the application did something it had promised not to do. Here is a partial list of sometimes troublesome semantic requirements. 1) MPI_CANCEL on MPI_ISEND probably cannot be correctly supported without adding a message ID to every message sent. Using space in the message header adds cost.and may be a complete waste for an application that never tries to cancel an ISEND. (If there is a cost for being prepared to cancel an MPI_RECV we could cover that too) 2) MPI_Datatypes that define a contiguous buffer can be optimized if it is known that there will never be a need to translate the data between heterogeneous nodes. An array of structures, where each structure is a MPI_INT followed by an MPI_FLOAT is likely to be contiguous. An MPI_SEND of count==100 can bypass the datatype engine and be treated as a send of 800 bytes if the destination has the same data representations. An MPI implementation that "knows" it will not need to deal with data conversion can simplify the datatype commit and internal representation by discarding the MPI_INT/MPI_FLOAT data and just recording that the type is 8 bytes with a stride of 8. 3) The MPI standard either requires or strongly urges that an MPI_REDUCE/MPI_ALLREDUCE give exactly the same answer every time. It is not clear to me what that means. If it means a portable MPI like MPICH or OpenMPI must give the same answer whether run on an Intel cluster,an IBM Power cluster or a BlueGene then I would bet no MPI in the world complies. If it means Version 5 of an MPI must give the same answer Version 1 did, it would prevent new algorithms. 
However, if it means that two "equivalent" reductions in a single application run must agree then perhaps most MPIs comply. Whatever it means, there are applications that do not need any "same answer" promise as long at they can assume they will get a "correct" answer. Maybe they can be provided a faster reduction algorithm. 4) MPI supports persistent send/recv which could allow some optimizations in which half rendezvous, pinned memory for RDMA, knowledge that both sides are contiguous buffers etc can be leveraged. The ability to do this is damaged by the fact that the standard requires a persistent send to match a normal receive and a normal send to match a persistent receive. The MPI implementation cannot make any assumptions that a matching send_init and recv_init can be bound together. 5) Perhaps MPI pt2pt communication could use a half rendezvous protocol if it were certain no receive would use MPI_ANY_SOURCE. If all receives will use an explicit source then libmpi can have the receive side send a notice to the send side that a receive is waiting. There is no need for the send side to ship the envelop and wait for a reply that the match is found. If MPI_ANY_SOURCE is possible then the send side must always start the transaction. (I am not aware of an issue with MPI_ANY_TAG but maybe somebody can think of one) 6) It may be that an MPI implementation that is ready to do a spawn or join must use a more complex matching/progress engine than it would need if it knew the set of connections & networks it had at MPI_Init could never be expanded. 7) The MPI standard allows a standard send to use an eager protocol but requires that libmpi promise every eager message can be buffered safely. The MPI implementation must fall back to rendezvous protocol when the promise can no longer be kept. This semantic can be expensive to maintain and produces serious scaling problems. Some applications depend on this semantic but many, especially those designed for massive scale, work in ways that ensure libmpi does not need to throttle eager sends. The applications pace themselves. 8) requirement that multi WAIT/TEST functions accept mixed arrays of MPI_Requests ( the multi WAIT/TEST routines may need special handling in case someone passes both Isend/Irecv requests and MPI_File_ixxx requests to the same MPI_Waitany for example) I bet applications seldom do this but is allowed and must work. 9) Would an application promise not to use MPI-IO allow any MPI to do an optimization? 10) Would an application promise not to use MPI-1sided allow any MPI to do an optimization? 11) What others have I not thought of at all? I want to make it clear that none of these MPI_Init time assertions should require an MPI implementation that provides the assert ready MPI_Init to work differently. For example, the user assertion that her application does not depend on a persistent send matching a normal receive or normal send matching a persistent receive does not require the MPI implementation to suppress such matches. It remains the users responsibility to create a program that will still work as expected on an MPI implementation that does not change its behavior for any specific assertion. For some of these it would not be possible for libmpi to detect that the user really is depending on something he told us we could shut off. 
The interface might look like this:

  int MPI_Init_thread_xxx(int *argc, char *((*argv)[]), int required, int *provided, int assertions)

mpi.h would define constants like this:

  #define MPI_NO_SEND_CANCELS      0x00000001
  #define MPI_NO_ANY_SOURCE        0x00000002
  #define MPI_NO_REDUCE_CONSTRAINT 0x00000004
  #define MPI_NO_DATATYPE_XLATE    0x00000010
  #define MPI_NO_EAGER_THROTLE     0x00000020
  etc

The set of valid assertion flags would be specified by the standard, as would be their precise meanings. It would always be valid for an application to pass 0 (zero) as the assertions argument. It would always be valid for an MPI implementation to ignore any or all assertions. With a 32 bit integer for assertions, we could define the interface in MPI 2.2 and add more assertions in MPI 3.0 if we wanted to. We could consider a 64 bit assert to keep the door open, but I am pretty sure we can get by with 32 distinct assertions.

An application call would look like:

  MPI_Init_thread_xxx( 0, 0, MPI_THREAD_MULTIPLE, &provided, MPI_NO_SEND_CANCELS | MPI_NO_ANY_SOURCE | MPI_NO_DATATYPE_XLATE);

I am sorry I will not be at the next meeting to discuss in person but you can talk to Robert Blackmore.

Dick Treumann

Dick Treumann - MPI Team/TCEM
IBM Systems & Technology Group
Dept 0lva / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363

From jsquyres at [hidden] Wed Apr 23 20:18:29 2008
From: jsquyres at [hidden] (Jeff Squyres)
Date: Wed, 23 Apr 2008 21:18:29 -0400
Subject: [Mpi-22] Another pre-preposal for MPI 2.2 or 3.0
In-Reply-To: 
Message-ID: <724DB047-07E3-4A25-9116-04671BF0A723@cisco.com>

I think that this is a generally good idea. As I understand it, you are stating that this is basically a bit stronger than "hints" -- the word "assertions" carries a bit more of a connotation that these are strict promises by the user.

On Apr 22, 2008, at 1:38 PM, Richard Treumann wrote:
> [...]

--
Jeff Squyres
Cisco Systems
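To make the mechanics of the proposal concrete, the sketch below shows, purely as an illustration, how an implementation might record the proposed assertion word and police one of the promises. The MPI_Init_thread_xxx entry point and the MPI_NO_ANY_SOURCE flag are the hypothetical names from the proposal above, and mpi_asserts and hypo_recv are invented for this sketch; none of this exists in any real MPI.

    #include <stdio.h>
    #include <mpi.h>

    #define MPI_NO_ANY_SOURCE 0x00000002      /* hypothetical flag from the proposal */

    static int mpi_asserts;                   /* would be stashed by the hypothetical MPI_Init_thread_xxx */

    /* What "libmpi could easily give an error message" might look like inside a
       receive path, plus the optimization the promise enables. */
    int hypo_recv(void *buf, int count, MPI_Datatype type, int src, int tag,
                  MPI_Comm comm, MPI_Status *status)
    {
        if ((mpi_asserts & MPI_NO_ANY_SOURCE) && src == MPI_ANY_SOURCE) {
            fprintf(stderr, "wildcard receive posted after asserting MPI_NO_ANY_SOURCE\n");
            return MPI_ERR_ARG;               /* the promise was broken; failing the job is allowed */
        }
        /* With the promise kept, the implementation is free to use a cheaper
           receiver-initiated (half-rendezvous) protocol; the ordinary path is shown. */
        return MPI_Recv(buf, count, type, src, tag, comm, status);
    }

Nothing here changes matching semantics; it only illustrates that a broken promise can be detected cheaply for this particular assertion, whereas, as the proposal itself notes, some of the others could not be detected at all.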
This would be an "assert" > rather than a "hints" mechanism because it would be valid for an MPI > implementation to fail a job that depends on an MPI feature but lets > libmpi off the hook on it at the MPI_Init call In most, but not all, > of these cases the MPI implementation could easily give an error > message if the application did something it had promised not to do. > > Here is a partial list of sometimes troublesome semantic requirements. > > 1) MPI_CANCEL on MPI_ISEND probably cannot be correctly supported > without adding a message ID to every message sent. Using space in > the message header adds cost.and may be a complete waste for an > application that never tries to cancel an ISEND. (If there is a cost > for being prepared to cancel an MPI_RECV we could cover that too) > > 2) MPI_Datatypes that define a contiguous buffer can be optimized if > it is known that there will never be a need to translate the data > between heterogeneous nodes. An array of structures, where each > structure is a MPI_INT followed by an MPI_FLOAT is likely to be > contiguous. An MPI_SEND of count==100 can bypass the datatype engine > and be treated as a send of 800 bytes if the destination has the > same data representations. An MPI implementation that "knows" it > will not need to deal with data conversion can simplify the datatype > commit and internal representation by discarding the MPI_INT/ > MPI_FLOAT data and just recording that the type is 8 bytes with a > stride of 8. > > 3) The MPI standard either requires or strongly urges that an > MPI_REDUCE/MPI_ALLREDUCE give exactly the same answer every time. It > is not clear to me what that means. If it means a portable MPI like > MPICH or OpenMPI must give the same answer whether run on an Intel > cluster,an IBM Power cluster or a BlueGene then I would bet no MPI > in the world complies. If it means Version 5 of an MPI must give the > same answer Version 1 did, it would prevent new algorithms. However, > if it means that two "equivalent" reductions in a single application > run must agree then perhaps most MPIs comply. Whatever it means, > there are applications that do not need any "same answer" promise as > long at they can assume they will get a "correct" answer. Maybe they > can be provided a faster reduction algorithm. > > 4) MPI supports persistent send/recv which could allow some > optimizations in which half rendezvous, pinned memory for RDMA, > knowledge that both sides are contiguous buffers etc can be > leveraged. The ability to do this is damaged by the fact that the > standard requires a persistent send to match a normal receive and a > normal send to match a persistent receive. The MPI implementation > cannot make any assumptions that a matching send_init and recv_init > can be bound together. > > 5) Perhaps MPI pt2pt communication could use a half rendezvous > protocol if it were certain no receive would use MPI_ANY_SOURCE. If > all receives will use an explicit source then libmpi can have the > receive side send a notice to the send side that a receive is > waiting. There is no need for the send side to ship the envelop and > wait for a reply that the match is found. If MPI_ANY_SOURCE is > possible then the send side must always start the transaction. 
From alexander.supalov at [hidden] Thu Apr 24 03:13:19 2008
From: alexander.supalov at [hidden] (Supalov, Alexander)
Date: Thu, 24 Apr 2008 09:13:19 +0100
Subject: [Mpi-22] [Mpi-forum] Another pre-preposal for MPI 2.2 or 3.0
In-Reply-To: <724DB047-07E3-4A25-9116-04671BF0A723@cisco.com>
Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201466AA3@swsmsx413.ger.corp.intel.com>

Hi,

What happens if we run beyond 32 or 64 attributes? I think we may rather need something more scalable than an int, and possibly more hierarchical than a linear list of attributes. That would map into subsets nicely, by the way.

Another thing is that in some cases, the attitude of the MPI for each attribute may be "yes", "no", and "don't care/undefined". I can imagine, for example, that there's no eager protocol at all, and so no throttle, albeit in a way different from when there are eager and rendezvous protocols, but they are well tuned to provide a smooth curve. What will happen in either case: will MPI proceed or terminate? By having attributes with values "yes", "no", "tell me", we may be able to accommodate this more easily than with the bitwise "yes" and "no".

Finally, will we treat thread support level as yet another attribute? Will we define any query function for these attributes? Will they be job-wide or communicator-wide?

Best regards.

Alexander

-----Original Message-----
From: mpi-forum-bounces_at_[hidden] [mailto:mpi-forum-bounces_at_[hidden]] On Behalf Of Jeff Squyres
Sent: Thursday, April 24, 2008 3:18 AM
To: MPI 2.2
Cc: mpi-forum_at_[hidden]
Subject: Re: [Mpi-forum] [Mpi-22] Another pre-preposal for MPI 2.2 or 3.0

[...]
If > all receives will use an explicit source then libmpi can have the > receive side send a notice to the send side that a receive is > waiting. There is no need for the send side to ship the envelope and > wait for a reply that the match is found. If MPI_ANY_SOURCE is > possible then the send side must always start the transaction. (I am > not aware of an issue with MPI_ANY_TAG but maybe somebody can think > of one) > > 6) It may be that an MPI implementation that is ready to do a spawn > or join must use a more complex matching/progress engine than it > would need if it knew the set of connections & networks it had at > MPI_Init could never be expanded. > > 7) The MPI standard allows a standard send to use an eager protocol > but requires that libmpi promise every eager message can be buffered > safely. The MPI implementation must fall back to rendezvous protocol > when the promise can no longer be kept. This semantic can be > expensive to maintain and produces serious scaling problems. Some > applications depend on this semantic but many, especially those > designed for massive scale, work in ways that ensure libmpi does not > need to throttle eager sends. The applications pace themselves. > > 8) The requirement that multi WAIT/TEST functions accept mixed arrays of > MPI_Requests (the multi WAIT/TEST routines may need special > handling in case someone passes both Isend/Irecv requests and > MPI_File_ixxx requests to the same MPI_Waitany for example). I bet > applications seldom do this but it is allowed and must work. > > 9) Would an application's promise not to use MPI-IO allow any MPI to > do an optimization? > > 10) Would an application's promise not to use MPI-1sided allow any MPI > to do an optimization? > > 11) What others have I not thought of at all? > > I want to make it clear that none of these MPI_Init time assertions > should require an MPI implementation that provides the assert-ready > MPI_Init to work differently. For example, the user assertion that > her application does not depend on a persistent send matching a > normal receive or normal send matching a persistent receive does not > require the MPI implementation to suppress such matches. It remains > the user's responsibility to create a program that will still work as > expected on an MPI implementation that does not change its behavior > for any specific assertion. > > For some of these it would not be possible for libmpi to detect that > the user really is depending on something he told us we could shut > off. > > The interface might look like this: > int MPI_Init_thread_xxx(int *argc, char *((*argv)[]), int required, > int *provided, int assertions) > > mpi.h would define constants like this: > > #define MPI_NO_SEND_CANCELS 0x00000001 > #define MPI_NO_ANY_SOURCE 0x00000002 > #define MPI_NO_REDUCE_CONSTRAINT 0x00000004 > #define MPI_NO_DATATYPE_XLATE 0x00000010 > #define MPI_NO_EAGER_THROTLE 0x00000020 > etc > > The set of valid assertion flags would be specified by the standard > as would be their precise meanings. It would always be valid for an > application to pass 0 (zero) as the assertions argument. It would > always be valid for an MPI implementation to ignore any or all > assertions. With a 32 bit integer for assertions, we could define > the interface in MPI 2.2 and add more assertions in MPI 3.0 if we > wanted to. We could consider a 64 bit assert to keep the door open > but I am pretty sure we can get by with 32 distinct assertions.
> > > A application call would look like: MPI_Init_thread_xxx( 0, 0, > MPI_THREAD_MULTIPLE, &provided, > MPI_NO_SEND_CANCELS | MPI_NO_ANY_SOURCE | MPI_NO_DATATYPE_XLATE); > > I am sorry I will not be at the next meeting to discuss in person > but you can talk to Robert Blackmore. > > > > > Dick Treumann > Dick Treumann - MPI Team/TCEM > IBM Systems & Technology Group > Dept 0lva / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601 > Tele (845) 433-7846 Fax (845) 433-8363 > _______________________________________________ > mpi-22 mailing list > mpi-22_at_[hidden] > http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-22 -- Jeff Squyres Cisco Systems _______________________________________________ mpi-forum mailing list mpi-forum_at_[hidden] http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-forum --------------------------------------------------------------------- Intel GmbH Dornacher Strasse 1 85622 Feldkirchen/Muenchen Germany Sitz der Gesellschaft: Feldkirchen bei Muenchen Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer Registergericht: Muenchen HRB 47456 Ust.-IdNr. VAT Registration No.: DE129385895 Citibank Frankfurt (BLZ 502 109 00) 600119052 This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. From koziol at [hidden] Thu Apr 24 06:13:47 2008 From: koziol at [hidden] (Quincey Koziol) Date: Thu, 24 Apr 2008 06:13:47 -0500 Subject: [Mpi-22] mpi-22 Digest, Vol 2, Issue 7 In-Reply-To: Message-ID: Hi Dick, On Apr 23, 2008, at 11:00 AM, mpi-22-request_at_[hidden] wrote: > ---------------------------------------------------------------------- > > Message: 1 > Date: Tue, 22 Apr 2008 13:38:03 -0400 > From: Richard Treumann > Subject: [Mpi-22] Another pre-preposal for MPI 2.2 or 3.0 > To: "MPI 2.2" , > > Message-ID: > ON85257433.005C5199-85257433.0060DE25_at_[hidden]> > Content-Type: text/plain; charset="us-ascii" > > > > I have a proposal for providing information to the MPI > implementation at > MPI_INIT time to allow certain optimizations within the run. This > is not a > "hints" mechanism because it does change the semantic rules for MPI > in the > job run. A correct "vanilla" MPI application could give different > results > or fail if faulty information is provided. > > I am interested in what the Forum members think about this idea > before I > try to formalize it. > > I will state up front that I am a skeptic about most of the MPI Subset > goals I hear described. However, I think this is a form of > subsetting I > would support. I say "I think" because it is possible we will find > serious > complexities that would make me back away.. If this looks as > straightforward as I expect, perhaps we could look at it for MPI > 2.2. The > most basic valid implementation of this is a small amount of work > for an > implementer. (Well within the scope of MPI 2.2 effort / policy) I'm with you on being skeptical about the subsetting efforts and I'm also concerned about the proposal you have outlined. The reason I'm concerned about both ideas is that they both don't seem to take adequate account of how to handle third-party software libraries that use MPI calls. Even if the third-party library is open source, I don't think most users of those libraries are going to trace through the source code to make certain of what MPI features the library uses. 
(Plus those features can easily change over time). I suppose it's possible to ask third-party library providers to publish their "conformance" about which semantics can be relaxed, but I think that's going to be quite a burden for them. Quincey Koziol The HDF Group From jsquyres at [hidden] Thu Apr 24 06:50:08 2008 From: jsquyres at [hidden] (Jeff Squyres) Date: Thu, 24 Apr 2008 07:50:08 -0400 Subject: [Mpi-22] [Mpi-forum] Another pre-preposal for MPI 2.2 or 3.0 In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201466AA3@swsmsx413.ger.corp.intel.com> Message-ID: Good points. I'm also a little uncomfortable with just 32 attributes -- 32 seems like a big number right now, but we wouldn't want to be accused of only thinking of a world where you only need 640k of RAM. ;-) I would also like to keep the door open to implementation- specific attributes. The obvious arbitrary-storage candidate is MPI_Info, but to be able to set this stuff during MPI_INIT means that the Info functions have to be available before MPI_INIT (I think this came up before). Also, it might be worthwhile to have the MPI return the set of assertions that it was / was not able to support in some kind of definitive way, so that you can know that MPI X *supports* assertion Y, whereas MPI A *doesn't care* about assertion B, etc. -- similar to how the thread level is returned now. On Apr 24, 2008, at 4:13 AM, Supalov, Alexander wrote: > Hi, > > What happens if we run beyond 32 or 64 attributes? I think we may > rather > need something more scalable than an int, and possibly more > hierarchical > than a linear list of attributes. That would map into subsets > nicely, by > the way. > > Another thing is that in some cases, the attitude of the MPI for each > attribute may be "yes", "no", and "don't care/undefined". I can > imagine, > for example, that there's no eager protocol at all, and so no > throttle, > albeit in a way different from when there are eager and rendezvous > protocols, but they are well tuned to provide a smooth curve. What > will > happen in either case: will MPI proceed or terminate? By having > attributes with values "yes", "no", "tell me" we may be able to > accommodate this easier than with the bitwise "yes" and "no". > > Finally, we'll we treat thread support level as yet another attribute? > Will we define any query function for these attributes? Will they be > job-wide or communicator-wide? > > Best regards. > > Alexander > > -----Original Message----- > From: mpi-forum-bounces_at_[hidden] > [mailto:mpi-forum-bounces_at_[hidden]] On Behalf Of Jeff > Squyres > Sent: Thursday, April 24, 2008 3:18 AM > To: MPI 2.2 > Cc: mpi-forum_at_[hidden] > Subject: Re: [Mpi-forum] [Mpi-22] Another pre-preposal for MPI 2.2 or > 3.0 > > I think that this is a generally good idea. > > As I understand it, you are stating that this is basically a bit > stronger than "hints" -- the word "assertions" carries a bit more of a > connotation that these are strict promises by the user. > > > On Apr 22, 2008, at 1:38 PM, Richard Treumann wrote: > >> I have a proposal for providing information to the MPI >> implementation at MPI_INIT time to allow certain optimizations >> within the run. This is not a "hints" mechanism because it does >> change the semantic rules for MPI in the job run. A correct >> "vanilla" MPI application could give different results or fail if >> faulty information is provided. >> >> I am interested in what the Forum members think about this idea >> before I try to formalize it. 
>> >> I will state up front that I am a skeptic about most of the MPI >> Subset goals I hear described. However, I think this is a form of >> subsetting I would support. I say "I think" because it is possible >> we will find serious complexities that would make me back away.. If >> this looks as straightforward as I expect, perhaps we could look at >> it for MPI 2.2. The most basic valid implementation of this is a >> small amount of work for an implementer. (Well within the scope of >> MPI 2.2 effort / policy) >> >> = >> = >> = >> = >> = >> = >> = >> = >> = >> = >> = >> = >> = >> = >> = >> = >> = >> = >> = >> = >> = >> ===================================================================== >> >> The MPI standard has a number of thorny semantic requirements that a >> typical program does not depend on and that an MPI implementation >> may pay a performance penalty by guaranteeing. A standards defined >> mechanism which allows the application to explicitly let libmpi off >> the hook at MPI_Init time on the ones it does not depend on may >> allow better performance in some cases. This would be an "assert" >> rather than a "hints" mechanism because it would be valid for an MPI >> implementation to fail a job that depends on an MPI feature but lets >> libmpi off the hook on it at the MPI_Init call In most, but not all, >> of these cases the MPI implementation could easily give an error >> message if the application did something it had promised not to do. >> >> Here is a partial list of sometimes troublesome semantic >> requirements. >> >> 1) MPI_CANCEL on MPI_ISEND probably cannot be correctly supported >> without adding a message ID to every message sent. Using space in >> the message header adds cost.and may be a complete waste for an >> application that never tries to cancel an ISEND. (If there is a cost >> for being prepared to cancel an MPI_RECV we could cover that too) >> >> 2) MPI_Datatypes that define a contiguous buffer can be optimized if >> it is known that there will never be a need to translate the data >> between heterogeneous nodes. An array of structures, where each >> structure is a MPI_INT followed by an MPI_FLOAT is likely to be >> contiguous. An MPI_SEND of count==100 can bypass the datatype engine >> and be treated as a send of 800 bytes if the destination has the >> same data representations. An MPI implementation that "knows" it >> will not need to deal with data conversion can simplify the datatype >> commit and internal representation by discarding the MPI_INT/ >> MPI_FLOAT data and just recording that the type is 8 bytes with a >> stride of 8. >> >> 3) The MPI standard either requires or strongly urges that an >> MPI_REDUCE/MPI_ALLREDUCE give exactly the same answer every time. It >> is not clear to me what that means. If it means a portable MPI like >> MPICH or OpenMPI must give the same answer whether run on an Intel >> cluster,an IBM Power cluster or a BlueGene then I would bet no MPI >> in the world complies. If it means Version 5 of an MPI must give the >> same answer Version 1 did, it would prevent new algorithms. However, >> if it means that two "equivalent" reductions in a single application >> run must agree then perhaps most MPIs comply. Whatever it means, >> there are applications that do not need any "same answer" promise as >> long at they can assume they will get a "correct" answer. Maybe they >> can be provided a faster reduction algorithm. 
>> >> 4) MPI supports persistent send/recv which could allow some >> optimizations in which half rendezvous, pinned memory for RDMA, >> knowledge that both sides are contiguous buffers etc can be >> leveraged. The ability to do this is damaged by the fact that the >> standard requires a persistent send to match a normal receive and a >> normal send to match a persistent receive. The MPI implementation >> cannot make any assumptions that a matching send_init and recv_init >> can be bound together. >> >> 5) Perhaps MPI pt2pt communication could use a half rendezvous >> protocol if it were certain no receive would use MPI_ANY_SOURCE. If >> all receives will use an explicit source then libmpi can have the >> receive side send a notice to the send side that a receive is >> waiting. There is no need for the send side to ship the envelop and >> wait for a reply that the match is found. If MPI_ANY_SOURCE is >> possible then the send side must always start the transaction. (I am >> not aware of an issue with MPI_ANY_TAG but maybe somebody can think >> of one) >> >> 6) It may be that an MPI implementation that is ready to do a spawn >> or join must use a more complex matching/progress engine than it >> would need if it knew the set of connections & networks it had at >> MPI_Init could never be expanded. >> >> 7) The MPI standard allows a standard send to use an eager protocol >> but requires that libmpi promise every eager message can be buffered >> safely. The MPI implementation must fall back to rendezvous protocol >> when the promise can no longer be kept. This semantic can be >> expensive to maintain and produces serious scaling problems. Some >> applications depend on this semantic but many, especially those >> designed for massive scale, work in ways that ensure libmpi does not >> need to throttle eager sends. The applications pace themselves. >> >> 8) requirement that multi WAIT/TEST functions accept mixed arrays of >> MPI_Requests ( the multi WAIT/TEST routines may need special >> handling in case someone passes both Isend/Irecv requests and >> MPI_File_ixxx requests to the same MPI_Waitany for example) I bet >> applications seldom do this but is allowed and must work. >> >> 9) Would an application promise not to use MPI-IO allow any MPI to >> do an optimization? >> >> 10) Would an application promise not to use MPI-1sided allow any MPI >> to do an optimization? >> >> 11) What others have I not thought of at all? >> >> I want to make it clear that none of these MPI_Init time assertions >> should require an MPI implementation that provides the assert ready >> MPI_Init to work differently. For example, the user assertion that >> her application does not depend on a persistent send matching a >> normal receive or normal send matching a persistent receive does not >> require the MPI implementation to suppress such matches. It remains >> the users responsibility to create a program that will still work as >> expected on an MPI implementation that does not change its behavior >> for any specific assertion. >> >> For some of these it would not be possible for libmpi to detect that >> the user really is depending on something he told us we could shut >> off. 
>> >> The interface might look like this: >> int MPI_Init_thread_xxx(int *argc, char *((*argv)[]), int required, >> int *provided, int assertions) >> >> mpi.h would define constants like this: >> >> #define MPI_NO_SEND_CANCELS 0x00000001 >> #define MPI_NO_ANY_SOURCE 0x00000002 >> #define MPI_NO_REDUCE_CONSTRAINT 0x00000004 >> #define MPI_NO_DATATYPE_XLATE 0x00000010 >> #define MPI_NO_EAGER_THROTLE 0x00000020 >> etc >> >> The set of valid assertion flags would be specified by the standard >> as would be their precise meanings. It would always be valid for an >> application to pass 0 (zero) as the assertions argument. It would >> always be valid for an MPI implementation to ignore any or all >> assertions. With a 32 bit integer for assertions, we could define >> the interface in MPI 2.2 and add more assertions in MPI 3.0 if we >> wanted to. We could consider a 64 bit assert to keep the door open >> but I am pretty sure we can get by with 32 distinct assertions. >> >> >> An application call would look like: MPI_Init_thread_xxx( 0, 0, >> MPI_THREAD_MULTIPLE, &provided, >> MPI_NO_SEND_CANCELS | MPI_NO_ANY_SOURCE | MPI_NO_DATATYPE_XLATE); >> >> I am sorry I will not be at the next meeting to discuss in person >> but you can talk to Robert Blackmore. >> >> >> >> >> Dick Treumann >> Dick Treumann - MPI Team/TCEM >> IBM Systems & Technology Group >> Dept 0lva / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601 >> Tele (845) 433-7846 Fax (845) 433-8363 >> _______________________________________________ >> mpi-22 mailing list >> mpi-22_at_[hidden] >> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-22 > > > -- > Jeff Squyres > Cisco Systems > > _______________________________________________ > mpi-forum mailing list > mpi-forum_at_[hidden] > http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-forum > --------------------------------------------------------------------- > Intel GmbH > Dornacher Strasse 1 > 85622 Feldkirchen/Muenchen Germany > Sitz der Gesellschaft: Feldkirchen bei Muenchen > Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer > Registergericht: Muenchen HRB 47456 Ust.-IdNr. > VAT Registration No.: DE129385895 > Citibank Frankfurt (BLZ 502 109 00) 600119052 > > This e-mail and any attachments may contain confidential material for > the sole use of the intended recipient(s). Any review or distribution > by others is strictly prohibited. If you are not the intended > recipient, please contact the sender and delete all copies. > > > _______________________________________________ > mpi-22 mailing list > mpi-22_at_[hidden] > http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-22 * -------------- next part -------------- An HTML attachment was scrubbed... URL: From treumann at [hidden] Thu Apr 24 09:25:44 2008 From: treumann at [hidden] (Richard Treumann) Date: Thu, 24 Apr 2008 10:25:44 -0400 Subject: [Mpi-22] [Mpi-forum] Another pre-preposal for MPI 2.2 or 3.0 In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201466AA3@swsmsx413.ger.corp.intel.com> Message-ID: I am aiming for a balance between simplicity (which leads to affordable implementation in libmpi and practical use by applications & libraries) and versatility. If we standardize something well defined and affordable that gives 95% of the value and both MPI implementations and MPI applications/libraries begin to support/apply it we come out way ahead. Assertions even have a good probability of being portable if there are only a dozen defined.
If we provide unbounded permutations and extensibility, most MPI implementations will ignore all but a handful and the application developer will need to invest a lot of effort in setting switches without being able to assume they are ever read by the MPI implementation. Dick Treumann - MPI Team/TCEM IBM Systems & Technology Group Dept 0lva / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601 Tele (845) 433-7846 Fax (845) 433-8363 mpi-22-bounces_at_[hidden] wrote on 04/24/2008 04:13:19 AM: > Hi, > > What happens if we run beyond 32 or 64 attributes? I think we may rather > need something more scalable than an int, and possibly more hierarchical > than a linear list of attributes. That would map into subsets nicely, by > the way. I avoided the word "attribute" and chose the word "assertion" for a reason. I would consider the word "promise" except that it feels a bit anthropomorphic for my taste. An assertion is a statement by the application that it acts in a way which does not depend on a specific guarantee in the vanilla standard. An assertion is not a directive to libmpi to do something different. It is a promise that the application will be OK if libmpi passes up support for the specific semantic requirement. Libmpi is within its rights to terminate a job if libmpi can recognize the application "lied". Libmpi is even within its rights to give unexpected results if the application "lied". For example, if the application really does depend on bitwise reproducible reduction results and asserts it does not, the application may give some surprises. My feeling is that no matter what we do there will never be more than a handful of assertions that gain wide support. My fundamental concern with the subsetting concept is my suspicion that 1) it will end up with 100 or 1000 or 1000000 permutations, 2) supporting all of them would give 100 units of value and be very complex, 3) an MPI implementation that tries to support a large number becomes untestable, 4) a well chosen subset would give 95 units of value, 5) consensus on the worthwhile aspects of subsetting is needed before you get portability and that will take years to evolve (maybe forever), and 6) writing pluggable libraries will become much harder because each library will need to deal with the wide range of "subsets" somebody may plug it into. > > Another thing is that in some cases, the attitude of the MPI for each > attribute may be "yes", "no", and "don't care/undefined". I can imagine, > for example, that there's no eager protocol at all, and so no throttle, > albeit in a way different from when there are eager and rendezvous > protocols, but they are well tuned to provide a smooth curve. What will > happen in either case: will MPI proceed or terminate? By having > attributes with values "yes", "no", "tell me" we may be able to > accommodate this more easily than with the bitwise "yes" and "no". Most applications will either depend on a semantic guarantee or will not. That may not always be easy for the application writer to recognize but there is no "don't care" needed in this proposal. I suppose someone might ask "What if the application wants to provide dual code and let the MPI implementation decide?" That would call for a "don't care" option but it is not at all clear to me that MPI implementations would often have a basis for a run time decision to support a semantic guarantee that an application has said "don't care" for.
If support for MPI_CANCEL hurts performance and the implementation has added logic to support CANCEL when the MPI_NO_SEND_CANCELS assertion is absent and give better performance when the MPI_NO_SEND_CANCELS assertion is provided, why would it ever consider supporting CANCEL in an application where the init time said "don't care"? > > Finally, will we treat thread support level as yet another attribute? I am open to considering this. > Will we define any query function for these attributes? Will they be > job-wide or communicator-wide? Assertions are job-wide. A query mechanism seems like a reasonable addition and if the set of valid assertions is defined by the standard, a query mechanism would be pretty simple. I think the most useful query response would involve the implementation saying whether it is acting on the assertion but I could argue for a query that reports what the app has set. If I write an application and do not code a call to MPI_CANCEL I can assert MPI_NO_SEND_CANCELS but if my app calls an opaque library that uses MPI_CANCEL I may not know it does that. A well written library that depends on a semantic that can be suspended by assertion may want to have a way to check that the assertion was not made or at least not affecting libmpi behavior. The needs of opaque libraries are another argument for keeping the assertion list well defined. The library author must be able to predict which MPI guarantees can be pulled out from under him and that list must be short enough that, as he writes the library code, he can predict the spots where the ice may be thin and guard against them. The author of "Freds_lib" can use a query and has two options if he does not like the answer. He can issue a fatal error and tell the user: "Assertion MPI_NO_SEND_CANCELS is incompatible with using Freds_lib. Please remove this assertion" or he can provide an alternate code path that does not depend on being able to cancel an MPI_Isend.
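To make the "Freds_lib" scenario concrete, a minimal library-side check might look like the sketch below. Only MPI_NO_SEND_CANCELS (and its suggested mpi.h value) comes from the pre-proposal; the query routine MPIX_Query_assertions() is invented here purely for illustration, since no query interface has been defined yet, and its name and signature are hypothetical.

    /* Hypothetical sketch only.  MPIX_Query_assertions() does not exist in
       any MPI; it stands in for whatever query the Forum might standardize.
       MPI_NO_SEND_CANCELS and its value are taken from the pre-proposal. */
    #include <mpi.h>

    #ifndef MPI_NO_SEND_CANCELS
    #define MPI_NO_SEND_CANCELS 0x00000001
    #endif

    /* Assumed to return, in *honored, the bitwise OR of the assertions
       this libmpi is actually acting on for the current job. */
    extern int MPIX_Query_assertions(int *honored);

    static int freds_lib_can_cancel = 1;

    void freds_lib_init(void)
    {
        int honored = 0;
        MPIX_Query_assertions(&honored);

        if (honored & MPI_NO_SEND_CANCELS) {
            /* Option 1: refuse to run, e.g.
               "Assertion MPI_NO_SEND_CANCELS is incompatible with using
                Freds_lib. Please remove this assertion." + MPI_Abort().   */

            /* Option 2: fall back to a code path that never cancels an
               MPI_Isend.                                                   */
            freds_lib_can_cancel = 0;
        }
    }

Either way the decision is made once, at library initialization, which is what keeps a short, standard-defined assertion list manageable for library authors.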
(Well within the scope of > > MPI 2.2 effort / policy) > > > > = > > = > > = > > = > > = > > = > > = > > = > > = > > = > > = > > = > > = > > = > > = > > = > > = > > = > > = > > = > > ====================================================================== > > > > The MPI standard has a number of thorny semantic requirements that a > > typical program does not depend on and that an MPI implementation > > may pay a performance penalty by guaranteeing. A standards defined > > mechanism which allows the application to explicitly let libmpi off > > the hook at MPI_Init time on the ones it does not depend on may > > allow better performance in some cases. This would be an "assert" > > rather than a "hints" mechanism because it would be valid for an MPI > > implementation to fail a job that depends on an MPI feature but lets > > libmpi off the hook on it at the MPI_Init call In most, but not all, > > of these cases the MPI implementation could easily give an error > > message if the application did something it had promised not to do. > > > > Here is a partial list of sometimes troublesome semantic requirements. > > > > 1) MPI_CANCEL on MPI_ISEND probably cannot be correctly supported > > without adding a message ID to every message sent. Using space in > > the message header adds cost.and may be a complete waste for an > > application that never tries to cancel an ISEND. (If there is a cost > > for being prepared to cancel an MPI_RECV we could cover that too) > > > > 2) MPI_Datatypes that define a contiguous buffer can be optimized if > > it is known that there will never be a need to translate the data > > between heterogeneous nodes. An array of structures, where each > > structure is a MPI_INT followed by an MPI_FLOAT is likely to be > > contiguous. An MPI_SEND of count==100 can bypass the datatype engine > > and be treated as a send of 800 bytes if the destination has the > > same data representations. An MPI implementation that "knows" it > > will not need to deal with data conversion can simplify the datatype > > commit and internal representation by discarding the MPI_INT/ > > MPI_FLOAT data and just recording that the type is 8 bytes with a > > stride of 8. > > > > 3) The MPI standard either requires or strongly urges that an > > MPI_REDUCE/MPI_ALLREDUCE give exactly the same answer every time. It > > is not clear to me what that means. If it means a portable MPI like > > MPICH or OpenMPI must give the same answer whether run on an Intel > > cluster,an IBM Power cluster or a BlueGene then I would bet no MPI > > in the world complies. If it means Version 5 of an MPI must give the > > same answer Version 1 did, it would prevent new algorithms. However, > > if it means that two "equivalent" reductions in a single application > > run must agree then perhaps most MPIs comply. Whatever it means, > > there are applications that do not need any "same answer" promise as > > long at they can assume they will get a "correct" answer. Maybe they > > can be provided a faster reduction algorithm. > > > > 4) MPI supports persistent send/recv which could allow some > > optimizations in which half rendezvous, pinned memory for RDMA, > > knowledge that both sides are contiguous buffers etc can be > > leveraged. The ability to do this is damaged by the fact that the > > standard requires a persistent send to match a normal receive and a > > normal send to match a persistent receive. The MPI implementation > > cannot make any assumptions that a matching send_init and recv_init > > can be bound together. 
> > > > 5) Perhaps MPI pt2pt communication could use a half rendezvous > > protocol if it were certain no receive would use MPI_ANY_SOURCE. If > > all receives will use an explicit source then libmpi can have the > > receive side send a notice to the send side that a receive is > > waiting. There is no need for the send side to ship the envelop and > > wait for a reply that the match is found. If MPI_ANY_SOURCE is > > possible then the send side must always start the transaction. (I am > > not aware of an issue with MPI_ANY_TAG but maybe somebody can think > > of one) > > > > 6) It may be that an MPI implementation that is ready to do a spawn > > or join must use a more complex matching/progress engine than it > > would need if it knew the set of connections & networks it had at > > MPI_Init could never be expanded. > > > > 7) The MPI standard allows a standard send to use an eager protocol > > but requires that libmpi promise every eager message can be buffered > > safely. The MPI implementation must fall back to rendezvous protocol > > when the promise can no longer be kept. This semantic can be > > expensive to maintain and produces serious scaling problems. Some > > applications depend on this semantic but many, especially those > > designed for massive scale, work in ways that ensure libmpi does not > > need to throttle eager sends. The applications pace themselves. > > > > 8) requirement that multi WAIT/TEST functions accept mixed arrays of > > MPI_Requests ( the multi WAIT/TEST routines may need special > > handling in case someone passes both Isend/Irecv requests and > > MPI_File_ixxx requests to the same MPI_Waitany for example) I bet > > applications seldom do this but is allowed and must work. > > > > 9) Would an application promise not to use MPI-IO allow any MPI to > > do an optimization? > > > > 10) Would an application promise not to use MPI-1sided allow any MPI > > to do an optimization? > > > > 11) What others have I not thought of at all? > > > > I want to make it clear that none of these MPI_Init time assertions > > should require an MPI implementation that provides the assert ready > > MPI_Init to work differently. For example, the user assertion that > > her application does not depend on a persistent send matching a > > normal receive or normal send matching a persistent receive does not > > require the MPI implementation to suppress such matches. It remains > > the users responsibility to create a program that will still work as > > expected on an MPI implementation that does not change its behavior > > for any specific assertion. > > > > For some of these it would not be possible for libmpi to detect that > > the user really is depending on something he told us we could shut > > off. > > > > The interface might look like this: > > int MPI_Init_thread_xxx(int *argc, char *((*argv)[]), int required, > > int *provided, int assertions) > > > > mpi.h would define constants like this: > > > > #define MPI_NO_SEND_CANCELS 0x00000001 > > #define MPI_NO_ANY_SOURCE 0x00000002 > > #define MPI_NO_REDUCE_CONSTRAINT 0x00000004 > > #define MPI_NO_DATATYPE_XLATE 0x00000010 > > #define MPI_NO_EAGER_THROTLE 0x00000020 > > etc > > > > The set of valid assertion flags would be specified by the standard > > as would be their precise meanings. It would always be valid for an > > application to pass 0 (zero) as the assertions argument. It would > > always be valid for an MPI implementation to ignore any or all > > assertions. 
With a 32 bit integer for assertions, we could define > > the interface in MPI 2.2 and add more assertions in MPI 3.0 if we > > wanted to. We could consider a 64 bit assert to keep the door open > > but I am pretty sure we can get by with 32 distinct assertions. > > > > > > An application call would look like: MPI_Init_thread_xxx( 0, 0, > > MPI_THREAD_MULTIPLE, &provided, > > MPI_NO_SEND_CANCELS | MPI_NO_ANY_SOURCE | MPI_NO_DATATYPE_XLATE); > > > > I am sorry I will not be at the next meeting to discuss in person > > but you can talk to Robert Blackmore. > > > > > > > > > > Dick Treumann > > Dick Treumann - MPI Team/TCEM > > IBM Systems & Technology Group > > Dept 0lva / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601 > > Tele (845) 433-7846 Fax (845) 433-8363 > > _______________________________________________ > > mpi-22 mailing list > > mpi-22_at_[hidden] > > http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-22 > > > -- > Jeff Squyres > Cisco Systems > > _______________________________________________ > mpi-forum mailing list > mpi-forum_at_[hidden] > http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-forum > --------------------------------------------------------------------- > Intel GmbH > Dornacher Strasse 1 > 85622 Feldkirchen/Muenchen Germany > Sitz der Gesellschaft: Feldkirchen bei Muenchen > Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer > Registergericht: Muenchen HRB 47456 Ust.-IdNr. > VAT Registration No.: DE129385895 > Citibank Frankfurt (BLZ 502 109 00) 600119052 > > This e-mail and any attachments may contain confidential material for > the sole use of the intended recipient(s). Any review or distribution > by others is strictly prohibited. If you are not the intended > recipient, please contact the sender and delete all copies. > > > _______________________________________________ > mpi-22 mailing list > mpi-22_at_[hidden] > http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-22 * -------------- next part -------------- An HTML attachment was scrubbed... URL: From treumann at [hidden] Thu Apr 24 09:34:23 2008 From: treumann at [hidden] (Richard Treumann) Date: Thu, 24 Apr 2008 10:34:23 -0400 Subject: [Mpi-22] [Mpi-forum] Another pre-preposal for MPI 2.2 or 3.0 In-Reply-To: Message-ID: Dick Treumann - MPI Team/TCEM IBM Systems & Technology Group Dept 0lva / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601 Tele (845) 433-7846 Fax (845) 433-8363 mpi-forum-bounces_at_[hidden] wrote on 04/24/2008 07:50:08 AM: > Good points. I'm also a little uncomfortable with just 32 attributes There are situations in standardization where deciding that 32 bits are enough leads to very costly retrofit down the road because the implications of the decision become pervasive. IP addresses are a good example. In this case, the cost of being too conservative is that some day the MPI 6.0 Forum (which I hope not to be working on) will need to define a new MPI_Init call and deprecate the one that supports only 32 or 64 assertions. I am hopeful that we will never go beyond a dozen or so assertions. * -------------- next part -------------- An HTML attachment was scrubbed...
URL: From alexander.supalov at [hidden] Thu Apr 24 10:33:42 2008 From: alexander.supalov at [hidden] (Supalov, Alexander) Date: Thu, 24 Apr 2008 16:33:42 +0100 Subject: [Mpi-22] mpi-22 Digest, Vol 2, Issue 7 In-Reply-To: Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201466E9F@swsmsx413.ger.corp.intel.com> Hi, Note that this is an argument for making the assertions optional: those who don't care don't have to use them. Those who care should use them correctly or else. As usual. Best regards. Alexander -----Original Message----- From: mpi-22-bounces_at_[hidden] [mailto:mpi-22-bounces_at_[hidden]] On Behalf Of Quincey Koziol Sent: Thursday, April 24, 2008 1:14 PM To: mpi-22_at_[hidden] Subject: Re: [Mpi-22] mpi-22 Digest, Vol 2, Issue 7 Hi Dick, On Apr 23, 2008, at 11:00 AM, mpi-22-request_at_[hidden] wrote: > ---------------------------------------------------------------------- > > Message: 1 > Date: Tue, 22 Apr 2008 13:38:03 -0400 > From: Richard Treumann > Subject: [Mpi-22] Another pre-preposal for MPI 2.2 or 3.0 > To: "MPI 2.2" , > > Message-ID: > ON85257433.005C5199-85257433.0060DE25_at_[hidden]> > Content-Type: text/plain; charset="us-ascii" > > > > I have a proposal for providing information to the MPI > implementation at > MPI_INIT time to allow certain optimizations within the run. This > is not a > "hints" mechanism because it does change the semantic rules for MPI > in the > job run. A correct "vanilla" MPI application could give different > results > or fail if faulty information is provided. > > I am interested in what the Forum members think about this idea > before I > try to formalize it. > > I will state up front that I am a skeptic about most of the MPI Subset > goals I hear described. However, I think this is a form of > subsetting I > would support. I say "I think" because it is possible we will find > serious > complexities that would make me back away.. If this looks as > straightforward as I expect, perhaps we could look at it for MPI > 2.2. The > most basic valid implementation of this is a small amount of work > for an > implementer. (Well within the scope of MPI 2.2 effort / policy) I'm with you on being skeptical about the subsetting efforts and I'm also concerned about the proposal you have outlined. The reason I'm concerned about both ideas is that they both don't seem to take adequate account of how to handle third-party software libraries that use MPI calls. Even if the third-party library is open source, I don't think most users of those libraries are going to trace through the source code to make certain of what MPI features the library uses. (Plus those features can easily change over time). I suppose it's possible to ask third-party library providers to publish their "conformance" about which semantics can be relaxed, but I think that's going to be quite a burden for them. Quincey Koziol The HDF Group _______________________________________________ mpi-22 mailing list mpi-22_at_[hidden] http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-22 --------------------------------------------------------------------- Intel GmbH Dornacher Strasse 1 85622 Feldkirchen/Muenchen Germany Sitz der Gesellschaft: Feldkirchen bei Muenchen Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer Registergericht: Muenchen HRB 47456 Ust.-IdNr. VAT Registration No.: DE129385895 Citibank Frankfurt (BLZ 502 109 00) 600119052 This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). 
Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. From treumann at [hidden] Thu Apr 24 11:15:39 2008 From: treumann at [hidden] (Richard Treumann) Date: Thu, 24 Apr 2008 12:15:39 -0400 Subject: [Mpi-22] mpi-22 Digest, Vol 2, Issue 7 In-Reply-To: <5ECAB1304A8B5B4CB3F9D6C01E4E21A201466E9F@swsmsx413.ger.corp.intel.com> Message-ID: Dick Treumann - MPI Team/TCEM IBM Systems & Technology Group Dept 0lva / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601 Tele (845) 433-7846 Fax (845) 433-8363 mpi-22-bounces_at_[hidden] wrote on 04/24/2008 11:33:42 AM: > Hi, > > Note that this is an argument for making the assertions optional: those > who don't care don't have to use them. Those who care should use them > correctly or else. As usual. > > Best regards. > > Alexander > Hi Alexander The assertions are optional in this proposal. If this is added to the MPI standard the minimal impacts (day one impacts) are: == To application writers (none) - MPI_INIT and MPI_INIT_THREAD still work. MPI_INIT_THREAD_xxx can be passed 0 (zero) as the assertions bit vector. To MPI Implementors (small) - subroutine MPI_INIT_THREAD_xxx can be a clone of MPI_INIT_THREAD under the covers. If the Forum decides the query function is for asking what assertions are being honored, the implementation can just return "none" to every query. If there is also a query for what assertions have been made then there are a few more lines of code the implementor must write to preserve the value so it can be returned(maybe 10 lines) Writers of opaque libraries (small) - call the query function at library init time and if any assertions are found, issue an error message and kill the job. This is awkward for a library that wants to support every MPI whether it has implemented the new query function or not. == As MPI implementations begin to take advantage of assertions there is more work for the MPI implementor and the library author must begin to think about whether his customer will be upset if the library simply outlaws all assertions. The library author will never be wrong if he simply forbids assertions forever. If they become valuable he will feel the pressure to work it out. The MPI implementor will never be wrong if he adds the API but simply ignores assertions forever. If they become valuable he will feel the pressure to honor some at least. * -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.supalov at [hidden] Sat Apr 26 03:03:25 2008 From: alexander.supalov at [hidden] (Supalov, Alexander) Date: Sat, 26 Apr 2008 09:03:25 +0100 Subject: [Mpi-22] mpi-22 Digest, Vol 2, Issue 7 In-Reply-To: Message-ID: <5ECAB1304A8B5B4CB3F9D6C01E4E21A20148EF7A@swsmsx413.ger.corp.intel.com> Dear Dick, Thank you. Would you mind if I cite your proposal in the subsets discussion? Yours looks like a good alternative to the thinking of some of us that subsets might be very rich and mutable, and to Jeff's proposal on hints I've already cited there with his permission. Best regards. 
Alexander ________________________________ From: mpi-22-bounces_at_[hidden] [mailto:mpi-22-bounces_at_[hidden]] On Behalf Of Richard Treumann Sent: Thursday, April 24, 2008 6:16 PM To: MPI 2.2 Subject: Re: [Mpi-22] mpi-22 Digest, Vol 2, Issue 7 Dick Treumann - MPI Team/TCEM IBM Systems & Technology Group Dept 0lva / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601 Tele (845) 433-7846 Fax (845) 433-8363 mpi-22-bounces_at_[hidden] wrote on 04/24/2008 11:33:42 AM: > Hi, > > Note that this is an argument for making the assertions optional: those > who don't care don't have to use them. Those who care should use them > correctly or else. As usual. > > Best regards. > > Alexander > Hi Alexander The assertions are optional in this proposal. If this is added to the MPI standard the minimal impacts (day one impacts) are: == To application writers (none) - MPI_INIT and MPI_INIT_THREAD still work. MPI_INIT_THREAD_xxx can be passed 0 (zero) as the assertions bit vector. To MPI Implementors (small) - subroutine MPI_INIT_THREAD_xxx can be a clone of MPI_INIT_THREAD under the covers. If the Forum decides the query function is for asking what assertions are being honored, the implementation can just return "none" to every query. If there is also a query for what assertions have been made then there are a few more lines of code the implementor must write to preserve the value so it can be returned(maybe 10 lines) Writers of opaque libraries (small) - call the query function at library init time and if any assertions are found, issue an error message and kill the job. This is awkward for a library that wants to support every MPI whether it has implemented the new query function or not. == As MPI implementations begin to take advantage of assertions there is more work for the MPI implementor and the library author must begin to think about whether his customer will be upset if the library simply outlaws all assertions. The library author will never be wrong if he simply forbids assertions forever. If they become valuable he will feel the pressure to work it out. The MPI implementor will never be wrong if he adds the API but simply ignores assertions forever. If they become valuable he will feel the pressure to honor some at least. --------------------------------------------------------------------- Intel GmbH Dornacher Strasse 1 85622 Feldkirchen/Muenchen Germany Sitz der Gesellschaft: Feldkirchen bei Muenchen Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer Registergericht: Muenchen HRB 47456 Ust.-IdNr. VAT Registration No.: DE129385895 Citibank Frankfurt (BLZ 502 109 00) 600119052 This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. * -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsquyres at [hidden] Tue Apr 29 13:11:11 2008 From: jsquyres at [hidden] (Jeff Squyres) Date: Tue, 29 Apr 2008 13:11:11 -0500 Subject: [Mpi-22] MPI_BOOL Message-ID: <008B27E5-F496-40F2-BF55-429977402520@cisco.com> Back in the mid-90's, there was no "bool" C type. But since C99, there has been. So we should have an MPI_BOOL type. The point was made to me today, however, that the C "bool" type and the C++ "bool" type may not be compatible -- so MPI_BOOL and MPI::BOOL may actually represent different things. 
(I don't propose a solution :-) -- I just bring up the topic to be considered for 2.2...) -- Jeff Squyres Cisco Systems
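For what it is worth, the C/C++ mismatch is easy to see: neither language pins down the size or representation of its bool, and a platform's C and C++ compilers are not required to agree, which is why a single MPI_BOOL might not describe both MPI_BOOL (C) and MPI::BOOL (C++) buffers. The snippet below is only an illustration of that point, not part of any proposal; compile it once as C99 and once as C++ and compare the output.

    /* Illustration only: shows why the C99 _Bool and the C++ bool may not
       share a layout on a given platform. */
    #include <stdio.h>

    #ifndef __cplusplus
    #include <stdbool.h>   /* C99: "bool" expands to _Bool */
    #endif

    int main(void)
    {
        /* sizeof(bool) is implementation-defined in both languages, and the
           two answers need not match, so a datatype committed for one
           language's bool may not describe the other's. */
        printf("sizeof(bool) = %lu\n", (unsigned long) sizeof(bool));
        return 0;
    }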