<html><body>
<p>I am aiming for a balance between simplicity (which leads to affordable implementation in libmpi and practical use by applications & libraries) and versatility. If we standardize something well defined and affordable that gives 95% of the value, and both MPI implementations and MPI applications/libraries begin to support/apply it, we come out way ahead. Assertions even have a good probability of being portable if there are only a dozen defined. <br>
<br>
If we provide unbounded permutations and extensibility, most MPI implementations will ignore all but a handful and the application developer will need to invest a lot of effort in setting switches without being able to assume they are ever read by the MPI implementation. <br>
<br>
Dick Treumann - MPI Team/TCEM <br>
IBM Systems & Technology Group<br>
Dept 0lva / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601<br>
Tele (845) 433-7846 Fax (845) 433-8363<br>
<br>
<br>
<tt>mpi-22-bounces@lists.mpi-forum.org wrote on 04/24/2008 04:13:19 AM:<br>
<br>
> Hi,<br>
> <br>
> What happens if we run beyond 32 or 64 attributes? I think we may rather<br>
> need something more scalable than an int, and possibly more hierarchical<br>
> than a linear list of attributes. That would map into subsets nicely, by<br>
> the way.</tt><br>
<tt>I avoided the word "attribute" and chose the word "assertion" for a reason.</tt><br>
<tt>I would consider the word "promise" except that it feels a bit </tt><br>
<tt>anthropomorphic for my taste.</tt><br>
<tt>An assertion is a statement by the application that it acts in a way which</tt><br>
<tt>does not depend on a specific guarantee in the vanilla standard. </tt><br>
<tt>An assertion is not a directive to libmpi to do something different. It </tt><br>
<tt>is a promise that the application will be OK if libmpi passes up support for</tt><br>
<tt>the specific semantic requirement. Libmpi is within its rights to terminate </tt><br>
<tt>a job if libmpi can recognize the application "lied". Libmpi is even within</tt><br>
<tt>its rights to give unexpected results if the application "lied". For example,</tt><br>
<tt>if the application really does depend on bitwise reproducible reduction </tt><br>
<tt>results and asserts it does not, the application may give some surprises.</tt><br>
<br>
<tt>My feeling is that no matter what we do there will never be more than a </tt><br>
<tt>handful of assertions that gain wide support. My fundamental concern with </tt><br>
<tt>the subsetting concept is my suspicion that </tt><br>
<tt>1) it will end up with 100 or 1000 or 1000000 permutations, </tt><br>
<tt>2) supporting all of them would give 100 units of value and be very complex</tt><br>
<tt>3) an MPI implementation that tries to support a large number becomes untestable</tt><br>
<tt>4) a well chosen subset would give 95 units of value</tt><br>
<tt>5) consensus on the worthwhile aspects of subsetting is needed before you get </tt><br>
<tt> portability and that will take years to evolve. (maybe forever)</tt><br>
<tt>6) writing pluggable libraries will become much harder because each library</tt><br>
<tt> will need to deal with the wide range of "subsets" somebody may plug it </tt><br>
<tt> into.</tt><br>
<tt>> <br>
> Another thing is that in some cases, the attitude of the MPI for each<br>
> attribute may be "yes", "no", and "don't care/undefined". I can imagine,<br>
> for example, that there's no eager protocol at all, and so no throttle,<br>
> albeit in a way different from when there are eager and rendezvous<br>
> protocols, but they are well tuned to provide a smooth curve. What will<br>
> happen in either case: will MPI proceed or terminate? By having<br>
> attributes with values "yes", "no", "tell me" we may be able to<br>
> accommodate this easier than with the bitwise "yes" and "no".</tt><br>
<tt>Most applications will either depend on a semantic guarantee or will not. That </tt><br>
<tt>may not always be easy for the application writer to recognize but there is</tt><br>
<tt>no "don't care" needed in this proposal. I suppose someone might ask "What if</tt><br>
<tt>the application wants to provide dual code and let the MPI implementation decide?"</tt><br>
<tt>That would call for a "don't care" option but it is not at all clear to me </tt><br>
<tt>that MPI implementations would often have a basis for a run time decision to </tt><br>
<tt>support a semantic guarantee that an application has said "don't care" for.</tt><br>
<tt>If support for MPI_CANCEL hurts performance and the implementation has added </tt><br>
<tt>logic to support CANCEL when the MPI_NO_SEND_CANCELS assertion is absent and give </tt><br>
<tt>better performance when the MPI_NO_SEND_CANCELS assertion is provided, why would</tt><br>
<tt>it ever consider supporting CANCEL in an application where the init time said</tt><br>
<tt>"don't care"? </tt><br>
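<tt>A minimal sketch of that argument, assuming a hypothetical libmpi that needs </tt><br>
<tt>a message ID in every header only so that sends can be cancelled. The names </tt><br>
<tt>demo_header_bytes, demo_answer and the byte counts are made up for illustration; </tt><br>
<tt>only the flag value follows the original proposal.</tt><br>
<pre>
```c
/* Illustrative only: flag value as in the proposal, sizes are assumptions. */
#define DEMO_NO_SEND_CANCELS 0x00000001

enum { DEMO_BASE_HEADER_BYTES = 16, DEMO_MESSAGE_ID_BYTES = 4 };

/* The three answers an application could give under the "don't care" idea. */
typedef enum { DEMO_DEPENDS, DEMO_DOES_NOT_DEPEND, DEMO_DONT_CARE } demo_answer;

/* Header size a hypothetical libmpi would pick from the assertion bits. */
int demo_header_bytes(int assertions)
{
    if (assertions & DEMO_NO_SEND_CANCELS)
        return DEMO_BASE_HEADER_BYTES;          /* no message ID needed */
    return DEMO_BASE_HEADER_BYTES + DEMO_MESSAGE_ID_BYTES;
}

/* "Don't care" gives libmpi no reason to pay for cancel support, so it
 * collapses to the same bit as "does not depend". */
int demo_assertion_from_answer(demo_answer a)
{
    return (a == DEMO_DEPENDS) ? 0 : DEMO_NO_SEND_CANCELS;
}
```
</pre>
<tt>In this sketch DEMO_DONT_CARE and DEMO_DOES_NOT_DEPEND always select the same </tt><br>
<tt>header size, which is why two values seem to be enough.</tt><br>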
<tt><br>
> <br>
> Finally, will we treat thread support level as yet another attribute?</tt><br>
<tt>I am open to considering this.<br>
> Will we define any query function for these attributes? Will they be<br>
> job-wide or communicator-wide?</tt><br>
<tt>Assertions are job wide. A query mechanism seems like a reasonable addition and</tt><br>
<tt>if the set of valid assertions is defined by the standard, a query mechanism </tt><br>
<tt>would be pretty simple. I think the most useful query response would involve the </tt><br>
<tt>implementation saying whether it is acting on the assertion but I could argue for</tt><br>
<tt>a query that reports what the app has set. If I write an application and do not </tt><br>
<tt>code a call to MPI_CANCEL I can assert MPI_NO_SEND_CANCELS but if my app calls an </tt><br>
<tt>opaque library that uses MPI_CANCEL I may not know it does that. </tt><br>
<tt>A well written library that depends on a semantic that can be suspended by assertion </tt><br>
<tt>may want to have a way to check that the assertion was not made or at least not </tt><br>
<tt>affecting libmpi behavior.</tt><br>
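<tt>A minimal sketch of the two query flavors described above, assuming a job-wide </tt><br>
<tt>record kept by a hypothetical libmpi. None of these names (demo_init, </tt><br>
<tt>demo_query_asserted, demo_query_acted_on) are proposed interfaces; they only </tt><br>
<tt>illustrate the difference between "what the app set" and "what libmpi acts on".</tt><br>
<pre>
```c
/* Flag values follow the original proposal; everything else is hypothetical. */
#define DEMO_NO_SEND_CANCELS 0x00000001
#define DEMO_NO_ANY_SOURCE   0x00000002

/* Job-wide record a hypothetical libmpi could keep after init. */
typedef struct {
    int asserted;   /* bits the application passed at init time */
    int acted_on;   /* subset the implementation actually exploits */
} demo_assert_state;

/* Record the assertions. Bits outside 'supported' are simply ignored,
 * which the proposal always allows an implementation to do. */
demo_assert_state demo_init(int assertions, int supported)
{
    demo_assert_state s;
    s.asserted = assertions;
    s.acted_on = assertions & supported;
    return s;
}

/* Query 1: did the application make this assertion? */
int demo_query_asserted(const demo_assert_state *s, int flag)
{
    return (s->asserted & flag) != 0;
}

/* Query 2: is the implementation changing its behavior because of it? */
int demo_query_acted_on(const demo_assert_state *s, int flag)
{
    return (s->acted_on & flag) != 0;
}
```
</pre>
<tt>A library that relies on cancelling sends would mostly care about the second </tt><br>
<tt>query; a debugging tool might prefer the first.</tt><br>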
<br>
<tt>The needs of opaque libraries are another argument for keeping the assertion list</tt><br>
<tt>well defined. The library author must be able to predict which MPI guarantees can </tt><br>
<tt>be pulled out from under him and that list must be short enough that, as he writes </tt><br>
<tt>the library code, he can predict the spots where the ice may be thin and guard</tt><br>
<tt>against them. The author of "Freds_lib" can use a query and has two options if </tt><br>
<tt>he does not like the answer. He can issue a fatal error and tell the user:</tt><br>
<tt>"Assertion MPI_NO_SEND_CANCELS is incompatible with using Freds_lib. Please remove </tt><br>
<tt>this assertion" or he can provide an alternate code path that does not </tt><br>
<tt>depend on being able to cancel an MPI_Isend.<br>
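</tt>
<tt>The two options for "Freds_lib" might be sketched as below. This assumes some </tt><br>
<tt>query has already reported which assertions libmpi is acting on; freds_mode and </tt><br>
<tt>freds_choose_mode are hypothetical names, not part of any real library.</tt><br>
<pre>
```c
/* Flag value follows the original proposal. */
#define DEMO_NO_SEND_CANCELS 0x00000001

typedef enum {
    FREDS_USE_CANCEL,    /* cancelling an MPI_Isend is still safe to rely on */
    FREDS_USE_FALLBACK,  /* alternate code path that never cancels a send */
    FREDS_FATAL          /* refuse to run; ask the user to drop the assertion */
} freds_mode;

/* 'acted_on' is what a hypothetical query says libmpi is exploiting;
 * 'has_fallback' says whether the library ships a cancel-free code path. */
freds_mode freds_choose_mode(int acted_on, int has_fallback)
{
    if ((acted_on & DEMO_NO_SEND_CANCELS) == 0)
        return FREDS_USE_CANCEL;
    return has_fallback ? FREDS_USE_FALLBACK : FREDS_FATAL;
}
```
</pre>
<tt>The library would call this once at its own init time and print the fatal </tt><br>
<tt>message only in the FREDS_FATAL case.</tt><br>
<tt>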
> <br>
> Best regards.<br>
> <br>
> Alexander <br>
> <br>
> -----Original Message-----<br>
> From: mpi-forum-bounces@lists.mpi-forum.org<br>
> [<a href="mailto:mpi-forum-bounces@lists.mpi-forum.org">mailto:mpi-forum-bounces@lists.mpi-forum.org</a>] On Behalf Of Jeff Squyres<br>
> Sent: Thursday, April 24, 2008 3:18 AM<br>
> To: MPI 2.2<br>
> Cc: mpi-forum@lists.mpi-forum.org<br>
> Subject: Re: [Mpi-forum] [Mpi-22] Another pre-preposal for MPI 2.2 or<br>
> 3.0<br>
> <br>
> I think that this is a generally good idea.<br>
> <br>
> As I understand it, you are stating that this is basically a bit <br>
> stronger than "hints" -- the word "assertions" carries a bit more of a <br>
> connotation that these are strict promises by the user.<br>
> <br>
> <br>
> On Apr 22, 2008, at 1:38 PM, Richard Treumann wrote:<br>
> <br>
> > I have a proposal for providing information to the MPI <br>
> > implementation at MPI_INIT time to allow certain optimizations <br>
> > within the run. This is not a "hints" mechanism because it does <br>
> > change the semantic rules for MPI in the job run. A correct <br>
> > "vanilla" MPI application could give different results or fail if <br>
> > faulty information is provided.<br>
> ><br>
> > I am interested in what the Forum members think about this idea <br>
> > before I try to formalize it.<br>
> ><br>
> > I will state up front that I am a skeptic about most of the MPI <br>
> > Subset goals I hear described. However, I think this is a form of <br>
> > subsetting I would support. I say "I think" because it is possible <br>
> > we will find serious complexities that would make me back away. If <br>
> > this looks as straightforward as I expect, perhaps we could look at <br>
> > it for MPI 2.2. The most basic valid implementation of this is a <br>
> > small amount of work for an implementer. (Well within the scope of <br>
> > MPI 2.2 effort / policy)<br>
> ><br>
> > ======================================================================<br>
> ><br>
> > The MPI standard has a number of thorny semantic requirements that a <br>
> > typical program does not depend on and that an MPI implementation <br>
> > may pay a performance penalty by guaranteeing. A standards defined <br>
> > mechanism which allows the application to explicitly let libmpi off <br>
> > the hook at MPI_Init time on the ones it does not depend on may <br>
> > allow better performance in some cases. This would be an "assert" <br>
> > rather than a "hints" mechanism because it would be valid for an MPI <br>
> > implementation to fail a job that depends on an MPI feature but lets <br>
> > libmpi off the hook on it at the MPI_Init call. In most, but not all, <br>
> > of these cases the MPI implementation could easily give an error <br>
> > message if the application did something it had promised not to do.<br>
> ><br>
> > Here is a partial list of sometimes troublesome semantic requirements.<br>
> ><br>
> > 1) MPI_CANCEL on MPI_ISEND probably cannot be correctly supported <br>
> > without adding a message ID to every message sent. Using space in <br>
> > the message header adds cost and may be a complete waste for an <br>
> > application that never tries to cancel an ISEND. (If there is a cost <br>
> > for being prepared to cancel an MPI_RECV we could cover that too)<br>
> ><br>
> > 2) MPI_Datatypes that define a contiguous buffer can be optimized if <br>
> > it is known that there will never be a need to translate the data <br>
> > between heterogeneous nodes. An array of structures, where each <br>
> > structure is a MPI_INT followed by an MPI_FLOAT is likely to be <br>
> > contiguous. An MPI_SEND of count==100 can bypass the datatype engine <br>
> > and be treated as a send of 800 bytes if the destination has the <br>
> > same data representations. An MPI implementation that "knows" it <br>
> > will not need to deal with data conversion can simplify the datatype <br>
> > commit and internal representation by discarding the MPI_INT/ <br>
> > MPI_FLOAT data and just recording that the type is 8 bytes with a <br>
> > stride of 8.<br>
> ><br>
> > 3) The MPI standard either requires or strongly urges that an <br>
> > MPI_REDUCE/MPI_ALLREDUCE give exactly the same answer every time. It <br>
> > is not clear to me what that means. If it means a portable MPI like <br>
> > MPICH or OpenMPI must give the same answer whether run on an Intel <br>
> > cluster, an IBM Power cluster or a BlueGene then I would bet no MPI <br>
> > in the world complies. If it means Version 5 of an MPI must give the <br>
> > same answer Version 1 did, it would prevent new algorithms. However, <br>
> > if it means that two "equivalent" reductions in a single application <br>
> > run must agree then perhaps most MPIs comply. Whatever it means, <br>
> > there are applications that do not need any "same answer" promise as <br>
> > long as they can assume they will get a "correct" answer. Maybe they <br>
> > can be provided a faster reduction algorithm.<br>
> ><br>
> > 4) MPI supports persistent send/recv which could allow some <br>
> > optimizations in which half rendezvous, pinned memory for RDMA, <br>
> > knowledge that both sides are contiguous buffers etc can be <br>
> > leveraged. The ability to do this is damaged by the fact that the <br>
> > standard requires a persistent send to match a normal receive and a <br>
> > normal send to match a persistent receive. The MPI implementation <br>
> > cannot make any assumptions that a matching send_init and recv_init <br>
> > can be bound together.<br>
> ><br>
> > 5) Perhaps MPI pt2pt communication could use a half rendezvous <br>
> > protocol if it were certain no receive would use MPI_ANY_SOURCE. If <br>
> > all receives will use an explicit source then libmpi can have the <br>
> > receive side send a notice to the send side that a receive is <br>
> > waiting. There is no need for the send side to ship the envelope and <br>
> > wait for a reply that the match is found. If MPI_ANY_SOURCE is <br>
> > possible then the send side must always start the transaction. (I am <br>
> > not aware of an issue with MPI_ANY_TAG but maybe somebody can think <br>
> > of one)<br>
> ><br>
> > 6) It may be that an MPI implementation that is ready to do a spawn <br>
> > or join must use a more complex matching/progress engine than it <br>
> > would need if it knew the set of connections & networks it had at <br>
> > MPI_Init could never be expanded.<br>
> ><br>
> > 7) The MPI standard allows a standard send to use an eager protocol <br>
> > but requires that libmpi promise every eager message can be buffered <br>
> > safely. The MPI implementation must fall back to rendezvous protocol <br>
> > when the promise can no longer be kept. This semantic can be <br>
> > expensive to maintain and produces serious scaling problems. Some <br>
> > applications depend on this semantic but many, especially those <br>
> > designed for massive scale, work in ways that ensure libmpi does not <br>
> > need to throttle eager sends. The applications pace themselves.<br>
> ><br>
> > 8) requirement that multi WAIT/TEST functions accept mixed arrays of <br>
> > MPI_Requests ( the multi WAIT/TEST routines may need special <br>
> > handling in case someone passes both Isend/Irecv requests and <br>
> > MPI_File_ixxx requests to the same MPI_Waitany for example) I bet <br>
> > applications seldom do this but it is allowed and must work.<br>
> ><br>
> > 9) Would an application promise not to use MPI-IO allow any MPI to <br>
> > do an optimization?<br>
> ><br>
> > 10) Would an application promise not to use MPI-1sided allow any MPI <br>
> > to do an optimization?<br>
> ><br>
> > 11) What others have I not thought of at all?<br>
> ><br>
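</tt>
<tt>The arithmetic in item 2 can be made concrete. A minimal sketch, assuming a </tt><br>
<tt>platform where int and float are 4 bytes each and the compiler adds no padding </tt><br>
<tt>(common, but not guaranteed by C). Under those assumptions a send of count 100 </tt><br>
<tt>is a raw copy of 800 bytes. demo_elem and both function names are hypothetical.</tt><br>
<pre>
```c
/* One array element as described in item 2: an int followed by a float. */
typedef struct {
    int   i;
    float f;
} demo_elem;

/* True when the struct has no padding, so an array of it is one
 * contiguous run of meaningful bytes. */
int demo_elem_is_packed(void)
{
    return sizeof(demo_elem) == sizeof(int) + sizeof(float);
}

/* Bytes a send of 'count' elements could move as a plain copy when no
 * data conversion is needed. Returns 0 when padding rules out treating
 * the array as one packed run in this simplified model. */
unsigned long demo_raw_send_bytes(unsigned long count)
{
    if (!demo_elem_is_packed())
        return 0;
    return count * (unsigned long)sizeof(demo_elem);
}
```
</pre>
<tt>An implementation that knew no translation would ever be needed could record </tt><br>
<tt>just the element size and stride instead of the full type description.</tt><br>
<tt>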
> > I want to make it clear that none of these MPI_Init time assertions <br>
> > should require an MPI implementation that provides the assert ready <br>
> > MPI_Init to work differently. For example, the user assertion that <br>
> > her application does not depend on a persistent send matching a <br>
> > normal receive or normal send matching a persistent receive does not <br>
> > require the MPI implementation to suppress such matches. It remains <br>
> > the user's responsibility to create a program that will still work as <br>
> > expected on an MPI implementation that does not change its behavior <br>
> > for any specific assertion.<br>
> ><br>
> > For some of these it would not be possible for libmpi to detect that <br>
> > the user really is depending on something he told us we could shut <br>
> > off.<br>
> ><br>
> > The interface might look like this:<br>
> > int MPI_Init_thread_xxx(int *argc, char *((*argv)[]), int required, <br>
> > int *provided, int assertions)<br>
> ><br>
> > mpi.h would define constants like this:<br>
> ><br>
> > #define MPI_NO_SEND_CANCELS 0x00000001<br>
> > #define MPI_NO_ANY_SOURCE 0x00000002<br>
> > #define MPI_NO_REDUCE_CONSTRAINT 0x00000004<br>
> > #define MPI_NO_DATATYPE_XLATE 0x00000010<br>
> > #define MPI_NO_EAGER_THROTLE 0x00000020<br>
> > etc<br>
> ><br>
> > The set of valid assertion flags would be specified by the standard <br>
> > as would be their precise meanings. It would always be valid for an <br>
> > application to pass 0 (zero) as the assertions argument. It would <br>
> > always be valid for an MPI implementation to ignore any or all <br>
> > assertions. With a 32 bit integer for assertions, we could define <br>
> > the interface in MPI 2.2 and add more assertions in MPI 3.0 if we <br>
> > wanted to. We could consider a 64-bit assert to keep the door open <br>
> > but I am pretty sure we can get by with 32 distinct assertions.<br>
> ><br>
> ><br>
> > An application call would look like: MPI_Init_thread_xxx( 0, 0, <br>
> > MPI_THREAD_MULTIPLE, &provided,<br>
> > MPI_NO_SEND_CANCELS | MPI_NO_ANY_SOURCE | MPI_NO_DATATYPE_XLATE);<br>
> ><br>
> > I am sorry I will not be at the next meeting to discuss in person <br>
> > but you can talk to Robert Blackmore.<br>
> ><br>
> ><br>
> ><br>
> ><br>
> > Dick Treumann<br>
> > Dick Treumann - MPI Team/TCEM<br>
> > IBM Systems & Technology Group<br>
> > Dept 0lva / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601<br>
> > Tele (845) 433-7846 Fax (845) 433-8363<br>
> > _______________________________________________<br>
> > mpi-22 mailing list<br>
> > mpi-22@lists.mpi-forum.org<br>
> > </tt><tt><a href="http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-22">http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-22</a></tt><tt><br>
> <br>
> <br>
> -- <br>
> Jeff Squyres<br>
> Cisco Systems<br>
> <br>
> _______________________________________________<br>
> mpi-forum mailing list<br>
> mpi-forum@lists.mpi-forum.org<br>
> </tt><tt><a href="http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-forum">http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-forum</a></tt><tt><br>
> ---------------------------------------------------------------------<br>
> Intel GmbH<br>
> Dornacher Strasse 1<br>
> 85622 Feldkirchen/Muenchen Germany<br>
> Sitz der Gesellschaft: Feldkirchen bei Muenchen<br>
> Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer<br>
> Registergericht: Muenchen HRB 47456 Ust.-IdNr.<br>
> VAT Registration No.: DE129385895<br>
> Citibank Frankfurt (BLZ 502 109 00) 600119052<br>
> <br>
> <br>
> <br>
> _______________________________________________<br>
> mpi-22 mailing list<br>
> mpi-22@lists.mpi-forum.org<br>
> </tt><tt><a href="http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-22">http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-22</a></tt><tt><br>
</tt></body></html>