<html><body>

<p>I am aiming for a balance between simplicity (which leads to affordable implementation in libmpi and practical use by applications & libraries) and versatility.  If we standardize something well defined and affordable that gives 95% of the value, and both MPI implementations and MPI applications/libraries begin to support/apply it, we come out way ahead.  Assertions even have a good probability of being portable if there are only a dozen defined. <br>

<br>

If we provide unbounded permutations and extensibility, most MPI implementations will ignore all but a handful and the application developer will need to invest a lot of effort in setting switches without being able to assume they are ever read by the MPI implementation.   <br>

<br>

Dick Treumann  -  MPI Team/TCEM            <br>

IBM Systems & Technology Group<br>

Dept 0lva / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601<br>

Tele (845) 433-7846         Fax (845) 433-8363<br>

<br>

<br>

<tt>mpi-22-bounces@lists.mpi-forum.org wrote on 04/24/2008 04:13:19 AM:<br>

<br>

> Hi,<br>

> <br>

> What happens if we run beyond 32 or 64 attributes? I think we may rather<br>

> need something more scalable than an int, and possibly more hierarchical<br>

> than a linear list of attributes. That would map into subsets nicely, by<br>

> the way.</tt><br>

<tt>I avoided the word "attribute" and chose the word "assertion" for a reason.</tt><br>

<tt>I would consider the word "promise" except that it feels a bit </tt><br>

<tt>anthropomorphic for my taste.</tt><br>

<tt>An assertion is a statement by the application that it acts in a way which</tt><br>

<tt>does not depend on a specific guarantee in the vanilla standard. </tt><br>

<tt>An assertion is not a directive to libmpi to do something different. It </tt><br>

<tt>is a promise that the application will be OK if libmpi passes up support for</tt><br>

<tt>the specific semantic requirement.  Libmpi is within its rights to terminate </tt><br>

<tt>a job if libmpi can recognize the application "lied". Libmpi is even within</tt><br>

<tt>its rights to give unexpected results if the application "lied". For example,</tt><br>

<tt>if the application really does depend on bitwise reproducible reduction </tt><br>

<tt>results and asserts it does not, the application may give some surprises.</tt><br>
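<br>
<tt>The reduction case can be made concrete with a small sketch. Floating-point addition is </tt><br>
<tt>not associative, so the order in which contributions are combined can change the bits of </tt><br>
<tt>the result. The function names and input values below are made up for illustration and </tt><br>
<tt>are not taken from any MPI implementation.</tt><br>

```c
#include <stddef.h>

/* Combine contributions left to right, as a simple linear reduction might. */
double sum_linear(const double *v, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}

/* Combine contributions pairwise, as a tree-shaped reduction might. */
double sum_pairwise(const double *v, size_t n)
{
    if (n == 0)
        return 0.0;
    if (n == 1)
        return v[0];
    size_t half = n / 2;
    return sum_pairwise(v, half) + sum_pairwise(v + half, n - half);
}
```

<tt>With the contrived input {1e20, 1.0, -1e20, 1.0} the two orders disagree, because a 1.0 </tt><br>
<tt>added to 1e20 is lost to rounding. An application that compares such results bit for bit </tt><br>
<tt>depends on the ordering guarantee; one that only needs a result within rounding error does not.</tt><br>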

<br>

<tt>My feeling is that no matter what we do there will never be more than a </tt><br>

<tt>handful of assertions that gain wide support. My fundamental concern with </tt><br>

<tt>the subsetting concept is my suspicion that </tt><br>

<tt>1) it will end up with 100 or 1000 or 1000000 permutations, </tt><br>

<tt>2) supporting all of them would give 100 units of value and be very complex</tt><br>

<tt>3) an MPI implementation that tries to support a large number becomes untestable</tt><br>

<tt>4) a well chosen subset would give 95 units of value</tt><br>

<tt>5) consensus on the worthwhile aspects of subsetting is needed before you get </tt><br>

<tt>   portability and that will take years to evolve (maybe forever).</tt><br>

<tt>6) writing pluggable libraries will become much harder because each library</tt><br>

<tt>   will need to deal with the wide range of "subsets" somebody may plug it </tt><br>

<tt>   into.</tt><br>

<tt>> <br>

> Another thing is that in some cases, the attitude of the MPI for each<br>

> attribute may be "yes", "no", and "don't care/undefined". I can imagine,<br>

> for example, that there's no eager protocol at all, and so no throttle,<br>

> albeit in a way different from when there are eager and rendezvous<br>

> protocols, but they are well tuned to provide a smooth curve. What will<br>

> happen in either case: will MPI proceed or terminate? By having<br>

> attributes with values "yes", "no", "tell me" we may be able to<br>

> accommodate this easier than with the bitwise "yes" and "no".</tt><br>

<tt>Most applications will either depend on a semantic guarantee or will not. That </tt><br>

<tt>may not always be easy for the application writer to recognize but there is</tt><br>

<tt>no "don't care" needed in this proposal. I suppose someone might ask "What if</tt><br>

<tt>the application wants to provide dual code and let the MPI implementation decide?"</tt><br>

<tt>That would call for a "don't care" option but it is not at all clear to me </tt><br>

<tt>that MPI implementations would often have a basis for a run time decision to </tt><br>

<tt>support a semantic guarantee that an application has said "don't care" for.</tt><br>

<tt>If support for MPI_CANCEL hurts performance and the implementation has added </tt><br>

<tt>logic to support CANCEL when the MPI_NO_SEND_CANCELS assertion is absent and give </tt><br>

<tt>better performance when the MPI_NO_SEND_CANCELS assertion is provided, why would</tt><br>

<tt>it ever consider supporting CANCEL in an application where the init time said</tt><br>

<tt>"don't care"? </tt><br>

<tt><br>

> <br>

> Finally, will we treat thread support level as yet another attribute?</tt><br>

<tt>I am open to considering this.<br>

> Will we define any query function for these attributes? Will they be<br>

> job-wide or communicator-wide?</tt><br>

<tt>Assertions are job wide. A query mechanism seems like a reasonable addition and</tt><br>

<tt>if the set of valid assertions is defined by the standard, a query mechanism </tt><br>

<tt>would be pretty simple. I think the most useful query response would involve the </tt><br>

<tt>implementation saying whether it is acting on the assertion but I could argue for</tt><br>

<tt>a query that reports what the app has set. If I write an application and do not </tt><br>

<tt>code a call to MPI_CANCEL I can assert MPI_NO_SEND_CANCELS but if my app calls an </tt><br>

<tt>opaque library that uses MPI_CANCEL I may not know it does that. </tt><br>

<tt>A well written library that depends on a semantic that can be suspended by assertion </tt><br>

<tt>may want to have a way to check that the assertion was not made or at least not </tt><br>

<tt>affecting libmpi behavior.</tt><br>
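<br>
<tt>A minimal sketch of the two query flavors mentioned above, assuming the implementation </tt><br>
<tt>keeps one bit mask for what the application asserted and another for what it is acting on. </tt><br>
<tt>The flag values come from the original proposal; the struct and function names are </tt><br>
<tt>hypothetical, since no query interface has been defined.</tt><br>

```c
#include <stdbool.h>

/* Flag values copied from the proposal; everything else here is hypothetical. */
#define MPI_NO_SEND_CANCELS 0x00000001
#define MPI_NO_ANY_SOURCE   0x00000002

/* Hypothetical record an implementation might keep after init. */
typedef struct {
    int asserted; /* what the application passed at init time */
    int honored;  /* the subset the implementation actually acts on */
} assertion_state;

/* Hypothetical query: did the application make this assertion? */
bool assertion_was_made(const assertion_state *s, int flag)
{
    return (s->asserted & flag) == flag;
}

/* Hypothetical query: is the implementation acting on this assertion? */
bool assertion_in_effect(const assertion_state *s, int flag)
{
    return (s->honored & flag) == flag;
}
```

<tt>A library that needs MPI_CANCEL is only at risk when the second query returns true, but the </tt><br>
<tt>first query lets it warn the user even on an implementation that ignores the assertion.</tt><br>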

<br>

<tt>The needs of opaque libraries are another argument for keeping the assertion list</tt><br>

<tt>well defined. The library author must be able to predict which MPI guarantees can </tt><br>

<tt>be pulled out from under him and that list must be short enough that, as he writes </tt><br>

<tt>the library code, he can predict the spots where the ice may be thin and guard</tt><br>

<tt>against them. The author of "Freds_lib" can use a query and has two options if </tt><br>

<tt>he does not like the answer. He can issue a fatal error and tell the user:</tt><br>

<tt>"Assertion MPI_NO_SEND_CANCELS is incompatible with using Freds_lib. Please remove </tt><br>

<tt>this assertion" or he can provide an alternate code path that does not </tt><br>

<tt>depend on being able to cancel an MPI_Isend.<br>
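<br>
The two options for Freds_lib could be sketched roughly as follows. The in_effect argument <br>
stands in for the answer from a query mechanism that does not exist yet, and the enum and <br>
function names are invented for this example.<br>

```c
#include <stdio.h>

#define MPI_NO_SEND_CANCELS 0x00000001 /* value from the proposal */

/* Hypothetical outcomes of the library's start-up check. */
typedef enum { FRED_USE_CANCEL, FRED_USE_FALLBACK, FRED_FATAL } fred_mode;

/*
 * Decide how Freds_lib should proceed.
 * in_effect:     assertion bits the MPI implementation reports it is acting on
 *                (hypothetical query result).
 * have_fallback: nonzero if the library ships a code path that never cancels
 *                an MPI_Isend.
 */
fred_mode freds_lib_check(int in_effect, int have_fallback)
{
    if ((in_effect & MPI_NO_SEND_CANCELS) == 0)
        return FRED_USE_CANCEL;   /* the guarantee is still available */
    if (have_fallback)
        return FRED_USE_FALLBACK; /* alternate code path without cancel */
    fprintf(stderr, "Assertion MPI_NO_SEND_CANCELS is incompatible with "
                    "using Freds_lib. Please remove this assertion\n");
    return FRED_FATAL;            /* caller is expected to end the job */
}
```

The check is only practical because the list of assertions is short and fixed; with an <br>
open-ended list the library could not enumerate what to test for.<br>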

> <br>

> Best regards.<br>

> <br>

> Alexander <br>

> <br>

> -----Original Message-----<br>

> From: mpi-forum-bounces@lists.mpi-forum.org<br>

> [<a href="mailto:mpi-forum-bounces@lists.mpi-forum.org">mailto:mpi-forum-bounces@lists.mpi-forum.org</a>] On Behalf Of Jeff Squyres<br>

> Sent: Thursday, April 24, 2008 3:18 AM<br>

> To: MPI 2.2<br>

> Cc: mpi-forum@lists.mpi-forum.org<br>

> Subject: Re: [Mpi-forum] [Mpi-22] Another pre-preposal for MPI 2.2 or<br>

> 3.0<br>

> <br>

> I think that this is a generally good idea.<br>

> <br>

> As I understand it, you are stating that this is basically a bit  <br>

> stronger than "hints" -- the word "assertions" carries a bit more of a  <br>

> connotation that these are strict promises by the user.<br>

> <br>

> <br>

> On Apr 22, 2008, at 1:38 PM, Richard Treumann wrote:<br>

> <br>

> > I have a proposal for providing information to the MPI  <br>

> > implementation at MPI_INIT time to allow certain optimizations  <br>

> > within the run. This is not a "hints" mechanism because it does  <br>

> > change the semantic rules for MPI in the job run. A correct  <br>

> > "vanilla" MPI application could give different results or fail if  <br>

> > faulty information is provided.<br>

> ><br>

> > I am interested in what the Forum members think about this idea  <br>

> > before I try to formalize it.<br>

> ><br>

> > I will state up front that I am a skeptic about most of the MPI  <br>

> > Subset goals I hear described. However, I think this is a form of  <br>

> > subsetting I would support. I say "I think" because it is possible  <br>

> > we will find serious complexities that would make me back away. If  <br>

> > this looks as straightforward as I expect, perhaps we could look at  <br>

> > it for MPI 2.2. The most basic valid implementation of this is a  <br>

> > small amount of work for an implementer. (Well within the scope of  <br>

> > MPI 2.2 effort / policy)<br>

> ><br>

> > ======================================================================<br>

> ><br>

> > The MPI standard has a number of thorny semantic requirements that a  <br>

> > typical program does not depend on and that an MPI implementation  <br>

> > may pay a performance penalty by guaranteeing. A standards defined  <br>

> > mechanism which allows the application to explicitly let libmpi off  <br>

> > the hook at MPI_Init time on the ones it does not depend on may  <br>

> > allow better performance in some cases. This would be an "assert"  <br>

> > rather than a "hints" mechanism because it would be valid for an MPI  <br>

> > implementation to fail a job that depends on an MPI feature but lets  <br>

> > libmpi off the hook on it at the MPI_Init call. In most, but not all,  <br>

> > of these cases the MPI implementation could easily give an error  <br>

> > message if the application did something it had promised not to do.<br>

> ><br>

> > Here is a partial list of sometimes troublesome semantic requirements.<br>

> ><br>

> > 1) MPI_CANCEL on MPI_ISEND probably cannot be correctly supported  <br>

> > without adding a message ID to every message sent. Using space in  <br>

> > the message header adds cost and may be a complete waste for an  <br>

> > application that never tries to cancel an ISEND. (If there is a cost  <br>

> > for being prepared to cancel an MPI_RECV we could cover that too)<br>

> ><br>

> > 2) MPI_Datatypes that define a contiguous buffer can be optimized if  <br>

> > it is known that there will never be a need to translate the data  <br>

> > between heterogeneous nodes.   An array of structures, where each  <br>

> > structure is a MPI_INT followed by an MPI_FLOAT is likely to be  <br>

> > contiguous. An MPI_SEND of count==100 can bypass the datatype engine  <br>

> > and be treated as a send of 800 bytes if the destination has the  <br>

> > same data representations. An MPI implementation that "knows" it  <br>

> > will not need to deal with data conversion can simplify the datatype  <br>

> > commit and internal representation by discarding the MPI_INT/ <br>

> > MPI_FLOAT data and just recording that the type is 8 bytes with a  <br>

> > stride of 8.<br>

> ><br>

> > 3) The MPI standard either requires or strongly urges that an  <br>

> > MPI_REDUCE/MPI_ALLREDUCE give exactly the same answer every time. It  <br>

> > is not clear to me what that means. If it means a portable MPI like  <br>

> > MPICH or OpenMPI must give the same answer whether run on an Intel  <br>

> > cluster, an IBM Power cluster or a BlueGene then I would bet no MPI  <br>

> > in the world complies. If it means Version 5 of an MPI must give the  <br>

> > same answer Version 1 did, it would prevent new algorithms. However,  <br>

> > if it means that two "equivalent" reductions in a single application  <br>

> > run must agree then perhaps most MPIs comply. Whatever it means,  <br>

> > there are applications that do not need any "same answer" promise as  <br>

> > long as they can assume they will get a "correct" answer. Maybe they  <br>

> > can be provided a faster reduction algorithm.<br>

> ><br>

> > 4) MPI supports persistent send/recv which could allow some  <br>

> > optimizations in which half rendezvous, pinned memory for RDMA,  <br>

> > knowledge that both sides are contiguous buffers etc can be  <br>

> > leveraged. The ability to do this is damaged by the fact that the  <br>

> > standard requires a persistent send to match a normal receive and a  <br>

> > normal send to match a persistent receive. The MPI implementation  <br>

> > cannot make any assumptions that a matching send_init and recv_init  <br>

> > can be bound together.<br>

> ><br>

> > 5) Perhaps MPI pt2pt communication could use a half rendezvous  <br>

> > protocol if it were certain no receive would use MPI_ANY_SOURCE. If  <br>

> > all receives will use an explicit source then libmpi can have the  <br>

> > receive side send a notice to the send side that a receive is  <br>

> > waiting. There is no need for the send side to ship the envelope and  <br>

> > wait for a reply that the match is found. If MPI_ANY_SOURCE is  <br>

> > possible then the send side must always start the transaction. (I am  <br>

> > not aware of an issue with MPI_ANY_TAG but maybe somebody can think  <br>

> > of one)<br>

> ><br>

> > 6) It may be that an MPI implementation that is ready to do a spawn  <br>

> > or join must use a more complex matching/progress engine than it  <br>

> > would need if it knew the set of connections & networks it had at  <br>

> > MPI_Init could never be expanded.<br>

> ><br>

> > 7) The MPI standard allows a standard send to use an eager protocol  <br>

> > but requires that libmpi promise every eager message can be buffered  <br>

> > safely. The MPI implementation must fall back to rendezvous protocol  <br>

> > when the promise can no longer be kept. This semantic can be  <br>

> > expensive to maintain and produces serious scaling problems. Some  <br>

> > applications depend on this semantic but many, especially those  <br>

> > designed for massive scale, work in ways that ensure libmpi does not  <br>

> > need to throttle eager sends. The applications pace themselves.<br>

> ><br>

> > 8) requirement that multi WAIT/TEST functions accept mixed arrays of  <br>

> > MPI_Requests ( the multi WAIT/TEST routines may need special  <br>

> > handling in case someone passes both Isend/Irecv requests and  <br>

> > MPI_File_ixxx requests to the same MPI_Waitany for example) I bet  <br>

> > applications seldom do this but it is allowed and must work.<br>

> ><br>

> > 9) Would an application promise not to use MPI-IO allow any MPI to  <br>

> > do an optimization?<br>

> ><br>

> > 10) Would an application promise not to use MPI-1sided allow any MPI  <br>

> > to do an optimization?<br>

> ><br>

> > 11) What others have I not thought of at all?<br>
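<br>
Item 2 in the list above can be illustrated with a short sketch, assuming a platform where <br>
int and float are both 4 bytes with 4-byte alignment. The struct and function names are <br>
invented for the example and are not part of MPI.<br>

```c
#include <stddef.h>

/* Element from item 2: an int followed by a float. */
struct int_float {
    int   i;
    float f;
};

/* Returns 1 if the struct has no padding, so an array of it is one
 * contiguous run of bytes with no gaps between members or elements. */
int int_float_is_contiguous(void)
{
    return sizeof(struct int_float) == sizeof(int) + sizeof(float);
}

/* Number of bytes a send of `count` elements covers when the layout is
 * contiguous and no data conversion is needed. */
size_t flat_send_bytes(size_t count)
{
    return count * sizeof(struct int_float);
}
```

On such a platform a send of count 100 covers 800 bytes, which is the case where the <br>
datatype engine could be bypassed if conversion between nodes is ruled out.<br>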

> ><br>

> > I want to make it clear that none of these MPI_Init time assertions  <br>

> > should require an MPI implementation that provides the assert ready  <br>

> > MPI_Init to work differently. For example, the user assertion that  <br>

> > her application does not depend on a persistent send matching a  <br>

> > normal receive or normal send matching a persistent receive does not  <br>

> > require the MPI implementation to suppress such matches. It remains  <br>

> > the user's responsibility to create a program that will still work as  <br>

> > expected on an MPI implementation that does not change its behavior  <br>

> > for any specific assertion.<br>

> ><br>

> > For some of these it would not be possible for libmpi to detect that  <br>

> > the user really is depending on something he told us we could shut  <br>

> > off.<br>

> ><br>

> > The interface might look like this:<br>

> > int MPI_Init_thread_xxx(int *argc, char *((*argv)[]), int required,  <br>

> > int *provided, int assertions)<br>

> ><br>

> > mpi.h would define constants like this:<br>

> ><br>

> > #define MPI_NO_SEND_CANCELS 0x00000001<br>

> > #define MPI_NO_ANY_SOURCE 0x00000002<br>

> > #define MPI_NO_REDUCE_CONSTRAINT 0x00000004<br>

> > #define MPI_NO_DATATYPE_XLATE 0x00000010<br>

> > #define MPI_NO_EAGER_THROTLE 0x00000020<br>

> > etc<br>

> ><br>

> > The set of valid assertion flags would be specified by the standard  <br>

> > as would be their precise meanings. It would always be valid for an  <br>

> > application to pass 0 (zero) as the assertions argument. It would  <br>

> > always be valid for an MPI implementation to ignore any or all  <br>

> > assertions. With a 32 bit integer for assertions, we could define  <br>

> > the interface in MPI 2.2 and add more assertions in MPI 3.0 if we  <br>

> > wanted to. We could consider a 64 bit assert to keep the door open  <br>

> > but I am pretty sure we can get by with 32 distinct assertions.<br>

> ><br>

> ><br>

> > An application call would look like: MPI_Init_thread_xxx( 0, 0,  <br>

> > MPI_THREAD_MULTIPLE, &provided,<br>

> > MPI_NO_SEND_CANCELS | MPI_NO_ANY_SOURCE | MPI_NO_DATATYPE_XLATE);<br>
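<br>
A toy stand-in for the proposed call shows how an implementation might record the assertions <br>
it supports and ignore the rest, which the proposal explicitly allows. The flag values are <br>
the ones listed above; the toy_ names, the choice of supported flags, and the simplified <br>
argv type are assumptions made for this sketch.<br>

```c
#include <stddef.h>

/* Flag values as listed in the proposal. */
#define MPI_NO_SEND_CANCELS      0x00000001
#define MPI_NO_ANY_SOURCE        0x00000002
#define MPI_NO_REDUCE_CONSTRAINT 0x00000004
#define MPI_NO_DATATYPE_XLATE    0x00000010
#define MPI_NO_EAGER_THROTLE     0x00000020

/* Assertions this toy implementation chooses to act on (an arbitrary pick). */
#define TOY_SUPPORTED (MPI_NO_SEND_CANCELS | MPI_NO_DATATYPE_XLATE)

static int toy_active_assertions;

/* Stand-in for the proposed init call. It keeps only the assertions it
 * supports and silently drops the others; passing 0 is always valid. */
int toy_init_thread_xxx(int *argc, char ***argv, int required,
                        int *provided, int assertions)
{
    (void)argc;
    (void)argv;
    if (provided != NULL)
        *provided = required; /* the toy grants whatever level is requested */
    toy_active_assertions = assertions & TOY_SUPPORTED;
    return 0;
}

/* Example fast-path decision: a per-message ID is only needed if a send
 * may later be cancelled. */
int toy_needs_message_id(void)
{
    return (toy_active_assertions & MPI_NO_SEND_CANCELS) == 0;
}
```

Here MPI_NO_ANY_SOURCE is accepted but has no effect, so the application must still behave <br>
correctly as if the assertion had never been made.<br>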

> ><br>

> > I am sorry I will not be at the next meeting to discuss in person  <br>

> > but you can talk to Robert Blackmore.<br>

> ><br>

> ><br>

> ><br>

> ><br>

> > Dick Treumann<br>

> > Dick Treumann - MPI Team/TCEM<br>

> > IBM Systems & Technology Group<br>

> > Dept 0lva / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601<br>

> > Tele (845) 433-7846 Fax (845) 433-8363<br>

> > _______________________________________________<br>

> > mpi-22 mailing list<br>

> > mpi-22@lists.mpi-forum.org<br>

> > </tt><tt><a href="http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-22">http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-22</a></tt><tt><br>

> <br>

> <br>

> -- <br>

> Jeff Squyres<br>

> Cisco Systems<br>

> <br>

> _______________________________________________<br>

> mpi-forum mailing list<br>

> mpi-forum@lists.mpi-forum.org<br>

> </tt><tt><a href="http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-forum">http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-forum</a></tt><tt><br>

> <br>

> _______________________________________________<br>

> mpi-22 mailing list<br>

> mpi-22@lists.mpi-forum.org<br>

> </tt><tt><a href="http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-22">http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-22</a></tt><tt><br>

</tt></body></html>