[Mpi3-subsetting] Some "stupid user" questions, comments.

Richard Graham rlgraham at [hidden]
Fri Feb 29 09:19:56 CST 2008


On 2/29/08 9:26 AM, "Supalov, Alexander" <alexander.supalov_at_[hidden]>
wrote:

> Dear Richard,
> 
> Thanks. The more complicated the standard gets, the happier the implementors
> are. However, now we are trying to think like MPI users for a change, so
> thanks for providing a reality check.
> 
>> Quite to the contrary. The simpler the standard is, the easier it is to
>> support - complexity is not a good thing at all. This is my view as an
>> implementer. Complexity is often introduced when trying to get good
>> performance out of a spec that supports a wide variety of options.
>  
> Now, to one of your questions. An MPI_ANY_SOURCE MPI_Recv in a multi-fabric
> environment means that a receive has to be posted somehow to more than one
> fabric in the MPI device layer. Once one of them gets the message, the posted
> receives should be cancelled on the other fabrics. Now, what if they've
> already matched and started to receive something? What if they cannot cancel
> a posted receive? And so on. There are 3 to 5 ways to deal with this
> situation, with and without actually posting a receive, but none of them is
> good enough if you ask me. That's why there are 3 to 5 of them, actually. And
> all of them complicate the progress engine - the heart of an MPI
> implementation - at exactly the spot where one wants things simple and fast.
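> 
> To give a flavour of what I mean, here is a toy model of just one of those
> ways - post the wildcard receive everywhere, then try to cancel the losers.
> The fabric_* calls are made-up placeholders, not any real device API:
> 
>     #include <stdio.h>
> 
>     #define NFABRICS 3
> 
>     /* Stand-in for per-fabric device state. */
>     static int already_matched[NFABRICS];
> 
>     static void fabric_post_recv(int f)
>     {
>         printf("posted wildcard receive on fabric %d\n", f);
>     }
> 
>     /* Returns 0 on success, nonzero if the fabric has already matched. */
>     static int fabric_try_cancel(int f)
>     {
>         return already_matched[f] ? -1 : 0;
>     }
> 
>     static int wait_for_first_match(void)
>     {
>         /* Pretend fabric 1 reports first, but fabric 2 raced to a match too. */
>         already_matched[1] = already_matched[2] = 1;
>         return 1;
>     }
> 
>     int main(void)
>     {
>         int i, winner;
> 
>         for (i = 0; i < NFABRICS; i++)   /* 1. post the wildcard everywhere */
>             fabric_post_recv(i);
> 
>         winner = wait_for_first_match(); /* 2. one fabric delivers the data */
> 
>         for (i = 0; i < NFABRICS; i++) { /* 3. try to cancel the others */
>             if (i == winner)
>                 continue;
>             if (fabric_try_cancel(i) != 0)
>                 printf("fabric %d matched as well - must drain or discard\n", i);
>         }
>         return 0;
>     }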
> 
>> The any_source and multiple fabrics are two distinct issues. Even if you do
>> not support any_source and have multiple fabrics, you have the issue that to
>> support MPI ordering semantics, matching needs to be done in the context of
>> all the NICs - unless you decide to have only one NIC do the matching,
>> including any on-host traffic. What any_source forces is matching on the
>> receive side - unless one wants to set up a very complex and inefficient way
>> to make sure that only one receive is matched for each wild card receive.
> 
> Rich
>  
> This means that most of the time we fight these repercussions and curse the
> MPI_ANY_SOURCE. Or, looping back to the beginning of this message, we actually
> never stop blessing MPI_ANY_SOURCE. Fighting this kind of trouble is what we
> are paid for. ;)
>  
> Best regards.
>  
> Alexander
> 
> 
> From: mpi3-subsetting-bounces_at_[hidden]
> [mailto:mpi3-subsetting-bounces_at_[hidden]] On Behalf Of Richard
> Barrett
> Sent: Friday, February 29, 2008 2:50 PM
> To: mpi3-subsetting_at_[hidden]
> Subject: [Mpi3-subsetting] Some "stupid user" questions, comments.
> 
> Hi folks,
> 
> I'm still sorting things out in my mind, so perhaps this note is just me
> talking to myself. But should you feel compelled to sort through it, I
> would appreciate any feedback you might offer; it will make me a more
> informed participant.
> 
> I see two main perspectives: the user and the implementer. I come from the
> user side, so I feel comfortable in positing that user confusion over the size
> of the standard is really a function of presentation. That is, most of us get
> our information regarding using MPI directly from the standard. For me, this
> is the _only_ standard I've ever actually read! Perhaps I am missing out on
> thousands of C and Fortran capabilities, but sometimes ignorance is bliss.
> That speaks highly of the MPI specification's presentation; however, it need
> not be the case. An easy solution to the "too many routines" complaint is a
> tutorial/book/chapter on the basics, with pointers to further information. And
> in fact these books exist. That said, I hope that MPI-3 deprecates a
> meaningful volume of functionality.
> 
> From the implementer perspective, there appear to be two goals. First is to
> ease the burden with regard to the amount of functionality that must be
> supported. (And we users don't want to hear of your whining, esp. from a
> company the size of Intel :) Second, which overlaps with user concerns, is
> performance. That is, by defining a small subset of functionality, strong
> performance (in some sense, e.g. speed or memory requirements) can be
> realized.
> 
> At the risk of starting too detailed a discussion at this early point (as well
> as exposing my ignorance:), I will throw out a few situations for discussion.
> 
> 1. What would such a subset imply with regard to what I view as support
> functionality, such as user-defined datatypes, topologies, etc.? I.e., could
> this support be easily provided, say by cutting-and-pasting from the full
> implementation you will still provide? (I now see Torsten recommends
> excluding datatypes, but what of other stuff?)
> 2. Even more broadly (and perhaps very ignorantly), can I simply link in both
> libraries, like -lmpi_subset -lmpi, getting the good stuff from the former and
> the excluded functionality from the latter? In addition to the application
> developers' use of MPI, all large application programs I've dealt with make
> some use of externally produced libraries (a "very good thing" imo), which
> probably exceed the functionality in a "subset" implementation.
> 3. I (basically) understand the adverse performance effects of allowing
> promiscuous receives (MPI_ANY_SOURCE). However, this is a powerful capability
> for many codes, and it is used only in moderation, e.g. for setting up
> communication requirements (such as communication partners in unstructured,
> semi-structured, and dynamic mesh computations). In this case the sender knows
> its partner, but the receiver does not. A reduction(sum) is used to let each
> process know the number of communication partners from which it will receive
> data; the process then posts that many promiscuous receives, which, once
> satisfied, let it specify the sender from then on. (A sketch of this pattern
> appears after this list.) So would it be possible to include this capability
> in a separate function, say the blocking send/recv, but not allow it in the
> non-blocking version?
> 4. Collectives: I can't name a code I've ever worked with that doesn't
> require MPI_Allreduce (though I wouldn't be surprised to hear of many), and
> this in a broad set of science areas. MPI_Bcast is also often used (but quite
> often only in the setup phase). I see MPI_Reduce used most often to collect
> timing information, so MPI_Allreduce would probably be fine for that as well.
> MPI_Gather is often quite useful, as is MPI_Scatter, but again often in
> setup. (Though often "setup" occurs once per time step.) Non-constant size
> versions are often used. And others can also no doubt offer strong opinions
> regarding inclusion or exclusion. But from an implementation perspective,
> what are the issues? In particular, is the basic infrastructure for these
> (and other collective operations) the same? A driving premise for supporting
> collectives is that the sort of performance-driven capability under
> discussion is most needed by applications running at very large scale, which
> is where even very good collective implementations run into problems.
> 5. Language bindings and perhaps other things: With the expectation/hope that
> full implementations continue to be available, I could use them for code
> development, thus making use of things like type checking, etc. And does this
> latter use then imply the need for "stubs" for things like the (vaporous)
> Fortran bindings module, communicators (if only MPI_COMM_WORLD is supported),
> etc.? And presuming the answer to #2 is "no", could/should the full
> implementation "warn" me (preferably at compile time) when I'm using
> functionality that rules out use of the subset?
> 6. Will the profiling layer still be supported? Usage can still be quantified
> using a full implementation, but performance would not be (at least in this
> manner), which would rule out an apples-to-apples comparison between a full
> implementation and the subset version with its advertised superior
> performance. (Of course an overall runtime could be compared, which is the
> final word, but a more detailed analysis is often preferred.) (A sketch
> follows this list.)
> 7. If blocking and non-blocking are required of the subset, aren't these the
> blocking semantics?
> 
>     MPI_Send: MPI_Isend ( ..., &req ); MPI_Wait ( &req, &status );
>     -----
>     MPI_Recv: MPI_Irecv ( ..., &req ); MPI_Wait ( &req, &status );
> 
>         - And speaking of this, are there performance issues associated with
> variants of MPI_Wait, e.g. MPI_Waitany, MPI_Waitsome?
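> 
> To make #3 concrete, here is roughly the setup pattern I have in mind. The
> names (discover_partners, targets, sources) are just illustrative, and this
> is only a sketch of the idea:
> 
>     #include <mpi.h>
>     #include <stdlib.h>
> 
>     /* targets[0..ntargets-1]: distinct ranks this process will send to.
>        On return, sources[0..*nsources-1]: ranks that will send to us. */
>     void discover_partners(int ntargets, const int *targets,
>                            int *nsources, int *sources)
>     {
>         int nprocs, me, i, token = 0, incoming;
>         int *ones, *total;
>         MPI_Request *sreq;
>         MPI_Status st;
> 
>         MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
>         MPI_Comm_rank(MPI_COMM_WORLD, &me);
>         ones  = calloc(nprocs, sizeof(int));
>         total = malloc(nprocs * sizeof(int));
>         sreq  = malloc(ntargets * sizeof(MPI_Request));
> 
>         for (i = 0; i < ntargets; i++)
>             ones[targets[i]] = 1;
> 
>         /* Reduction(sum): total[me] = number of ranks that will send to me. */
>         MPI_Allreduce(ones, total, nprocs, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
>         *nsources = total[me];
> 
>         for (i = 0; i < ntargets; i++)
>             MPI_Isend(&token, 1, MPI_INT, targets[i], 99,
>                       MPI_COMM_WORLD, &sreq[i]);
> 
>         /* The only promiscuous receives; afterwards every sender is known. */
>         for (i = 0; i < *nsources; i++) {
>             MPI_Recv(&incoming, 1, MPI_INT, MPI_ANY_SOURCE, 99,
>                      MPI_COMM_WORLD, &st);
>             sources[i] = st.MPI_SOURCE;
>         }
> 
>         MPI_Waitall(ntargets, sreq, MPI_STATUSES_IGNORE);
>         free(ones); free(total); free(sreq);
>     }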
> 
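> As for #6, by the profiling layer I mean the standard PMPI interface: every
> MPI routine has a PMPI_ twin, so a tool can interpose on a call without
> touching the application or the library. A minimal sketch of a timing
> wrapper (MPI-2-era C binding assumed, reporting omitted):
> 
>     #include <mpi.h>
> 
>     static double send_time = 0.0;
> 
>     /* Intercepts MPI_Send; the real work is done by PMPI_Send. */
>     int MPI_Send(void *buf, int count, MPI_Datatype dtype,
>                  int dest, int tag, MPI_Comm comm)
>     {
>         double t = MPI_Wtime();
>         int rc = PMPI_Send(buf, count, dtype, dest, tag, comm);
>         send_time += MPI_Wtime() - t;
>         return rc;
>     }
> 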
> Finally, I'll officially register my concern with what I see as an increasing
> complexity in this effort, esp. wrt "multiple subsets". I don't intend this
> comment to suppress ideas, but to help keep beating the drum for simplicity,
> which I see as a key goal of this effort.
> 
> If you read this far, thanks! My apologies if some of these issues have been
> previously covered. And if I've simply exposed myself as ignorant, I feel
> confident in stating that I am not alone - these questions will persist from
> others. :)
> 
> Richard




