[Mpi-forum] MPI One-Sided Communication

Fri Apr 24 21:53:06 CDT 2009

On Fri, Apr 24, 2009 at 9:22 PM, Underwood, Keith D
<keith.d.underwood at intel.com> wrote:
>>No, that's not true in the slightest for what Vinod is referring to.
>>NWChem's CCSD(T) code hit 0.357 petaflops at 0.55% efficiency on
>>Jaguar and this method is anything but simple.  The most scalable
>>portion is a 7-loop accumulation wherein the first loop computes very
>>large intermediates (by design, it fills up the memory) which must be
>>communicated all over the place in a non-trivial way along with other
>>intermediates and permanent data inside of another set of do loops.
>>Once all the buffers get pushed around, the inner loops are just a
>>bunch of DGEMMs.
>
> Ahem.  I think you just proved Greg's point.  Jaguar is a Cray XT series.  While I rather like that machine (I was part of the team at Sandia at the time), it has atrocious one-sided performance for small messages.  If you can do it with big messages, you can do it with MPI-1 two-sided.

I repeat: show me the code.  By claiming that I could do it with
MPI-1, you're saying that $100 million worth of one-sided-based
quantum chemistry codes were a waste of time and money and that some
very well regarded people are, in fact, quite stupid.

>>See the attached papers for the details of the algorithm.  The
>>GA/ARMCI implementation is less than 1000 lines of code.  Please let
>>me know when you can match this with MPI-1.
>
> At this point, we are just bickering about where the work is, right?  So, the app wants to program in a GA model for ease of use.  Great.  That isn't a performance argument.

If programmability isn't part of the equation, why have MPI?  Why have
C?  We can all just write in assembly, right?

We need to have an optimal balance between programmability and
performance.  For the HPC applications I work on, one-sided beats
two-sided hands down.  That's not a philosophy; it's 20 years of
results on machines from the IBM SP-2 to the Cray XT5.

>>> I think this nicely illustrates my point that I don't know of any
>>> existing HPC application that really needs one-sided hardware to get
>>> great performance.*
>>
>>Frankly, this is just a religious war already waged in HPCWire
>>(Myricom versus IBM).  It appears you're saying that IBM and Cray
>>supporting powerful one-sided hardware is a waste of money.  Is that
>>why their machines, SGI's and those running Infiniband (excellent
>>RDMA) compose the first 26 slots on the Top500 list?  Anti-one-sided
>>Myricom's best showing is #27.
>
> Um, yeah, Infiniband isn't up there because of its one-sided support.  Neither is the Cray machine.  Oh, wait, neither are the IBM machines.

So good hardware support for one-sided correlates strongly with Top500
performance?  Why then argue against one-sided hardware?

Honestly, I can't follow this moving-target argument against
one-sided.  What is bad about one-sided: the hardware support for it,
the programming model itself, or the use of implementation layers for
programming within the model?  It seems you guys want to pick and
choose which aspect of one-sided to trash whenever I attempt to pin
down what you're trying to say.

I've heard that when MPI-1 was being developed, having an
implementation settled a lot of religious debates immediately.  I'm
asking you now to follow that pattern and show me the two-sided
implementation that beats one-sided for a specific HPC application:
CCSD(T).

Jeff

-- 
Jeff Hammond
The University of Chicago
http://home.uchicago.edu/~jhammond/