[Mpi-forum] MPI-3 One-Sided Communications

Fri Apr 24 08:20:57 CDT 2009

Tony,

On Thu, Apr 23, 2009 at 8:17 PM, Anthony Skjellum <tony at verarisoft.com> wrote:
> Hi, I am not familiar with your project.  Tell us more.

Vinod should be the one to tell you about ARMCI in detail since he
wrote a good portion of it.  I just use it.  There are also a number
of papers available (see below).

http://www.emsl.pnl.gov/docs/parsoft/armci/armci/armci.pdf
http://www.springerlink.com/content/p581340602373484/
http://hpc.sagepub.com/cgi/content/abstract/20/2/233

> In order to achieve the lowest latency (or overhead, depending on your optimization, or a blend thereof), a protocol designer might not want to pay fixed costs that, once amortized over long transfers, yield better long-transfer performance.  Classic fixed vs. variable cost situation.  The classic trades in two-sided are zero copy for long and two copy for short.  If you have evidence to the contrary, great... Lots of people have reported in the past that a tradeoff like the one I described comes from the complex semantics of the MPI one-sided API.

I think the trade-off comes from the implementation, not the one-sided
model itself.  I agree that achieving low latency with a clunky
implementation requires a compromise elsewhere.
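
To make the fixed vs. variable cost argument concrete, here is a rough
sketch of a linear cost model (time = fixed cost + bytes / bandwidth)
for two hypothetical protocols.  The function names and every number
are assumptions made up for illustration; none of this is ARMCI, GASNet
or MPI data.

```python
# Hypothetical linear cost model: time = fixed_cost + nbytes / bandwidth.
# All parameters below are illustrative assumptions, not measurements.

def transfer_time(nbytes, fixed_cost_s, bandwidth_bytes_per_s):
    """Time in seconds to move nbytes under a fixed-plus-variable cost model."""
    return fixed_cost_s + nbytes / bandwidth_bytes_per_s


def crossover_bytes(fixed_a, bw_a, fixed_b, bw_b):
    """Message size at which protocols A and B take equal time.

    Returns None if the two cost lines never cross at a positive size.
    """
    slope_diff = 1.0 / bw_a - 1.0 / bw_b
    if slope_diff == 0:
        return None
    n = (fixed_b - fixed_a) / slope_diff
    return n if n > 0 else None


# Assumed "two copy" short-message path: cheap startup, lower bandwidth.
SHORT = (1e-6, 1e9)
# Assumed "zero copy" long-message path: costlier startup, higher bandwidth.
LONG = (5e-6, 2e9)

if __name__ == "__main__":
    print("crossover (bytes):", crossover_bytes(*SHORT, *LONG))
    for size in (64, 1 << 20):
        print(size, transfer_time(size, *SHORT), transfer_time(size, *LONG))
```

Under these assumed parameters the low-startup path wins for small
messages and the high-bandwidth path wins for large ones, which is the
trade Tony describes.  A lean implementation shifts the fixed cost down
rather than changing the shape of the model.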

> One also has to ask if we can get even lower latency with a less complex protocol too, IMHO.

ARMCI is quite minimal and sits very close to the hardware layer.
GASNet is also very lightweight, perhaps as much as can be achieved
without compromising portability.

Einstein's "Make everything as simple as possible, but not simpler" is
the right guiding principle for HPC, in contrast to the standard model
of "make everything as general as possible."

> Do you get both lower latency than two sided short msgs and higher bandwidth than two sided?

Someone else should have data on this.  If it's outdated, I'll do
fresh timings on current hardware this summer.

> What is the baseline of performance for two sided? Is it optimized? :-)

The ARMCI performance page notes that the vendor MPI implementation
was used in many of the cases, and the same can be assumed for the
others.

Jeff

-- 
Jeff Hammond
The University of Chicago
http://home.uchicago.edu/~jhammond/