[Mpi-forum] Per communicator parameterisation

Wed Dec 21 11:45:00 CST 2011

Dear all,

Firstly, apologies if this is the wrong place to post - I couldn't see
a particularly relevant subgroup or forum for discussion, so if this
is the wrong place, redirection is more than welcome....

I have a program. Actually my program is three programs coupled
together. The entire application lives in MPI_COMM_WORLD, and each of
the components operates mainly within its own private communicator.
Another three communicators exist that handle the coupling between
each pair of subcomponents.

I run my program on a machine - it has some special features that
allow me to alter how MPI behaves - I can take advantage of different
protocols for point-to-point messaging, I can elect to use hardware
acceleration for collectives, and various other things that radically
change how MPI behaves and performs. I can switch these on and off,
say, via the environment and via my MPI launcher program.

For the purposes of tuning, I can isolate the three applications as
well as the inter-component communication patterns - I discover that
the parameterisation needed for each to be optimal is rather different.
Tuning for the whole ensemble leads to none of the individual parts
performing optimally. After discussion with my MPI vendor, it rapidly
became apparent that there was no way of articulating from outside the
application that I wanted different (as yet uncreated) communicators
on different task sets to behave differently in anything remotely
approaching a sensible fashion - the communicators are completely
opaque, and there is no way to reference them!

Here's a simple example to illustrate:

< ---------- COMM_WORLD --------------->
< main app >        < post processing >
      --- long messages one way-->

My machine has, say, two p2p messaging modes: low latency driven by
the CPU, and high latency offloaded to supporting hardware. The main
app is tightly synchronised and benefits from the low latency mode
even though it consumes CPU cycles. Communication between the main
app and the post processor is unidirectional, and slow delivery does
not hurt overall performance (but involving the CPU in dispatching
the messages does). Optimal performance is achieved by setting
comm_world to use the high latency offload mode, and comm_main_app to
use the low latency CPU mode.

An even simpler example would be a monolithic application, where
optimal settings are not apparent until part way through execution -
i.e. data dependent.

As it stands there is no practical way of setting this up - even with
the generous support of my vendor. This is in stark contrast to MPI-IO
where I have data structures available to pass either vendor specific
or generic hints allowing me to modify behaviour on a per-file basis,
in application, at runtime. I really want to be able to perform a
similar thing when I create communicators, such that when I perform,
say, a comm_split, I can create unique characteristics for each of the
new communicators.

So I guess the first question is: has any work taken place in this
area, and if not, how do I go about spurring the discussion?

Best regards, Martyn