[Mpi-forum] Persistence Working Group Proposed

Wed Feb 25 10:54:57 CST 2009

Everyone,

I would like to reintroduce proposals for
a) persistent collectives (including persistent non-blocking)
b) persistent generalized requests
c) ability to modify persistent operations
d) techniques to allow "initialization" vs. "instance" performance tradeoffs
to be expressed portably

All of these things help with performance and scalability, and add little
implementation complexity where an implementation chooses not to optimize them.

Persistent operations are very useful with data parallel and pipelined
parallel programs with temporal locality and uniform or slowly changing
patterns of communication.  Static and semi-static patterns are common
and can be further optimized.

Persistence takes "middle level code" out of the critical path and, when used
in combination with non-blocking point-to-point and collective technology,
encourages concurrent operations.

Persistence allows planned-transfer and deep optimizations to be done "one time"
so that the instances are faster.  This has proven very successful with technologies
such as FFTW, and is common to low-level message passing systems used in some
embedded HPC machines.

Persistence allows for reducing or eliminating some of the costs of memory management
as well as for choosing better collective algorithms.  For example, optimized
strategies for pinning, unpinning, and streamlining derived datatypes become a snap
with persistent mode.
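
As a rough illustration of the "initialization" vs. "instance" split, here is a
minimal sketch in plain C.  It is not MPI code and not a proposed API: the
pack_plan type and the pack_plan_* functions are hypothetical names invented
for this example.  The layout (count blocks of blocklen elements, stride
elements apart) is similar in spirit to what MPI_Type_vector describes.  The
layout is analyzed once at init time, so each later instance is a single
gather loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical "plan" for packing a strided layout of doubles.
   Illustration only; this is not an MPI type. */
typedef struct {
    size_t  n;        /* total number of elements to pack        */
    size_t *offsets;  /* precomputed source offset of each one   */
} pack_plan;

/* Initialization: pay the layout-analysis cost once.
   Returns 0 on success, -1 if memory allocation fails. */
int pack_plan_init(pack_plan *p, size_t count, size_t blocklen, size_t stride)
{
    assert(p != NULL && blocklen <= stride);
    p->n = count * blocklen;
    p->offsets = malloc(p->n * sizeof *p->offsets);
    if (p->n != 0 && p->offsets == NULL)
        return -1;
    size_t k = 0;
    for (size_t b = 0; b < count; ++b)
        for (size_t j = 0; j < blocklen; ++j)
            p->offsets[k++] = b * stride + j;
    return 0;
}

/* Instance: the critical path is just a gather through the offsets. */
void pack_plan_execute(const pack_plan *p, const double *src, double *dst)
{
    for (size_t i = 0; i < p->n; ++i)
        dst[i] = src[p->offsets[i]];
}

/* Release the plan once the communication pattern is no longer needed. */
void pack_plan_free(pack_plan *p)
{
    free(p->offsets);
    p->offsets = NULL;
    p->n = 0;
}
```

A caller would run pack_plan_init() once, call pack_plan_execute() every
iteration as the source data changes, and call pack_plan_free() at the end.  A
real implementation could do far more at init time (pinning, choosing an
algorithm); the sketch only shows where that work would sit.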

Adding persistent collectives (blocking and non-blocking) and persistent generalized
requests orthogonalizes the design of MPI in a convenient way as well, but it is
for performance and scalability that it is most important.

While we already have send_init() and recv_init() functionality, it is limited to
point-to-point.  Commercial implementations have in the past found scope for getting
more performance out of persistent operations than out of ordinary point-to-point;
early implementations chose not to do so.

As architectures evolve, and memory management becomes more complex and hierarchical,
including DMA architectures, and one wants to run MPI at higher and higher scales,
planned transfer modes for regular/data-parallel programs will expand the performance
potential of MPI programs, while requiring relatively simple transformations of
existing regular parallel programs.

Finally, I will mention that persistent generalized requests are useful because they
allow complex patterns to be reused once specified, rather than having to be
reconstructed at each use.  I will argue that persistent generalized requests are more
useful than non-persistent ones in many, many ways. :-)

I'd like to propose a working group for that.

It fits squarely in MPI 3.0.

Rolf observed that, in the past, few people used persistent operations in their studies.
However, that may be because they offered little performance benefit in then-existing
implementations.  Moving forward, and when applied to collective operations, huge
performance gains are possible, as they also are when derived datatypes are part of
point-to-point.

I hope other institutions will agree so we can pursue this.

Regards,

Tony

-- 
Anthony Skjellum, PhD
Professor and Chair
Dept. of Computer and Information Sciences
University of Alabama at Birmingham
+1-(205)934-8657; FAX: +1- (205)934-5473