[Mpi-forum] ABI

Jeff Squyres (jsquyres) jsquyres at cisco.com
Wed Dec 4 18:01:33 CST 2013

On Dec 4, 2013, at 4:23 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:

> It would make a _huge_ difference for distributions if MPICH and Open
> MPI could have a compatible ABI.  

We, too, would be happy if MPICH adopted all of Open MPI's internal implementation conventions.  

...but I don't see that happening any time soon.

I don't say this glibly; there are years of momentum and complicated engineering issues involved.  More below.

> However, the question of integer (MPICH) versus pointer (OMPI) handles
> is a long-standing impasse. 

To be blunt: this totally misses the point.

Integer vs. pointer is but one of MANY issues that prevent MPI implementations from being ABI compatible.  Let's also not forget:

- Values of constants.  Simple integer values are usually easy to devise/resolve (but don't forget that some integer values are specifically chosen on specific platforms). Sentinel values like MPI_STATUS_IGNORE are not.
- Size/content of MPI_Status.  MPI implementations hide (different) non-public fields in MPI_Status.
- Launcher differences.  The mpirun/mpiexec (or whatever launcher) is inherently different, with different CLI options and configuration, between different MPI implementations.
- Run-time system differences.  Some MPIs support different run-time systems, some don't.  This is typically exposed through the launcher, but also has an impact on the MPI processes themselves.  More specifically: the launcher and the MPI library (and support libraries) usually go in pairs and cannot be separated.
==> Note that this is a giant problem.  Many have tried to create a unified parallel run-time over the past ~15 years.  I think it's fair to say that none have succeeded (i.e., none have taken over the world / become the one run-time system that everyone uses).
- Library names.  libmpi.so?  libompi.so?  libmpich.so?  libmpi.a?  And so on.
- Dependent libraries.  What else do you need to link when linking the application?  You can (usually) hide this when the MPI is a shared library, but a) not always, and b) that doesn't help when MPI is a static library.
- Compiler ABIs.  MPI middleware cannot solve the fact that C++ and Fortran compilers cannot (and will not) agree on an ABI (for many good reasons, BTW).
- Compiler options.  Was the MPI library compiled with -O3?  -i8?  -32 or -64?  ...?
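
To make the handle and MPI_Status items above concrete, here is a minimal sketch (not taken from any real MPI header). It uses Python's ctypes purely as a convenient way to lay out C structs. The two "implementations" A and B, the private field names, and the field order are all invented for illustration; only the three public fields MPI_SOURCE, MPI_TAG and MPI_ERROR come from the MPI standard.

```python
import ctypes

# Hypothetical implementation "A": handles are plain C ints.
CommA = ctypes.c_int
# Hypothetical implementation "B": handles are pointers to opaque structs.
CommB = ctypes.c_void_p


class StatusA(ctypes.Structure):
    """Made-up layout: private fields first, then the three public fields."""
    _fields_ = [
        ("_private_lo", ctypes.c_int),
        ("_private_hi", ctypes.c_int),
        ("MPI_SOURCE", ctypes.c_int),
        ("MPI_TAG", ctypes.c_int),
        ("MPI_ERROR", ctypes.c_int),
    ]


class StatusB(ctypes.Structure):
    """Made-up layout: public fields first, then private fields."""
    _fields_ = [
        ("MPI_SOURCE", ctypes.c_int),
        ("MPI_TAG", ctypes.c_int),
        ("MPI_ERROR", ctypes.c_int),
        ("_private_flag", ctypes.c_int),
        ("_private_count", ctypes.c_size_t),
    ]


def read_a_status_as_b(source, tag):
    """Fill a StatusA, then reinterpret its raw bytes as a StatusB.

    This mimics an application compiled against B's header being handed
    a status that was written by A's library.
    """
    a = StatusA(MPI_SOURCE=source, MPI_TAG=tag)
    # Zero-pad so the buffer is large enough for StatusB on every platform.
    raw = bytes(a).ljust(ctypes.sizeof(StatusB), b"\0")
    return StatusB.from_buffer_copy(raw)


if __name__ == "__main__":
    print("handle sizes:", ctypes.sizeof(CommA), ctypes.sizeof(CommB))
    print("status sizes:", ctypes.sizeof(StatusA), ctypes.sizeof(StatusB))
    print("MPI_SOURCE offsets:",
          StatusA.MPI_SOURCE.offset, StatusB.MPI_SOURCE.offset)
    wrong = read_a_status_as_b(source=3, tag=7)
    print("B's view of A's status:",
          wrong.MPI_SOURCE, wrong.MPI_TAG, wrong.MPI_ERROR)
```

In this sketch the source rank that A wrote ends up in the slot that B's header calls MPI_ERROR, and B's MPI_SOURCE reads one of A's private fields. The application compiles and links fine under either header; it just reads the wrong bytes at run time, which is why swapping libraries underneath an already-compiled binary is not enough.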

The fact of the matter is that the MPI API was *specifically designed with only source compatibility in mind*.  We now have nearly 20 years of momentum in different MPI implementations with different engineering design choices.  

The term "ABI" is a catchall for many, many different issues.  People seem to think that simply switching libraries at run time is a silver bullet -- "if only I could just change my LD_LIBRARY_PATH and use a different MPI implementation, then the world would be better".  But let's also not forget that most users barely know how to use LD_LIBRARY_PATH (if at all).

Sure, ABI between different releases of the same (MPI implementation+compiler+environment) is a Good Thing -- users want that.  And I think all major MPI implementations deliver that today.  It sure makes upgrading the MPI implementation a lot easier on the user.

But ABI between different implementations is quite a different animal, with an entirely different set of challenges and payoffs.

(I'm sure this post will inspire some impassioned replies :-) )

Jeff Squyres
jsquyres at cisco.com
