Jeff Squyres (jsquyres)
jsquyres at cisco.com
Wed Dec 4 18:01:33 CST 2013
On Dec 4, 2013, at 4:23 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
> It would make a _huge_ difference for distributions if MPICH and Open
> MPI could have a compatible ABI.
We, too, would be happy if MPICH adopted all of Open MPI's internal implementation conventions.
...but I don't see that happening any time soon.
I don't say this glibly; there are years of momentum and complicated engineering issues involved. More below.
> However, the question of integer (MPICH) versus pointer (OMPI) handles
> is a long-standing impasse.
To be blunt: this totally misses the point.
Integer vs. pointer is but one of MANY issues that prevent MPI implementations from being ABI compatible. Let's not also forget:
- Values of constants. Simple integer values are usually easy to devise/resolve (but don't forget that some integer values are specifically chosen on specific platforms). Sentinel values like MPI_STATUS_IGNORE are not.
- Size/content of MPI_Status. MPI implementations hide (different) non-public fields in MPI_Status.
- Launcher differences. The mpirun/mpiexec (or whatever launcher) is inherently different, with different CLI options and configuration, between different MPI implementations.
- Run-time system differences. Some MPIs support different run-time systems, some don't. This is typically exposed through the launcher, but also has impact on the MPI processes themselves. More specifically: the launcher and the MPI library (and support libraries) usually go in pairs and cannot be separated.
==> Note that this is a giant problem. Many have tried to create a unified parallel run-time over the past ~15 years. I think it's fair to say that none have succeeded (i.e., none have taken over the world / become the one run-time system that everyone uses).
- Library names. libmpi.so? libompi.so? libmpich.so? libmpi.a? And so on.
- Dependent libraries. What else do you need to link when linking the application? You can (usually) hide this when the MPI is a shared library, but a) not always, and b) that doesn't help when MPI is a static library.
- Compiler ABIs. MPI middleware cannot solve the fact that C++ and Fortran compilers cannot (and will not) agree on an ABI (for many good reasons, BTW).
- Compiler options. Was the MPI library compiled with -O3? -i8? -32 or -64? ...?
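To make the handle, sentinel, and MPI_Status items above concrete, here is a minimal sketch in plain C. It does not include mpi.h. Every type and macro name in it (a_comm_t, b_status_t, A_STATUS_IGNORE, and so on) is hypothetical, and the private fields are assumptions chosen for illustration, not the actual definitions from any MPI implementation.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical implementation A: handles are plain integers. */
typedef int a_comm_t;
#define A_COMM_WORLD ((a_comm_t)0x44000000)

/* Hypothetical implementation B: handles are pointers to opaque structs. */
struct b_comm_s { int id; };
typedef struct b_comm_s *b_comm_t;
static struct b_comm_s b_comm_world_obj = { 0 };
#define B_COMM_WORLD (&b_comm_world_obj)

/* Both status types expose the three public fields the MPI standard
 * requires (MPI_SOURCE, MPI_TAG, MPI_ERROR), but each hides different
 * private fields in different positions (assumed layouts). */
typedef struct {
    int count_lo;                 /* private, assumed */
    int count_hi_and_cancelled;   /* private, assumed */
    int MPI_SOURCE;
    int MPI_TAG;
    int MPI_ERROR;
} a_status_t;

typedef struct {
    int MPI_SOURCE;
    int MPI_TAG;
    int MPI_ERROR;
    int cancelled;                /* private, assumed */
    size_t ucount;                /* private, assumed */
} b_status_t;

/* Sentinels: each side picks its own "ignore the status" address. */
#define A_STATUS_IGNORE ((a_status_t *)1)
#define B_STATUS_IGNORE ((b_status_t *)0)

/* Returns 1 only if a binary could treat the two status types alike. */
int status_layouts_match(void)
{
    return sizeof(a_status_t) == sizeof(b_status_t)
        && offsetof(a_status_t, MPI_SOURCE) == offsetof(b_status_t, MPI_SOURCE);
}

/* Prints the sizes and offsets that an application binary bakes in. */
void print_abi_summary(void)
{
    printf("handle size:       A=%zu B=%zu\n",
           sizeof(a_comm_t), sizeof(b_comm_t));
    printf("status size:       A=%zu B=%zu\n",
           sizeof(a_status_t), sizeof(b_status_t));
    printf("MPI_SOURCE offset: A=%zu B=%zu\n",
           offsetof(a_status_t, MPI_SOURCE), offsetof(b_status_t, MPI_SOURCE));
    printf("world handle:      A=0x%x B=%p\n",
           (unsigned)A_COMM_WORLD, (void *)B_COMM_WORLD);
}
```

Calling print_abi_summary() from a small main() shows the mismatch on whatever platform you build on. Note that even if the handle types were unified, status_layouts_match() would still return 0 for these two layouts, which is the point: handles are only one item on the list.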
The fact of the matter is that the MPI API was *specifically designed with only source compatibility in mind*. We now have nearly 20 years of momentum in different MPI implementations with different engineering design choices.
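A small sketch of what "source compatibility only" means in practice: the same application source compiles against either implementation's header, but the compiled object code contains whatever literal values that header defined. The names below (A_ANY_SOURCE, lib_b_is_wildcard, etc.) are hypothetical and the values are assumptions for illustration only.

```c
#include <stdio.h>

/* Two hypothetical headers that define the same wildcard differently. */
#define A_ANY_SOURCE (-2)   /* assumed value in implementation A's header */
#define B_ANY_SOURCE (-1)   /* assumed value in implementation B's header */

/* Application code built against A's header. The macro is expanded at
 * compile time, so the object code carries the literal -2. */
int app_built_against_a(void)
{
    return A_ANY_SOURCE;
}

/* Library side: each implementation checks for its own value. */
int lib_a_is_wildcard(int source) { return source == A_ANY_SOURCE; }
int lib_b_is_wildcard(int source) { return source == B_ANY_SOURCE; }

/* Simulates running the A-built application on top of each library. */
void demo_library_swap(void)
{
    int src = app_built_against_a();
    printf("library A sees wildcard: %d\n", lib_a_is_wildcard(src));
    printf("library B sees wildcard: %d\n", lib_b_is_wildcard(src));
}
```

Recompiling the application against B's header fixes this trivially; swapping the library underneath an already-built binary does not.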
The term "ABI" is a catchall for many, many different issues. People seem to think that simply switching libraries at run time is a silver bullet -- "if only I could just change my LD_LIBRARY_PATH and use a different MPI implementation, then the world would be better". But let's also not forget that most users barely know how to use LD_LIBRARY_PATH (if at all).
Sure, ABI between different releases of the same (MPI implementation+compiler+environment) is a Good Thing -- users want that. And I think all major MPI implementations deliver that today. It sure makes upgrading the MPI implementation a lot easier on the user.
But ABI between different implementations is quite a different animal, with an entirely different set of challenges and payoffs.
(I'm sure this post will inspire some impassioned replies :-) )