[mpi-21] C++ predefined MPI handles, const, IN/INOUT/OUT, etc.

Tue Jan 22 19:07:05 CST 2008

The three proposals that I sent about C++ issues are intertwined and  
together represent a very complex set of issues.

Shorter version
===============

Does anyone know/remember why the "special case" for the definition of  
OUT parameters exists in MPI-1:2.2?

I ask because the C++ bindings were modeled on the IN/OUT/INOUT  
designations of the language-neutral bindings.  MPI_COMM_SET_NAME (and  
others) use the "special case" definition of the [IN]OUT designation  
for the MPI communicator handle parameter.  Two facts indicate that we  
should override this INOUT designation for the C++ binding (and  
therefore make the method const), revisit the "special case"  
language in MPI-1:2.2, or both:

1. The C binding does not allow the implementation to change the  
handle value
2. The following is valid MPI code:

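     char name[MPI::MAX_OBJECT_NAME];
     int len;
     MPI::Intracomm cxx_comm = MPI::COMM_WORLD;
     cxx_comm.Set_name("foo");             // set via the copied handle
     MPI::COMM_WORLD.Get_name(name, len);  // read via the original handle
     cout << name << endl;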

    The output will be "foo" even though we set the name on cxx_comm  
and retrieved it from MPI::COMM_WORLD ***because the state changed on  
the underlying MPI object, not the upper-level handles*** (the same is  
true for error handlers).

Hence, the Set_name() method should be const because the MPI handle  
will not (and cannot) change.  Similar arguments apply to keeping the  
MPI predefined C++ handles as "const" (MPI::INT, etc.) -- their values  
must never change during execution.  It then follows that unless there  
is a good reason for the "special case" language in MPI-1:2.2, it  
should be removed.

Longer version / more details
=============================

At the heart of the issue seems to be text from MPI-1:2.2 about the  
definition of IN, OUT, and INOUT parameters to MPI functions.  This  
text was used to guide many of the decisions about the C++ bindings,  
such as the const-ness (or not) of C++ methods and MPI predefined C++  
handles.  The text states:

-----
  * the call uses but does not update an argument marked IN
  * the call may update an argument marked OUT
  * the call both uses and updates an argument marked INOUT

There is one special case -- if an argument is a handle to an opaque  
object (these terms are defined in Section 2.4.1) and the object is  
updated by the procedure call, then the argument is marked OUT.  It is  
marked this way even though the handle itself is not modified -- we  
use the OUT attribute to denote that what the handle _references_ is  
updated.
-----

The special case for the OUT definition is important because the C++  
bindings were created to mimic the IN, OUT, and INOUT behavior in a  
language that is stricter than C and Fortran: C++ will fail to compile  
if an application violates the defined semantics (which is a good  
thing).
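
As a rough illustration of how the three designations can be enforced
by the compiler, here is a minimal sketch with no MPI in it.  The
toy_* names are hypothetical, and the mapping shown (IN as a const
reference, OUT and INOUT as non-const references) is an assumption
about one plausible translation, not a statement of what the MPI C++
bindings do for every function.

```cpp
#include <cassert>
#include <type_traits>

// IN: the call uses but does not update the argument.
int toy_in(const int& value) { return value + 1; }

// OUT: the call may update the argument.
void toy_out(int& result) { result = 42; }

// INOUT: the call both uses and updates the argument.
void toy_inout(int& counter) { counter += 1; }

// A const object cannot bind to the non-const reference that an OUT or
// INOUT parameter requires, so misuse is rejected at compile time.
static_assert(!std::is_convertible<const int&, int&>::value,
              "const argument cannot be passed as OUT/INOUT");

// Exercises all three designations and returns the final counter.
int toy_designations_demo() {
    const int fixed = 1;
    int out = 0;
    int counter = toy_in(fixed);  // IN accepts a const argument
    toy_out(out);                 // OUT: out is overwritten
    toy_inout(counter);           // INOUT: counter is read and updated
    assert(out == 42);
    return counter;
}
```

Passing "fixed" to toy_out() or toy_inout() would not compile, which
is exactly the kind of protection that C and Fortran cannot offer.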

*** The big question: does anyone know/remember why this special case
*** for the "OUT" definition exists?

The special case seems to imply that *explicit* changes to MPI objects  
should be marked as an [IN]OUT parameter (e.g., SET_NAME and  
SET_ERRHANDLER).  Apparently, *implicit* changes to the underlying MPI  
object (such as MPI_ISEND) do not count and should be IN: many MPI  
implementations *do* change state either on the communicator or on  
something related to the communicator when a send or receive is  
initiated, even though the communicator is an IN argument.

But remember that MPI clearly states that the handle is separate from  
the underlying MPI object.  So why does the binding care if the  
back-end object is updated?  (regardless of whether the change to the  
object is explicit or implicit)

For example, the language-neutral binding for MPI_COMM_SET_NAME has  
the communicator as an INOUT argument.  This clearly falls within the  
"special case" definition because the function semantics explicitly  
change state on the underlying MPI object.

But note that the C binding is "int MPI_Comm_set_name(MPI_Comm  
comm, ...)". Notice that the comm is passed by value, not by  
reference.  So even though the language-neutral binding marks that  
parameter INOUT, it's not possible for the MPI implementation to  
change the value of the caller's handle.
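
The pass-by-value point can be checked with a small sketch that does
not use MPI at all.  ToyHandle, toy_names, and toy_set_name are
hypothetical, and modeling the handle as an int index into a table is
only one possible representation (an assumption for illustration).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical handle type: an index into a table of back-end objects.
typedef int ToyHandle;

// Hypothetical table holding the name of each back-end object.
static std::vector<std::string> toy_names(4);

// Same shape as "int MPI_Comm_set_name(MPI_Comm comm, ...)": the handle
// arrives by value.
int toy_set_name(ToyHandle comm, const char* name) {
    assert(comm >= 0 && static_cast<std::size_t>(comm) < toy_names.size());
    toy_names[comm] = name;  // updates the object the handle refers to
    comm = -1;               // changes only this function's local copy
    return 0;
}

// Returns true if the caller's handle is unchanged after the call and
// the back-end object was nevertheless updated.
bool caller_handle_unchanged() {
    ToyHandle h = 2;
    toy_set_name(h, "foo");
    return h == 2 && toy_names[2] == "foo";
}
```

Whatever the callee does to its copy of the handle, the caller still
holds the original value; only the object behind it has been updated.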

My claim is that if we want to ensure that the C++ bindings match the  
C bindings (i.e., that the implementation cannot change the value of  
the MPI handle), then the method invoked as cxx_comm.Set_name(...)  
should be const *because the handle value will not, and cannot,  
change*.

Simply put: regardless of language or implementation, MPI handles must  
have true handle semantics.  For example:

     MPI::Intracomm cxx_comm = MPI::COMM_WORLD;
     cxx_comm.Set_name("C++ r00l3z!");

     MPI::COMM_WORLD.Get_name(name, len);
     cout << name << endl;

The above will output "C++ r00l3z!" because cxx_comm and  
MPI::COMM_WORLD are handles referring to the same underlying  
communicator.  Hence, the only state that the handles have is whatever  
refers to their back-end MPI object.   Having Set_name() be const  
keeps the *handle* const, not the underlying MPI object.
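
To make the handle/object distinction concrete, here is a minimal
sketch that uses no real MPI.  ToyCommObject, ToyComm, and
TOY_COMM_WORLD are hypothetical names invented for illustration, and
storing a raw pointer in the handle is an assumption; a real
implementation is free to represent its handles differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the back-end communicator object.
struct ToyCommObject {
    std::string name;
};

// Hypothetical stand-in for a C++ handle: its only state is a pointer
// to the back-end object.
class ToyComm {
public:
    explicit ToyComm(ToyCommObject* obj) : obj_(obj) {}

    // const: the handle (the pointer) is untouched; only the object
    // it refers to is updated.
    void Set_name(const std::string& n) const { obj_->name = n; }

    std::string Get_name() const { return obj_->name; }

private:
    ToyCommObject* obj_;
};

// One back-end object and a const "predefined" handle referring to it.
static ToyCommObject toy_world_object{"world"};
static const ToyComm TOY_COMM_WORLD(&toy_world_object);

// Copy the predefined handle, rename through the copy, and read the
// name back through the original handle.
std::string rename_through_copy(const std::string& new_name) {
    ToyComm copy = TOY_COMM_WORLD;  // copies the handle, not the object
    copy.Set_name(new_name);
    assert(copy.Get_name() == TOY_COMM_WORLD.Get_name());
    return TOY_COMM_WORLD.Get_name();
}
```

Note that Set_name() compiles as a const method and can even be called
directly on the const TOY_COMM_WORLD, because constness applies to the
handle and not to the object it refers to.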

Tying this all together:

1. cxx_comm.Set_name() *cannot* change state on the cxx_comm handle  
because cxx_comm.Get_name() and MPI::COMM_WORLD.Get_name() must return  
the same results (the same is true for error handlers).  Hence,  
regardless of the implementation of the C++ bindings, the handle value  
cannot change.  Therefore, this method (and all the others like it)  
should be const.

2. As a related issue, if no one can remember why the "special case"  
exists for OUT, then I think we should remove this text and then  
change all those INOUT parameters for the functions I cited in my  
earlier proposal to IN.  This would make the C++ bindings consistent  
with the IN/OUT/INOUT specifications of the language-neutral bindings.

3. All the MPI C++ predefined handles should be const for many of the  
same reasons.  Regardless of what happens to the underlying MPI  
object, the value of the handle cannot ever change.  This is  
guaranteed by MPI-2:2.5.4, page 10, lines 38-41:

"All named constants, with the exceptions noted below for Fortran, can  
be used in initialization expressions or assignments.  These constants  
do not change values during execution.  Opaque objects accessed by  
constant handles are defined and do not change value between MPI  
initialization MPI_INIT and MPI completion MPI_FINALIZE."

Hence, they should all be "const".

-----

In short: C++ gives us stronger protections to ensure that  
applications don't shoot themselves in the foot.  If the MPI  
predefined handles are const, then statements like "MPI::INT =  
my_dtype;" will fail to compile.  This is a Good Thing.
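
The compile-time failure can be demonstrated without MPI.  In the
sketch below, ToyDatatype and TOY_INT are hypothetical stand-ins (an
assumption for illustration, not the real MPI::Datatype or MPI::INT),
and the check is expressed with std::is_assignable so that the file
itself still compiles.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a datatype handle class.
class ToyDatatype {
public:
    explicit ToyDatatype(int id) : id_(id) {}
    int id() const { return id_; }

private:
    int id_;
};

// Hypothetical predefined handle, declared const.
static const ToyDatatype TOY_INT(1);

// A non-const user handle of the same type can be assigned to ...
static_assert(std::is_assignable<ToyDatatype&, ToyDatatype>::value,
              "user handles remain assignable");

// ... but the const handle cannot, so "TOY_INT = my_dtype;" would be
// rejected at compile time.
static_assert(!std::is_assignable<decltype((TOY_INT)), ToyDatatype>::value,
              "const predefined handles are not assignable");

// Copying *from* the const handle is still allowed.
int copy_of_toy_int_id() {
    ToyDatatype my_dtype = TOY_INT;
    assert(my_dtype.id() == TOY_INT.id());
    return my_dtype.id();
}
```

Applications can still copy a predefined handle into their own
variables; they just cannot overwrite the predefined one.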

The original C++ bindings tried to take advantage of const, but missed  
a few points.  Ballot 2 and one of the items in ballot 3 incorrectly  
tried to fix these points by removing const in several places.  That  
"fixes" the problem, but removes many of the good qualities that we  
can get in C++ with "const".  So let's fix the real problem and leave  
"const" in the C++ bindings.

Are you confused yet?  :-)

-- 
Jeff Squyres
Cisco Systems