[mpiwg-languages] Follow-up: discussion on ref-counted MPI objects

Fri Nov 12 11:06:40 CST 2021

On Fri, 12 Nov 2021 at 19:26, Joseph Schuchart <schuchart at icl.utk.edu>
wrote:

> Lisandro, All,
>
> (Lisandro: Apologies if you receive multiple copies; I wasn't sure
> whether you're on the list)
>
> I have some thoughts as a follow-up to the discussion yesterday on
> exposing reference-counts of MPI objects to the user. First, let me try
> to summarize the use-case: for a developer in a language that supports
> ref-counting, it is hard to write ref-counted wrapper objects around MPI
> handles *and* be able to pass the handle to third-party libraries
> written in languages that do not support whatever ref-counting mechanism
> is used (e.g., a wrapper around MPI_Comm in python, passing the handle
> to a C library's set_comm() function).

Yes,

> The idea is that exposing a
> reference counting mechanism in MPI to the user would help here because
> that is the lowest common denominator shared by all libraries. Is that
> correct? (I missed the first couple of minutes, just want to make sure I
> get it right)
>
>
but I would add that facilities for better handling the lifetime of
handles (at least for some types) would be useful not only for ref-counted
languages like Python, but also for C/C++/Fortran libraries.

> The problem now is that passing the handle out of the ref-counted
> wrapper to a library potentially creates dangling references because
> that library might store the handle and eventually free it without the
> wrapper knowing. Bummer.
>

Or not free it, and then the library caller is responsible for keeping
track of the handle and destroying it at a later time.
MPI provides no clear recommendation, much less proper APIs, for sane
handling of MPI object lifetimes.

>
> Here is where I am not sure how ref-counting at the MPI library level
> would help: how would you know whether the library's set_comm() call
> actually increments the ref-count?

Right now, you don't. You don't even know whether the library will ever
free the handle. The "contract" has to be written in the library
documentation, and there are no general guidelines in the MPI standard. So
library developers do whatever they like or whatever they think is the right
thing (if they ever think about the issue)... and we scientists are terrific
software engineers ;-)

> Just as you cannot be sure that it
> won't destroy the MPI object and leave your handle dangling, you have no
> guarantee that the ref-counting would be correct because there is no way
> to enforce that across all supported languages.
>

Indeed. Just adding the APIs is not enough. The standard should add new
recommendations.

>
> As Dan had pointed out, duplicating MPI objects is the right way to go
> here.

Ask the PETSc folks about the messy code they have in place to prevent
excessive communicator duplication. They basically had to add their own
refcounting for MPI_Comm using attributes, inner comm dupes, etc. So, no,
duplication is not always the answer.
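A toy sketch of that kind of workaround, in C with no MPI calls at all: a hypothetical get_inner_comm()/release_inner_comm() pair caches a single inner duplicate on the user's communicator and reference-counts it, so repeated library calls trigger only one duplication. The toy_comm struct and its cached_inner field are stand-ins for an MPI_Comm and a cached attribute; this is an illustration under those assumptions, not PETSc's actual code.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model, not real MPI: a "communicator" is a plain struct, and the
   cached_inner field stands in for an attribute cached on the user's
   communicator (e.g. via MPI_Comm_set_attr in real code). */
typedef struct toy_comm {
    struct toy_comm *cached_inner; /* "attribute": inner duplicate   */
    int inner_refs;                /* library objects sharing it     */
} toy_comm;

static int dup_calls = 0; /* counts (expensive) duplications */

/* Stands in for MPI_Comm_dup(): allocates a fresh toy communicator. */
static toy_comm *toy_comm_dup(void)
{
    dup_calls++;
    return calloc(1, sizeof(toy_comm));
}

/* Hand out the inner duplicate of `user`, duplicating only on first use. */
toy_comm *get_inner_comm(toy_comm *user)
{
    if (user->cached_inner == NULL)
        user->cached_inner = toy_comm_dup();
    user->inner_refs++;
    return user->cached_inner;
}

/* Drop one reference; destroy the inner duplicate with the last one. */
void release_inner_comm(toy_comm *user)
{
    assert(user->inner_refs > 0);
    if (--user->inner_refs == 0) {
        free(user->cached_inner);
        user->cached_inner = NULL;
    }
}
```

With a real MPI library the cached pointer would typically live in an attribute created with MPI_Comm_create_keyval() and accessed with MPI_Comm_set_attr()/MPI_Comm_get_attr(), with a delete callback handling cleanup; every library that wants to avoid repeated duplication ends up writing bookkeeping like this on its own.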

> It is good practice for libraries to treat handles to MPI objects
> as borrowed references and only store handles to duplicates.

Alternatively, with my proposal/request, libraries could take ownership of
a reference.

> Unless
> explicitly documented that ownership is transferred, the library should
> not destroy the MPI object it received.

And then the burden of managing handle lifetimes falls on the library
caller.

> Yes, there is no way to enforce
> this in C, so it has to be part of the verbal API contract. But the same
> would be true for correct reference counting. I don't see a way around
> relying on soft contracts here...
>

All I'm asking for is the addition of a few APIs [for example,
MPI_Type_clone(datatype, &newref) or perhaps MPI_Type_incref(datatype),
and so on for other types] that would help define SOFT contracts better.

This is all about soft contracts, there is no way to enforce anything in C.
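To make such a soft contract concrete, here is a toy model in C (no MPI involved). toy_incref() plays the role of the proposed, hypothetical MPI_Type_incref(), and toy_free() mirrors MPI_Type_free() except that the object is only destroyed when its last reference goes away. The library functions lib_set_type()/lib_finalize() are hypothetical too, and follow an "I take my own reference" contract.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model of a ref-counted MPI-like object (not real MPI). */
typedef struct {
    int refcount;
} toy_obj;

static int live_objects = 0; /* objects not yet destroyed */

toy_obj *toy_create(void)
{
    toy_obj *o = malloc(sizeof *o);
    assert(o != NULL);
    o->refcount = 1; /* the creator owns one reference */
    live_objects++;
    return o;
}

/* Stand-in for the hypothetical MPI_Type_incref(). */
void toy_incref(toy_obj *o)
{
    o->refcount++;
}

/* Like MPI_Type_free(): drop the caller's reference and null its handle;
   the object itself is destroyed only when no references remain. */
void toy_free(toy_obj **h)
{
    assert(*h != NULL && (*h)->refcount > 0);
    if (--(*h)->refcount == 0) {
        free(*h);
        live_objects--;
    }
    *h = NULL;
}

/* Hypothetical third-party library that keeps its own reference. */
static toy_obj *lib_type = NULL;

void lib_set_type(toy_obj *t)
{
    toy_incref(t);           /* take our own reference first...  */
    if (lib_type != NULL)
        toy_free(&lib_type); /* ...then release any previous one */
    lib_type = t;
}

void lib_finalize(void)
{
    if (lib_type != NULL)
        toy_free(&lib_type);
}
```

Under a contract like this, a ref-counted wrapper (a Python object, say) can release its own reference whenever it is garbage-collected, without caring whether a C library still holds the handle.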

> And as was discussed during the call: MPI libraries may ref-count
> internal parts of communicators and datatypes so their duplication
> should be fairly lightweight.

Except that duplication is not the same as aliasing, and users/libraries
may have valid reasons to want to use aliasing.
In the case of mpi4py, the goal is to be as thin a wrapper as
possible.
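A toy model (not MPI code) of why a duplicate is not an alias: comparing an alias with the original gives the analogue of MPI_IDENT, while a duplicate only compares as the analogue of MPI_CONGRUENT, because it has a new context and is a separate object that must be freed on its own. The toy_comm type and helper names are illustrative assumptions; the real MPI_Comm_compare() also distinguishes MPI_SIMILAR, which this sketch omits.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified analogues of MPI_IDENT / MPI_CONGRUENT / MPI_UNEQUAL. */
enum toy_cmp { TOY_IDENT, TOY_CONGRUENT, TOY_UNEQUAL };

typedef struct {
    int group_id;   /* stands in for the process group     */
    int context_id; /* every new communicator gets its own */
} toy_comm;

static int next_context = 0;

toy_comm *toy_comm_new(int group_id)
{
    toy_comm *c = malloc(sizeof *c);
    assert(c != NULL);
    c->group_id = group_id;
    c->context_id = next_context++;
    return c;
}

/* Duplication: same group, fresh context, a new object to free later. */
toy_comm *toy_comm_dup(const toy_comm *c)
{
    return toy_comm_new(c->group_id);
}

enum toy_cmp toy_comm_compare(const toy_comm *a, const toy_comm *b)
{
    if (a == b)
        return TOY_IDENT;     /* alias: literally the same object       */
    if (a->group_id == b->group_id)
        return TOY_CONGRUENT; /* duplicate: same group, other context   */
    return TOY_UNEQUAL;
}
```

In real MPI there is also a cost difference: MPI_Comm_dup() is collective, so a wrapper that silently duplicated every communicator handed to it would turn a purely local operation into a collective one.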

> I realize that there are no duplication
> functions for files and windows (I had proposed window duplication at
> EuroMPI this year; I'm not sure what the semantics for files would be
> though). Would having window and file duplication help here?
>

How would window duplication work? Would the duplicate Win handle refer to
the same memory buffer, the only difference being that you can attach
different attributes to each duplicate? I'm not sure how that behaviour
would be useful, although I'm not against it.

>
> Or did I get any of the discussion wrong?
>

I think you got it just right. However, I still disagree that duplication
is the ultimate solution.

I believe this whole subject is already controversial enough. Perhaps we
can start with other related things that would be less controversial.
For example, relaxing the MPI_XXX_free rules so that the following
snippets of code no longer raise errors, as they currently do:

MPI_Datatype datatype = MPI_INT;
MPI_Type_free(&datatype); // freeing a predefined datatype
datatype = MPI_DATATYPE_NULL;
MPI_Type_free(&datatype); // freeing the null handle

and similarly for MPI_Op, MPI_Group, and MPI_Info.
I'm not sure about Comm, Win, and File, but at least freeing the null
handles should not be an error.
This is just a relaxation of current rules, so it is backward compatible.
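A toy model in C (no real MPI calls) of the current versus the relaxed free rules: strict_type_free() rejects the null handle and predefined types, as MPI_Type_free() does today, while relaxed_type_free() treats both as no-ops. All names are hypothetical, and setting a "freed" predefined handle to the null handle is an assumption of this sketch rather than part of the proposal.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_SUCCESS  0
#define TOY_ERR_TYPE 1

/* Toy datatype object; predefined objects are statically allocated. */
typedef struct {
    int predefined;
} toy_type;
typedef toy_type *toy_datatype;

static toy_type toy_int_obj = {1};
#define TOY_INT           (&toy_int_obj)       /* like MPI_INT           */
#define TOY_DATATYPE_NULL ((toy_datatype)NULL) /* like MPI_DATATYPE_NULL */

toy_datatype toy_type_create(void)
{
    toy_datatype t = malloc(sizeof *t);
    assert(t != NULL);
    t->predefined = 0;
    return t;
}

/* Current rule: freeing the null handle or a predefined type is an error. */
int strict_type_free(toy_datatype *t)
{
    if (*t == TOY_DATATYPE_NULL || (*t)->predefined)
        return TOY_ERR_TYPE;
    free(*t);
    *t = TOY_DATATYPE_NULL;
    return TOY_SUCCESS;
}

/* Relaxed rule: both cases become harmless no-ops. */
int relaxed_type_free(toy_datatype *t)
{
    if (*t == TOY_DATATYPE_NULL)
        return TOY_SUCCESS;
    if ((*t)->predefined) {
        *t = TOY_DATATYPE_NULL; /* assumption: the handle is still nulled */
        return TOY_SUCCESS;
    }
    return strict_type_free(t);
}
```

With a real MPI library a user-level helper like this can already be written: check for MPI_DATATYPE_NULL first, then call MPI_Type_get_envelope() and skip the free when the combiner is MPI_COMBINER_NAMED, which identifies predefined datatypes. The relaxation would make that boilerplate unnecessary.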

-- 
Lisandro Dalcin
============
Senior Research Scientist
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/