From schuchart at icl.utk.edu Fri Nov 12 10:26:26 2021
From: schuchart at icl.utk.edu (Joseph Schuchart)
Date: Fri, 12 Nov 2021 11:26:26 -0500
Subject: [mpiwg-languages] Follow-up: discussion on ref-counted MPI objects
Message-ID: <88eb3434-2211-5ce9-d552-95eabd1b0287@icl.utk.edu>

Lisandro, All,

(Lisandro: Apologies if you receive multiple copies, I wasn't sure
whether you're on the list)

I have some thoughts as a follow-up to the discussion yesterday on
exposing reference-counts of MPI objects to the user. First, let me try
to summarize the use-case: for a developer in a language that supports
ref-counting, it is hard to write ref-counted wrapper objects around MPI
handles *and* be able to pass the handle to third-party libraries
written in languages that do not support whatever ref-counting mechanism
is used (e.g., a wrapper around MPI_Comm in python, passing the handle
to a C library's set_comm() function). The idea is that exposing a
reference counting mechanism in MPI to the user would help here because
that is the lowest common denominator shared by all libraries. Is that
correct? (I missed the first couple of minutes, just want to make sure I
get it right)

The problem now is that passing the handle out of the ref-counted
wrapper to a library potentially creates dangling references because
that library might store the handle and eventually free it without the
wrapper knowing. Bummer.

Here is where I am not sure how ref-counting at the MPI library level
would help: how would you know whether the library's set_comm() call
actually increments the ref-count? Just as you cannot be sure that it
won't destroy the MPI object and leave your handle dangling, you have no
guarantee that the ref-counting would be correct because there is no way
to enforce that across all supported languages.

As Dan had pointed out, duplicating MPI objects is the right way to go
here.
It is good practice for libraries to treat handles to MPI objects
as borrowed references and only store handles to duplicates. Unless
explicitly documented that ownership is transferred, the library should
not destroy the MPI object it received. Yes, there is no way to enforce
this in C, so it has to be part of the verbal API contract. But the same
would be true for correct reference counting. I don't see a way around
relying on soft contracts here...

And as was discussed during the call: MPI libraries may ref-count
internal parts of communicators and datatypes so their duplication
should be fairly lightweight. I realize that there are no duplication
functions for files and windows (I had proposed window duplication at
EuroMPI this year; I'm not sure what the semantics for files would be
though). Would having window and file duplication help here?

Or did I get any of the discussion wrong?

Thanks
Joseph

From dalcinl at gmail.com Fri Nov 12 11:06:40 2021
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Fri, 12 Nov 2021 20:06:40 +0300
Subject: [mpiwg-languages] Follow-up: discussion on ref-counted MPI objects
In-Reply-To: <88eb3434-2211-5ce9-d552-95eabd1b0287@icl.utk.edu>
References: <88eb3434-2211-5ce9-d552-95eabd1b0287@icl.utk.edu>
Message-ID:

On Fri, 12 Nov 2021 at 19:26, Joseph Schuchart wrote:

> Lisandro, All,
>
> (Lisandro: Apologies if you receive multiple copies, I wasn't sure
> whether you're on the list)
>
> I have some thoughts as a follow-up to the discussion yesterday on
> exposing reference-counts of MPI objects to the user. First, let me try
> to summarize the use-case: for a developer in a language that supports
> ref-counting, it is hard to write ref-counted wrapper objects around MPI
> handles *and* be able to pass the handle to third-party libraries
> written in languages that do not support whatever ref-counting mechanism
> is used (e.g., a wrapper around MPI_Comm in python, passing the handle
> to a C library's set_comm() function).

Yes,

> The idea is that exposing a
> reference counting mechanism in MPI to the user would help here because
> that is the lowest common denominator shared by all libraries. Is that
> correct? (I missed the first couple of minutes, just want to make sure I
> get it right)

but I would add that adding facilities for better handling the lifetime
of handles (at least for some types) is not only useful for ref-counted
languages like Python, but even for C/C++/Fortran libraries.

> The problem now is that passing the handle out of the ref-counted
> wrapper to a library potentially creates dangling references because
> that library might store the handle and eventually free it without the
> wrapper knowing. Bummer.

Or not free it, and then the library caller is responsible for keeping
track of the handle and destroying it at a later time. MPI provides no
clear recommendation, much less proper APIs, for sane handling of MPI
object lifetimes.

> Here is where I am not sure how ref-counting at the MPI library level
> would help: how would you know whether the library's set_comm() call
> actually increments the ref-count?

Right now, you don't. You don't know if the library will ever free the
handle, ever. The "contract" has to be written in the library
documentation, and there are no general guidelines in the MPI standard.
So library developers do whatever they like or think is the right thing
to do (if they ever think about the issue)... and we scientists are
terrific software engineers ;-)

> Just as you cannot be sure that it
> won't destroy the MPI object and leave your handle dangling, you have no
> guarantee that the ref-counting would be correct because there is no way
> to enforce that across all supported languages.

Indeed. Just adding the APIs is not enough. The standard should add new
recommendations.

> As Dan had pointed out, duplicating MPI objects is the right way to go
> here.

Ask PETSc folks about the messy code they have in place to prevent
excessive communicator duplication. They basically had to add their own
refcounting for MPI_Comm using attributes, inner comm dupes, etc. So,
no, duplication is not always the answer.

> It is good practice for libraries to treat handles to MPI objects
> as borrowed references and only store handles to duplicates.

Alternatively, with my proposal/request, libraries could take ownership
of a reference.

> Unless
> explicitly documented that ownership is transferred, the library should
> not destroy the MPI object it received.

And then the burden of managing handle lifetime is put on the library
caller.

> Yes, there is no way to enforce
> this in C, so it has to be part of the verbal API contract. But the same
> would be true for correct reference counting. I don't see a way around
> relying on soft contracts here...

All I'm asking for is the addition of a few APIs [for example,
MPI_Type_clone(datatype, &newref) or perhaps MPI_Type_incref(datatype),
and so on for other types] that would help to better define SOFT
contracts. This is all about soft contracts; there is no way to enforce
anything in C.

> And as was discussed during the call: MPI libraries may ref-count
> internal parts of communicators and datatypes so their duplication
> should be fairly lightweight.

Except that duplication is not the same as aliasing, and
users/libraries may have valid reasons to want to use aliasing. For the
case of mpi4py, it is all about being a wrapper as thin as possible.

> I realize that there are no duplication
> functions for files and windows (I had proposed window duplication at
> EuroMPI this year; I'm not sure what the semantics for files would be
> though). Would having window and file duplication help here?

How would window duplication work? The Win handle refers to the same
memory buffer, then you can attach different attributes on each
duplicate?
I'm not sure how that behaviour would be useful, although I'm not
against it.

> Or did I get any of the discussion wrong?

I think you got it just right. However, I still disagree that
duplication is the ultimate solution.

I believe this whole subject is already controversial enough. Perhaps we
can start with other related things that would be less controversial.
For example, relaxing the MPI_XXX_free rules such that the following
snippets of code do not error as they currently do:

    MPI_Datatype datatype;

    datatype = MPI_INT;
    MPI_Type_free(&datatype); // freeing built-in

    datatype = MPI_DATATYPE_NULL;
    MPI_Type_free(&datatype); // freeing NULL

and similarly for MPI_Op, MPI_Group, MPI_Info. I'm not sure about Comm,
Win, File, but at least freeing the NULL handles should not error. This
is just a relaxation of current rules, so it is backward compatible.

--
Lisandro Dalcin
============
Senior Research Scientist
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/
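
To make the duplication-versus-aliasing distinction in this thread
concrete, here is a toy model in plain C. It is not MPI code and does not
reflect how any MPI implementation works: every name in it (toy_obj,
toy_handle, toy_incref, toy_dup, toy_free, toy_lib) is hypothetical and
invented for illustration. It only sketches, under those assumptions,
what "a library takes ownership of a reference" and "freeing a null
handle is not an error" could mean as a soft contract.

```c
#include <stdlib.h>

/* Toy stand-in for an opaque handle; NOT an MPI type (hypothetical). */
typedef struct toy_obj {
    int refcount; /* number of live references (aliases) to this object */
    int payload;  /* stands in for the object's internal state */
} toy_obj;

typedef toy_obj *toy_handle;
#define TOY_HANDLE_NULL ((toy_handle)0)

/* Create a new object that starts with exactly one reference. */
static toy_handle toy_create(int payload)
{
    toy_handle h = malloc(sizeof *h);
    if (h == NULL)
        abort();
    h->refcount = 1;
    h->payload = payload;
    return h;
}

/* Aliasing: same object, one more reference (in the spirit of the
 * MPI_Type_incref idea floated in the thread). */
static toy_handle toy_incref(toy_handle h)
{
    h->refcount++;
    return h;
}

/* Duplication: a new, independent object with its own reference count. */
static toy_handle toy_dup(toy_handle h)
{
    return toy_create(h->payload);
}

/* Relaxed free: a null handle is a no-op; otherwise drop one reference
 * and destroy the object when the last one goes away. The caller's
 * handle is always reset to null. Returns 1 if the object was destroyed. */
static int toy_free(toy_handle *h)
{
    int destroyed = 0;
    if (*h == TOY_HANDLE_NULL)
        return 0;
    if (--(*h)->refcount == 0) {
        free(*h);
        destroyed = 1;
    }
    *h = TOY_HANDLE_NULL;
    return destroyed;
}

/* A "library" whose documented contract is: set() takes its own
 * reference, finalize() releases it. */
typedef struct toy_lib {
    toy_handle comm;
} toy_lib;

static void toy_lib_set(toy_lib *lib, toy_handle h)
{
    lib->comm = toy_incref(h);
}

static int toy_lib_finalize(toy_lib *lib)
{
    return toy_free(&lib->comm);
}
```

In this sketch the caller may release its handle right after
toy_lib_set(); the object stays alive until the library finalizes.
Nothing enforces that the library really calls toy_incref(), which is
the point made above about soft contracts.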