[mpiwg-tools] PMPI and sessions init

Mon Nov 9 05:26:36 CST 2020

Hi Joachim,

>>> if I understood the discussion about MPI_Session_init and PMPI correctly, I
>>> don't think there is an issue.
>>> By ld-preloading the PMPI tool, the tool can also intercept a call to
>>> MPI_Session_init, which might come from a library constructor before
>>> main. Even a function called from the library constructor of a static
>>> library can be intercepted by a ld-preloaded tool.
>>>
>>> For libraries that are explicitly loaded by the MPI runtime during
>>> initialization, this would also work. For others, it depends on the
>>> order in which library constructors are called (it might work by chance :).
>>
>> Wasn't the problem not so much that tools cannot intercept
>> MPI_Session_init in time, but rather our current decision to disallow
>> tool *registration* after MPI is initialized? A library calling
>> session init inside a constructor would cut off registration for any
>> tool not yet registered at that time.
>>
>> Maybe I am missing something here or misunderstood something in the
>> discussion?
>>

> If a tool calls session init in the constructor, the MPI library would
> load all tools by an implementation-defined mechanism (e.g., an
> environment variable), allowing them to register before finishing
> initialization.

Again, I don't seem to be able to follow how all of this will play
together.

1) My current understanding, at least, was that tools could use a
constructor to *register*, and the MPI library would then, at
initialization time, *initialize* all registered tools using their
callbacks.

2) As far as I understood, we were also suggesting to disallow further
registration of tools, once MPI is initialized.

3) Constructors are called in arbitrary (or
compiler/system/etc.-defined) order, right?

Now, if a library (not a tool actually employing our approach) calls
MPI_Session_init inside a constructor, it will initialize MPI and,
with that, put the registration restriction into effect.
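To make sure we are talking about the same ordering problem, here is a minimal C sketch of points 1) to 3). All names (tool_register, session_init_mock, ...) are hypothetical stand-ins, not MPI or QMPI API. GCC/Clang constructor priorities are used only to force one of the possible orders deterministically; without them, the order is unspecified.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TOOLS 8

/* Hypothetical tool registry; none of these names are MPI API. */
static const char *registered[MAX_TOOLS];
static int n_registered = 0;
static int n_initialized = 0;
static int n_rejected = 0;
static bool mpi_initialized = false;

/* Registration is refused once MPI is initialized (point 2). */
static bool tool_register(const char *name)
{
    if (mpi_initialized) {
        n_rejected++;
        return false;
    }
    assert(n_registered < MAX_TOOLS);
    registered[n_registered++] = name;
    return true;
}

/* Stand-in for MPI_Session_init: initializes every tool registered
 * so far (point 1) and thereby closes registration. */
static void session_init_mock(void)
{
    if (mpi_initialized)
        return;
    for (int i = 0; i < n_registered; i++)
        n_initialized++;  /* would invoke registered[i]'s init callback */
    mpi_initialized = true;
}

/* Priorities 101 < 102 < 103 force this order (point 3). */
__attribute__((constructor(101))) static void tool_a_ctor(void)
{
    tool_register("A");
}

__attribute__((constructor(102))) static void rogue_library_ctor(void)
{
    session_init_mock();  /* a plain library, not a tool, initializes MPI */
}

__attribute__((constructor(103))) static void tool_b_ctor(void)
{
    tool_register("B");  /* too late: refused */
}
```

With this order, tool B's constructor runs after the library has initialized MPI, so B is never registered, although nothing about B itself is wrong.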

> A challenge for the MPI library might be that it must be prepared for
> possible re-entrance of session init by different tools:
> 
> Tool A - constructor
> -> MPI session init
>   -> load tool B
>      Tool B - constructor
>      -> MPI register tool
>      -> MPI session init (re-entering in nesting)
>         -> load tool C
> ...
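A minimal sketch of how an implementation might tolerate this nesting, assuming tools are taken from a fixed list and "loading" a tool just calls its constructor (a stand-in for dlopen()). The shared cursor lets a nested call continue with the remaining tools instead of restarting, and only the outermost call marks initialization complete. All names are hypothetical, and whether a nested call may return before initialization is complete is exactly the open question.

```c
#include <assert.h>

#define MAX_TOOLS 8

typedef void (*tool_ctor_fn)(void);

static void tool_b_ctor(void);
static void tool_c_ctor(void);

/* Hypothetical implementation-defined tool list (e.g. from an env var). */
static tool_ctor_fn to_load[] = { tool_b_ctor, tool_c_ctor };
static const int n_to_load = 2;

static int next_to_load = 0;  /* shared cursor: survives re-entry */
static int depth = 0;         /* current nesting of session_init_mock */
static int max_depth = 0;
static int ready = 0;
static const char *registered[MAX_TOOLS];
static int n_registered = 0;

/* Registration stays open until the outermost init call returns. */
static void register_tool(const char *name)
{
    if (!ready) {
        assert(n_registered < MAX_TOOLS);
        registered[n_registered++] = name;
    }
}

static void session_init_mock(void)
{
    depth++;
    if (depth > max_depth)
        max_depth = depth;
    /* Claim each tool before loading it, so a nested call never loads
     * the same tool twice and simply continues with the rest. */
    while (next_to_load < n_to_load) {
        tool_ctor_fn ctor = to_load[next_to_load++];
        ctor();
    }
    depth--;
    if (depth == 0)
        ready = 1;  /* only the outermost call completes initialization */
}

static void tool_b_ctor(void)
{
    register_tool("B");
    session_init_mock();  /* re-enters while tool A's call is running */
}

static void tool_c_ctor(void)
{
    register_tool("C");
}

/* Tool A starts everything from its own constructor. */
__attribute__((constructor)) static void tool_a_ctor(void)
{
    register_tool("A");
    session_init_mock();
}
```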

Again, by *registering*, a tool lets MPI know who it is and how to
initialize it. If a tool did not register, I don't see how MPI would
know how to initialize it.

At most one tool can intercept MPI_Session_init using PMPI, so that
is not a portable approach for "any tool to use": as soon as a second
tool tries it, the two would conflict.

Thinking about this more as I type, I agree that in the current
situation (where we don't have any registration in a pure PMPI-based
tool setup and mostly a single [meta] tool), we should be safe.

However, in the pure QMPI approach, to my mind, MPI could still get
initialized before all constructors have been called.

The only way to work through that is to actually allow tool
registration *after* initialization.

We once discussed the possibility of a callback for every tool to
register for "stack reconfiguration", such that every
registered/initialized tool can adapt any cached information to the
new environment.
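Roughly what I remember of that idea, as a sketch: registration stays open after init, and every late registration triggers a reconfigure callback on all registered tools so they can refresh what they cached. The descriptor layout, the "stack version" counter, and all function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TOOLS 8

/* Hypothetical tool descriptor; not part of any MPI proposal text. */
typedef struct tool_desc {
    const char *name;
    int stack_version_seen;  /* what the tool has cached */
    void (*reconfigure)(struct tool_desc *self, int new_version);
} tool_desc;

static tool_desc *tools[MAX_TOOLS];
static int n_tools = 0;
static bool initialized = false;
static int stack_version = 0;

/* Example callback: a real tool would re-query the next layer in the
 * interception stack here. */
static void cache_version(tool_desc *self, int new_version)
{
    self->stack_version_seen = new_version;
}

static void register_tool(tool_desc *t)
{
    assert(n_tools < MAX_TOOLS);
    tools[n_tools++] = t;
    if (!initialized)
        return;  /* before init: nothing to rebuild yet */
    stack_version++;  /* the stack changed after init */
    for (int i = 0; i < n_tools; i++)  /* includes the new tool */
        tools[i]->reconfigure(tools[i], stack_version);
}

static void init_mock(void)
{
    initialized = true;
    stack_version = 1;
    for (int i = 0; i < n_tools; i++)
        tools[i]->reconfigure(tools[i], stack_version);
}

static tool_desc tool_a = { "A", 0, cache_version };
static tool_desc tool_b = { "B", 0, cache_version };
```

Registering tool_a, calling init_mock(), and then registering tool_b would leave both tools at stack version 2, i.e., tool A is told about B instead of B being rejected.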

@Wesley: Why exactly did we come up with the restriction of "no
registration after init"? And did we push off the callback solution to
a later time?

>>> This was one of my main motivation of having such an environmental
>>> variable.
>>
>> I think the environment variable is separate (or orthogonal) to the
>> problem above. It's not so much that MPI won't eventually know all the
>> tools and their order, but that a "rogue" library would just use (and
>> with that initialize) MPI at a time out of control of the tool
>> registration process.
>>
>> At least that was my understanding of the discussion.
> 
> The env variable works like a barrier: MPI initialization cannot
> complete until all tools are loaded and registered.

How will this work in a single-threaded environment? A library calls
MPI_* in its constructor, but would then have to interrupt this
constructor to call all other constructors, so that all registrations
are in, and then continue with the constructor calling into MPI and
actually initializing it?
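As far as I can see, in a single thread the barrier cannot mean waiting for the remaining constructors, since they cannot run until the current one returns. It would have to mean that init itself loads and registers every tool named in the variable before completing. A sketch of that reading, where MPI_TOOLS_EXAMPLE is a hypothetical variable name and a name lookup replaces dlopen():

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TOOLS 8

/* Tools "installed" on the system; a name lookup stands in for dlopen(). */
static const char *const known_tools[] = { "tracer", "profiler", "checker" };
#define N_KNOWN (sizeof known_tools / sizeof known_tools[0])

static const char *registered[MAX_TOOLS];
static int n_registered = 0;

static int is_registered(const char *name)
{
    for (int i = 0; i < n_registered; i++)
        if (registered[i] == name)  /* pointers into known_tools */
            return 1;
    return 0;
}

/* "Load" the tool named by the first len chars of name and let it
 * register. Returns 1 only if a new tool was registered. */
static int load_tool(const char *name, size_t len)
{
    for (size_t i = 0; i < N_KNOWN; i++) {
        const char *k = known_tools[i];
        if (strlen(k) == len && strncmp(k, name, len) == 0) {
            if (is_registered(k))
                return 0;
            assert(n_registered < MAX_TOOLS);
            registered[n_registered++] = k;
            return 1;
        }
    }
    return 0;  /* unknown tool: ignored in this sketch */
}

/* Single-threaded init: instead of waiting for other constructors,
 * load every listed tool here, then complete initialization. */
static int init_with_tool_list(const char *list)
{
    int loaded = 0;
    if (list == NULL)
        return 0;
    while (*list != '\0') {
        size_t len = strcspn(list, ",");
        if (len > 0)
            loaded += load_tool(list, len);
        list += len;
        if (*list == ',')
            list++;
    }
    /* ...the rest of initialization would happen here... */
    return loaded;
}

int init_from_env(void)
{
    return init_with_tool_list(getenv("MPI_TOOLS_EXAMPLE"));
}
```

In this reading, the env variable replaces the constructor-based registration for late tools, rather than synchronizing with it.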

I am trying to take decipherable notes, but I sometimes miss small
details, as the discussion seems to have developed quickly in the last
calls. So am I missing some important yet trivial point here?

Cheers,
Marc-Andre

-- 
Dr. rer. nat. Marc-André Hermanns

IT Center
Group: High Performance Computing
Division: Computational Science and Engineering
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Phone: +49 241 80-24381
hermanns at itc.rwth-aachen.de
www.itc.rwth-aachen.de
