[Mpi3-tools] Discussion Topics for the MPIT Interface

Sun Jul 25 16:34:02 CDT 2010

Hi all,

As listed in the previous email, tomorrow's discussion will be about the MPIT
interface. I am in the process of putting together a final proposal (based on
the last version we had a while ago and the various additions and changes we
discussed in the meantime). During this process, I came across a few open
issues on which I would like to get some feedback/discussion.

Here is a list of issues (details below) that I would like to go through on
the TelCon:

- MPIT Session management
- String interfaces and compatibility with rest of the standard
- Use of handles for environment variables
- Routines to control environment variables
- How to integrate MPIT into the standard

In particular, the first two issues have quite an impact on the API and the
MPIT document. It would be great if we could find consensus on them.

If anyone has anything else, please send it out to the group.

Thanks and talk to you on Monday,

Martin

MPIT Sessions
---------------------

The current version of the MPIT interface is global, i.e., it is initialized
once and all counters are set/reset globally. This works for one tool, but
many said that this may not be sufficient. E.g., libraries may want to monitor
some data, which would then conflict with other libraries or tools.

We already adjusted the init and finalize routines such that they can be
called multiple times, but that is only a partial solution. I think we need to
keep this feature in any case; the question is whether we want to go beyond
that.

One idea was to have explicit sessions: instead of an init call, one would
"create" an MPIT session, and all subsequent calls to MPIT would take a
session handle as the first argument. All counters allocated and probed during
a session would then be relative only to the session in which they were
allocated.
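
To make the session idea concrete, here is a minimal sketch in plain C. It is
not taken from the MPIT draft: every name in it (session_t, session_create,
session_read, session_reset, session_free, record_event) is hypothetical, and
a single shared counter stands in for whatever the MPI library tracks
internally. It only illustrates how a per-session baseline lets two clients
reset "their" counter without disturbing each other.

```c
/* Sketch only: per-session views of one shared counter.
 * Every name here is hypothetical; nothing is taken from the MPIT draft. */
#include <assert.h>
#include <stddef.h>

#define MAX_SESSIONS 8

typedef int session_t;               /* hypothetical session handle */

typedef struct {
    int in_use;
    unsigned long baseline;          /* shared count at session start or last reset */
} session_slot;

static unsigned long shared_count;   /* stands in for a counter inside the MPI library */
static session_slot slots[MAX_SESSIONS];

/* Called by the (mock) library whenever the monitored event happens. */
void record_event(void) { shared_count++; }

static int valid(session_t s)
{
    return s >= 0 && s < MAX_SESSIONS && slots[s].in_use;
}

/* Takes the place of a global init call: each caller gets its own session. */
int session_create(session_t *s)
{
    assert(s != NULL);
    for (int i = 0; i < MAX_SESSIONS; i++) {
        if (!slots[i].in_use) {
            slots[i].in_use = 1;
            slots[i].baseline = shared_count;
            *s = i;
            return 0;
        }
    }
    return -1;                       /* no free slot */
}

/* The session handle is the first argument of every other call. */
int session_read(session_t s, unsigned long *value)
{
    assert(value != NULL);
    if (!valid(s)) return -1;
    *value = shared_count - slots[s].baseline;
    return 0;
}

/* A reset only moves this session's baseline; other sessions are unaffected. */
int session_reset(session_t s)
{
    if (!valid(s)) return -1;
    slots[s].baseline = shared_count;
    return 0;
}

int session_free(session_t *s)
{
    assert(s != NULL);
    if (!valid(*s)) return -1;
    slots[*s].in_use = 0;
    *s = -1;
    return 0;
}
```

In this sketch a tool and a library would each call session_create once and
then pass their own handle to session_read and session_reset, so a reset by
one never changes the value seen by the other.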

This would certainly complicate the interface, but would also make it more
general and more usable. Opinions? Does anyone see any obvious
problems with going down this route?

String Interfaces
----------------------

For many API calls, the MPIT interface relies on being able to return strings.
The only other call in the standard that does this (afaik) is
MPI_Get_processor_name, which uses the following method:

- Define a constant for the maximum string length
- OUT parameter for the string buffer (written by the routine), which must
   be at least as large as the constant
- OUT parameter for the number of characters written by the routine

In one of the presentations at the forum, some commented that this is not
ideal since it can easily lead to buffer overflow. The suggested alternative
was to include an IN parameter (or make the length an IN/OUT parameter) that
allows the caller to specify how long the buffer is.

If this is set to 0, the routine would not copy the string, but instead
return the size of the string it intends to write. Tools can then allocate an
appropriately sized buffer.

For MPIT this could be especially useful, since we intend to return
description texts, which can be long.
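
As an illustration of the suggested alternative, the following plain C sketch
shows the two-step pattern with an IN/OUT length. The routine name
get_description, the helper fetch_description, and the sample text are
hypothetical, and two details are assumptions on my part rather than anything
we have agreed on: the reported size includes the terminating NUL, and a
too-small buffer gets a truncated but terminated string.

```c
/* Sketch only: string return with an IN/OUT length parameter.
 * get_description and fetch_description are hypothetical names. */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* IN:  *len = size of buf in bytes, or 0 to query the required size.
 * OUT: *len = required size (query) or bytes written, both including
 *      the terminating NUL (an assumption of this sketch). */
int get_description(char *buf, int *len)
{
    static const char text[] = "Number of unexpected messages currently queued";
    int needed = (int)sizeof text;

    assert(len != NULL);
    if (*len < 0) return -1;
    if (*len == 0) {                 /* query only, nothing is copied */
        *len = needed;
        return 0;
    }
    if (buf == NULL) return -1;

    /* Never write more than the caller's buffer can hold. */
    int n = (*len < needed ? *len : needed) - 1;
    memcpy(buf, text, (size_t)n);
    buf[n] = '\0';
    *len = n + 1;
    return 0;
}

/* Typical tool-side use: query the size, allocate, then fetch. */
char *fetch_description(void)
{
    int len = 0;
    if (get_description(NULL, &len) != 0) return NULL;

    char *buf = malloc((size_t)len);
    if (buf == NULL) return NULL;

    if (get_description(buf, &len) != 0) {
        free(buf);
        return NULL;
    }
    return buf;                      /* caller frees */
}
```

The point of the sketch is that the routine can no longer overrun the caller's
buffer, and long description texts do not force a large compile-time maximum.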

With this we have the following options (please add if there are more):

- Stay compatible with the rest of the standard and use the techniques from
   MPI_Get_processor_name
- Adopt the new interface for all MPIT calls only
- Adopt the new interface for all MPIT calls and propose a new routine for
   MPI_Get_processor_name in MPI-3 with the new interface
- Include both interfaces into MPIT (and propose a matching second routine
   for MPI_Get_processor_name)

I personally think that we (for now) should stay away from changing
MPI_Get_processor_name and hence would favor one of the first two
solutions.

Handles for Environment Variables
------------------------------------------------

Performance variables are accessed through handles to improve access
speed. Initially we thought that we wouldn't need this for environment
variables. However, some have commented that this may also be useful and
that we should add this. Are there any good use cases? If we decide to add
this, should there be separate handles for performance and environment
variables (since the query routines return significantly different
information) or not (to reuse the actual data read and set routines)?
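
For the "or not" option, one possible shape is a single handle type that
records which kind of variable it refers to, so that the read and set routines
can be shared while name lookup stays per kind. The plain C sketch below is
only meant to make that option easier to discuss; all names (var_kind,
var_handle, handle_alloc, handle_read, handle_write) and the sample variables
are hypothetical, and integer values are assumed for simplicity.

```c
/* Sketch only: one handle type shared by performance and environment
 * variables. All names and sample variables are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { VAR_PERF, VAR_ENV } var_kind;

typedef struct {
    var_kind kind;                   /* which table the handle points into */
    int index;                       /* position in that table */
} var_handle;

typedef struct {
    const char *name;
    int value;
} var_entry;

static var_entry perf_vars[] = { { "msgs_sent", 0 }, { "bytes_sent", 0 } };
static var_entry env_vars[]  = { { "eager_limit", 1024 } };

static var_entry *table_for(var_kind kind, size_t *count)
{
    if (kind == VAR_PERF) {
        *count = sizeof perf_vars / sizeof perf_vars[0];
        return perf_vars;
    }
    *count = sizeof env_vars / sizeof env_vars[0];
    return env_vars;
}

/* The string lookup happens once, when the handle is allocated. */
int handle_alloc(var_kind kind, const char *name, var_handle *h)
{
    size_t count;
    var_entry *t = table_for(kind, &count);

    assert(name != NULL && h != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(t[i].name, name) == 0) {
            h->kind = kind;
            h->index = (int)i;
            return 0;
        }
    }
    return -1;                       /* no such variable of this kind */
}

static var_entry *resolve(var_handle h)
{
    size_t count;
    var_entry *t = table_for(h.kind, &count);
    if (h.index < 0 || (size_t)h.index >= count) return NULL;
    return &t[h.index];
}

/* Shared by both kinds: an index lookup, no string comparison. */
int handle_read(var_handle h, int *value)
{
    var_entry *e = resolve(h);
    assert(value != NULL);
    if (e == NULL) return -1;
    *value = e->value;
    return 0;
}

int handle_write(var_handle h, int value)
{
    var_entry *e = resolve(h);
    if (e == NULL) return -1;
    e->value = value;
    return 0;
}
```

With separate handle types, handle_read and handle_write would each exist
twice; the tagged handle avoids that at the cost of a kind check on every
access.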

Ability to Control Environment Variables
------------------------------------------------------

The API includes the ability to set environment variables (if the MPI library
enables this). There is actually quite a bit of interest in this feature from
the tools and application developer community (we had some discussions
on this at SciDAC 2010). To make this more efficient, one suggestion was
to allow "batch" updates. The idea behind this is that each single update
may need global synchronization and/or may be costly. In these cases, the
MPI implementation could delay the settings until an "apply" call.

One proposal for this would include:

- Return information on whether MPI defers changes to a variable until a
   global apply call (as part of the query call)
- A new API call, which is globally synchronizing (perhaps restricted
   to a communicator?) and which then tries to apply all changes made since
   the last apply call.
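
The two parts of this proposal can be sketched in plain C as follows. This is
a single-process mock, not a proposed API: the names (setting_query,
setting_get, setting_set, settings_apply) and the three sample settings are
hypothetical, and the global synchronization that a real apply call would
perform is only indicated by a comment.

```c
/* Sketch only: deferred ("batch") updates with an explicit apply call.
 * Single-process mock; all names and sample values are hypothetical. */
#include <assert.h>
#include <stddef.h>

#define NUM_VARS 3

typedef struct {
    int value;      /* setting currently in effect */
    int pending;    /* requested setting, not yet in effect */
    int dirty;      /* 1 if a request is waiting for the next apply */
    int deferred;   /* 1 if a set only takes effect at the next apply */
} setting;

static setting settings[NUM_VARS] = {
    { 10, 10, 0, 0 },   /* cheap to change: takes effect immediately */
    { 20, 20, 0, 1 },   /* costly to change: deferred until apply */
    { 30, 30, 0, 1 },
};

static int in_range(int idx) { return idx >= 0 && idx < NUM_VARS; }

/* Query call: reports whether changes to this variable are deferred. */
int setting_query(int idx, int *deferred)
{
    assert(deferred != NULL);
    if (!in_range(idx)) return -1;
    *deferred = settings[idx].deferred;
    return 0;
}

int setting_get(int idx, int *value)
{
    assert(value != NULL);
    if (!in_range(idx)) return -1;
    *value = settings[idx].value;
    return 0;
}

int setting_set(int idx, int value)
{
    if (!in_range(idx)) return -1;
    if (settings[idx].deferred) {
        settings[idx].pending = value;   /* remembered, not yet visible */
        settings[idx].dirty = 1;
    } else {
        settings[idx].value = value;
    }
    return 0;
}

/* Apply call: a real implementation would synchronize here (globally or
 * over a communicator). Returns the number of settings that were applied. */
int settings_apply(void)
{
    int applied = 0;
    for (int i = 0; i < NUM_VARS; i++) {
        if (settings[i].dirty) {
            settings[i].value = settings[i].pending;
            settings[i].dirty = 0;
            applied++;
        }
    }
    return applied;
}
```

The cost argument shows up in settings_apply: any number of deferred sets
between two apply calls would share a single synchronization.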

Variations are certainly possible.

Question to the MPI implementors: would this be useful, or do you not
expect changes to the environment to be costly/globally synchronizing?

Integration into the standard
--------------------------------------

Since we removed the debugging APIs from the proposal, we are proposing
only one addition. We could do this one of two ways:

- Add a new chapter just for MPIT
- Create a shared chapter for tools (with PMPI and MPIT)

I think, based on forum feedback, we still have a majority for the second
option, which would also be my preference. Any other opinions on this?

Decided Issues / pending integration into the document
---------------------------------------------------------------------------

- Hierarchy and Group interface
   Proposed by Dave and discussed separately
- Textual updates
- Complete list of error codes / reorganization into a single table

(there are probably more; I don't have the full list in front of me at the
moment)

________________________________________________________________________
Martin Schulz, schulzm at llnl.gov, http://people.llnl.gov/schulzm
CASC @ Lawrence Livermore National Laboratory, Livermore, USA