[Mpi3-tools] Notes from today's telecon

Bronis R. de Supinski bronis at llnl.gov
Mon Jan 12 11:00:44 CST 2009


All:

Here are my notes from today's telecon.

Participants:

Dong Ahn (LLNL)
Chih-Ping Chen (Intel)
John DelSignore (TotalView Tech)
Bronis de Supinski (LLNL)
Chris Gottbrath (TotalView Tech)
Marty Itzkowitz (Sun)
Bob Moench (Cray)
Phil Roth (ORNL)
Martin Schulz (LLNL)
Jeff Squyres (Cisco)

Notes:

Chris has added material to the variables section and
made some other edits to the evolving MPIR document:

https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/MPI3Tools/mpir-current

For now, they are emphasized by being preceded by
Chris's name.

Still need to clean up the document, as per John's
email -- we need things like types for example. It
is still pretty incomplete and has a lot of ambiguities.
TotalView will try to fill it in since they have a
lot of the historical knowledge of what has happened.
They see lots of different implementations and styles
for using the interface. That will help us understand
what we want to do going forward, including extending it
for MPI-2 constructs like dynamic processes.

What form do we want to maintain the document in? The
simplest thing is to leave it as a document on the Wiki.
We will refer to that in the MPI-2.2 standard. The plan
is to try to get a revised and improved interface into
MPI-3. John's model is based on DWARF -- have an author
and an editor and a list of contributors; we probably
don't need an editor given the shorter-term nature of
this document. John will try to come up with a Word
document that captures the historical perspective.
We could use the MPI Subversion repo. The revised
plan is that John will get an initial draft together,
either as a text document or in Word, and then Martin
will transform it into LaTeX using the MPI standard
style sheets. John will check with his management in
order to determine the time frame in which it can be
completed. He estimates a month or two. The overall
goal is to have the final version done by the time
MPI-2.2 is fully approved (September 2009?).

Chris presented the Euro PVM/MPI paper on their proposal.
The paper can be accessed through this link:

http://www.open-mpi.org/papers/euro-pvmmpi-2006-mpi2-debugging/euro-pvmmpi-2006-mpi2-debugging.pdf

This interface supports dynamic processes. It was also
designed to handle large scales by allowing the proctable
information to be distributed. Chris sent his slides
to the tools list; someone will make sure they are made
available on the Wiki.

Subset attach is an important consideration, as is the
naming of nodes, which becomes more complicated with
dynamic processes. Must support Spawn, Connect/Accept
and Join. Overhead needs to be low enough that MPI
can support it during normal job operations, so that
users can attach to a running job ("on by default").

The proposal distributes the process table, so the debugger
may need to reassemble it. Dong: Does this imply that
debugger launching will be hierarchical? The process of the
debugger becoming attached to processes may be multi-step,
which need not imply hierarchical launching.
Can the debugger use resource manager provided
functionality to do bulk launching? Ability to distribute
the proctable does not preclude the possibility of
having a centralized table. It is important, perhaps
even more important, to have the appropriate support
in the runtime system, whether it is the MPI runtime
or the job manager (the tool doesn't care where it
lives as long as the facilities are there). Launching
the servers, establishing communication and finding
the topology of the machine are currently ad hoc. It
would be good for the standard to encompass
these aspects.

The goal should be to define an API and leave the
implementation open. This is consistent with the
MPI standard in general. It needs to be more of an
abstraction through which the tool requests the information.

An original design constraint was that the interface
cannot require the target to make progress. A DLL
that the tool can load and call should be able to
provide the needed abstractions. John thinks we may
be going down the wrong path if we consider what
Chris is presenting as an interface, since it might
be easier not to require that the table be distributed.
The original interface had a problem in that it
dictated the implementation a little too much.
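
As a rough illustration of the tool-loaded DLL model (these names
and signatures are hypothetical, not part of any existing MPIR
definition), such a DLL might expose query entry points that read
target state only through debugger-supplied callbacks, so the target
never has to make progress:

    /* Hypothetical sketch only; not an existing interface. */
    typedef struct {
        /* Debugger-supplied callback: read 'size' bytes of target
           memory at 'addr' into 'buf'; returns 0 on success. */
        int (*read_target_memory)(void *ctx, unsigned long addr,
                                  void *buf, unsigned long size);
        void *ctx;  /* opaque debugger context passed to the callback */
    } tool_callbacks_t;

    /* Give the DLL the debugger's callbacks. */
    int tool_dll_initialize(const tool_callbacks_t *cb);

    /* Number of processes currently known to the job. */
    int tool_dll_get_process_count(int *count_out);

    /* Fill in the host, executable and pid of the i-th process. */
    int tool_dll_get_process(int index,
                             char *host, unsigned long host_len,
                             char *executable, unsigned long exe_len,
                             int *pid);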

Interface should give information on what the job
looks like. John thinks it also should give services
like spawning the daemons and establishing communication
between them. That would provide an interface we could
point to, so that implementers can support
tools in a scalable fashion.

Chris's presentation provides a good implementation
that can help us design the API that we need. Chris
agrees that we do not yet have the API written down.
Might be useful to have two interfaces: one that
targets a small scale system in which a central RM
manages all the dynamic process operations and can
store all of the information in one place; and one
that is more scalable and supports the distributed
table in the Euro PVM/MPI paper.

Synchronization is an important consideration -- allow
the user to debug the job early in a well-defined state.

Process table entries -- probably things that need
to be accessible through any API.
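
For reference, the de facto process table entry exposed by existing
MPIR implementations (and described in the document being drafted)
looks roughly like the following; exact declarations vary by
implementation:

    typedef struct {
        char *host_name;        /* host on which the process runs */
        char *executable_name;  /* path of the executable image */
        int   pid;              /* OS process id on that host */
    } MPIR_PROCDESC;

    extern MPIR_PROCDESC *MPIR_proctable;      /* one entry per process */
    extern int            MPIR_proctable_size; /* number of entries */

Whatever API we define would need to make at least this information
available, whether the table is centralized or distributed.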

Process naming is an important consideration -- you
want to have a notion of repeatable naming for scripting.
If you run the program twice in the same way and it
is deterministic, then you should be able to interact
with it through the debugger in the same way. With
dynamic processes, there is no universal name space.
There are several MPI_COMM_WORLDs; a notion of
MPI_COMM_UNIVERSE would be useful but is not consistent
with the design of MPI dynamic processes. Thus, a
hierarchical naming scheme is probably necessary.
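
One way to picture such a hierarchical name (purely a hypothetical
sketch, not a proposal): identify a process by the chain of spawn
operations that created its MPI_COMM_WORLD plus its rank within that
world, so a deterministic run reproduces the same names:

    /* Hypothetical sketch only; no such structure exists today. */
    typedef struct {
        int  depth;           /* spawn levels above the initial world */
        int *spawn_sequence;  /* spawn index at each level (length depth) */
        int  world_rank;      /* rank within the process's MPI_COMM_WORLD */
    } tool_process_name_t;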

Different levels of notification -- the user may want to
know about all spawns, but they might not. The notification
mechanism should minimize the impact on MPI processes.
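
A hypothetical way to express those levels (illustration only, not a
proposal):

    /* Hypothetical sketch of notification levels a tool could request. */
    typedef enum {
        TOOL_NOTIFY_NONE,   /* no notification of spawn events           */
        TOOL_NOTIFY_POLL,   /* tool polls a counter/flag in the starter  */
        TOOL_NOTIFY_EVENT   /* target raises an event on every spawn     */
    } tool_spawn_notify_t;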

TotalView was developing an implementation in collaboration
with MPICH and Open MPI. TotalView did the first stages
of its implementation (not completed but workable). No
one has implemented it on the MPI side yet.





