[mpiwg-tools] Meeting March 21, 2019
Dirk.Schubert at arm.com
Thu Mar 21 05:42:48 CDT 2019
The following use cases are important for Arm Forge (debugger DDT and profiler MAP).
Use case 1: Launch an MPI job under the control of a tool.
Input: The user's regular command line to start the MPI job, for example "mpirun -n 256 ./wave_c". Why? To make using a tool as frictionless as possible.
NB: The tool does not want to parse or interpret the MPI starter arguments, because it cannot know all possible MPI starter arguments.
* Hold the MPI processes so that the tool can attach to them, and release them once it has attached.
Q: Where are processes held? In MPI_Init or _start?
* Spawn tool daemon on nodes where job is running.
Q: Co-spawn or separate?
Q: One per node or one per process?
Q: Environment? The tool daemons should get the same general environment as the MPI processes, not a restricted one (e.g. $HOME not set, or a chroot).
* Acquire the process table (rank, hostname, pid and executable path) of spawned MPI processes.
Q: Only global proctable or additionally local (per node) proctable?
* [Optional] Modify the environment of the MPI processes before launching, such as prepending to LD_PRELOAD or LD_LIBRARY_PATH to inject preloads into the MPI processes (but not other processes, such as MPI daemons or tool daemons).
Q: When? It would be great if the environment could be modified just before the MPI processes are forked/exec’ed. Why? For our profiler MAP we need to preload an MPI-specific PMPI library, but for some “MPI” starter processes such as srun we don’t know which MPI implementation is really in use. Acquiring a partial process table (no PIDs) and inspecting the binaries that the executable paths point to could allow us to detect the MPI implementation.
* [Optional] Raise events when a queue allocation is requested and when it is granted, so that the tool can temporarily disable startup timeouts while the allocation is in progress, as this can take a long time (applicable to srun).
* [Optional] Scalable startup of tool daemons and shipping of preloads without touching the parallel file system on compute nodes. For example, push the daemon executable (plus its dependencies and preloads) to a RAM disk on the compute nodes, or use Spindle?
Q: Where are files pushed? A job-specific temporary directory such as /tmp/mpi.job.1234/... or configurable?
Q: How can the tool query the location of pushed files to reference them?
Q: Who cleans up the files afterwards? The MPI job/starter itself or the tool?
NB: Files should be pushed with the original files’ permissions.
NB: Optional requirement = A tool could ultimately work without it, but I would very much like to see it supported.
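To make the optional environment requirement above concrete, here is a minimal Python sketch of "prepending to LD_PRELOAD". It is only an illustration: the helper name prepend_path and the library path are hypothetical, and it says nothing about where in the MPI starter this would actually happen. The point is that the change is applied to a copy of the environment intended for the MPI processes only, so MPI daemons and tool daemons keep their original environment.

```python
def prepend_path(env, name, value, sep=":"):
    """Return a copy of env with value prepended to the list-like variable name.

    The input mapping is not modified, so the caller can apply the result to
    the MPI processes only and leave other processes untouched.
    """
    new_env = dict(env)
    current = new_env.get(name, "")
    # Avoid a trailing separator when the variable was unset or empty.
    new_env[name] = value + sep + current if current else value
    return new_env


# Hypothetical preload library path, used for illustration only.
base_env = {"LD_PRELOAD": "/usr/lib/libfoo.so"}
rank_env = prepend_path(base_env, "LD_PRELOAD", "/opt/tool/lib/libpreload.so")
print(rank_env["LD_PRELOAD"])
```

The same helper would work for LD_LIBRARY_PATH, since both variables accept a colon-separated list.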
In PMIx this is the “indirect” use case, and Ralph Castain and I have been working together for a while to make sure this use case will be supported by PMIx (excluding some optional requirements for now).
* How will the handshake work, such that the MPI starter knows it's running under a tool and when it's possible for a tool to interact with the MPI starter process?
* The handshake must support cases where the MPI starter filename that is provided by the user is a wrapper around the real MPI starter executable, for example XALT.
* In PMIx both problems are solved with PMIX_LAUNCHER_PAUSE_FOR_TOOL=1 and PMIX_LAUNCHER_RENDEZVOUS_FILE=<filename>.
* Will only the tool or also the tool daemons interact with the MPI job?
* Security: for example, user B must not be able to attach to user A's MPI job.
* Anything to consider for MPMD or heterogeneous systems?
* Must be scalable.
* Must not require a debugger.
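As a rough illustration of the PMIx handshake mentioned above, the following Python sketch shows a tool starting the user's unmodified command line with the two environment variables set. The variable names are taken from this mail; the function names and the rendezvous file path are hypothetical, and what the launcher writes into the rendezvous file is deliberately left out. Because the request travels in the environment rather than in the arguments, the tool does not need to parse the starter's command line, and a wrapper around the real starter would normally pass it on, assuming the wrapper does not clear its environment.

```python
import os
import subprocess


def tool_launch_env(base_env, rendezvous_file):
    """Return a copy of base_env asking the launcher to pause for a tool."""
    env = dict(base_env)
    env["PMIX_LAUNCHER_PAUSE_FOR_TOOL"] = "1"
    env["PMIX_LAUNCHER_RENDEZVOUS_FILE"] = rendezvous_file
    return env


def launch_under_tool(user_argv, rendezvous_file):
    """Start the user's command line as-is, with only the environment changed."""
    env = tool_launch_env(os.environ, rendezvous_file)
    # user_argv is passed through untouched, e.g. ["mpirun", "-n", "256", "./wave_c"].
    return subprocess.Popen(list(user_argv), env=env)
```

A tool would call launch_under_tool(["mpirun", "-n", "256", "./wave_c"], "/tmp/tool-rendezvous") and then wait for the launcher to become reachable before interacting with it.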
Use case 2: Attach to a running MPI job with a tool.
This use case is a subset of use case 1 with the following requirements:
* Acquire the process table of the MPI job's spawned processes.
* Spawn tool daemon on nodes where MPI job is running.
* [Optional] Scalable startup of tool daemons without touching the file system.
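For illustration, the process table in both use cases could be modelled as below. This is a sketch in Python with hypothetical names (ProcEntry, local_proctables), not an existing interface. It uses the four fields listed in use case 1 and shows how a global table can be split into per-node tables, whose keys are also the set of nodes on which tool daemons would need to be spawned.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcEntry:
    rank: int
    hostname: str
    pid: int
    executable: str


def local_proctables(global_table):
    """Group a global process table into one list of entries per hostname."""
    per_node = defaultdict(list)
    for entry in global_table:
        per_node[entry.hostname].append(entry)
    return dict(per_node)


# Made-up example data: three ranks spread over two nodes.
table = [
    ProcEntry(0, "node01", 4242, "/home/user/wave_c"),
    ProcEntry(1, "node01", 4243, "/home/user/wave_c"),
    ProcEntry(2, "node02", 1717, "/home/user/wave_c"),
]
per_node = local_proctables(table)
print(sorted(per_node))
```

Whether the per-node view is derived by the tool like this or provided directly by the MPI starter is exactly the open question raised under use case 1.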
Dirk Schubert | Arm | Staff Software Engineer
dirk.schubert at arm.com
allinea is now part of Arm
From: mpiwg-tools <mpiwg-tools-bounces at lists.mpi-forum.org> on behalf of Mohror, Kathryn via mpiwg-tools <mpiwg-tools at lists.mpi-forum.org>
Sent: 17 March 2019 14:46
To: mpiwg-tools at lists.mpi-forum.org
Cc: Mohror, Kathryn
Subject: [mpiwg-tools] Meeting March 21, 2019
For our call this Thursday (3/21) we’ll come back to debugger topics again. The call is at the usual time in the US (8 am Pacific / 11 am Eastern / 4 pm CET). Note that the US has moved to DST, but I don’t think the EU has done so yet.
In the last debugger call, I said that the plan for this meeting would be to get an overview of OMPD and start talking about analogous interfaces for MPI (e.g. revamp MQD). However, in the meantime, I was convinced that we should focus on process acquisition first. So, the plan will be to talk about process acquisition in this meeting. I’ll go through our notes from 2017(ish) to hopefully find the straw men we drafted back then. Please bring your use cases and any ideas you have on this front.
Kathryn Mohror, kathryn at llnl.gov, https://people.llnl.gov/kathryn
Data Analysis Group (https://computation.llnl.gov/casc/data-analysis-group) @ Lawrence Livermore National Laboratory, Livermore, CA, USA