[mpiwg-tools] Debugger spawn
chris.january at allinea.com
Tue Jul 19 03:27:08 CDT 2016
On 18/07/16 20:00, Ralph Castain wrote:
> We’ve been chatting in the meetings about how to possibly use PMIx for obtaining proctable info and having the resource manager (or mpirun) launch debugger daemons. I have prototyped some code for PMIx that supports these operations (will commit it to PMIx for the 2.0 release, and it will be in OMPI master shortly), and written a sample debugger startup tool (see attached) that illustrates how it would be used.
> I think you will find it relatively simple. We can add/subtract/modify the returned proctable data as required.
Thank you for sending over the sample code. I have a couple of questions
that concern the case where the tool is actually starting the job
itself (e.g. running mpiexec -n ...):
1. How can the tool ensure that the job does not start executing
(beyond, say, MPI_Init) before the tool has attached? In MPIR, if
MPIR_being_debugged is set in the starter process, the MPI processes
wait at a barrier before or inside MPI_Init until the starter process
returns from the MPIR_Breakpoint function.
2. Let's say the resource manager has a command like SLURM's srun that
can both make an allocation, and also start a job running. In this case,
if the tool starts the job itself by running srun ..., it will be
outside the resource manager's allocation. How, in that case, will the
PMIX_tool interface know which job the tool wants to work with? How will
the tool find the server PID it needs to pass?
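For reference, the MPIR behaviour described in question 1 can be sketched from the starter's side as below. This is only an illustration, not code from any real launcher: MPIR_being_debugged, MPIR_debug_state, MPIR_DEBUG_SPAWNED and MPIR_Breakpoint are the names used by the MPIR interface, while starter_launch() and the "held"/"released" bookkeeping are hypothetical stand-ins for what mpiexec and the MPI library actually do.

```c
#include <assert.h>
#include <stdio.h>

/* Names from the MPIR process acquisition interface. A debugger looks
   these symbols up in the starter process (e.g. mpiexec). */
#define MPIR_NULL           0
#define MPIR_DEBUG_SPAWNED  1
#define MPIR_DEBUG_ABORTING 2

volatile int MPIR_being_debugged = 0;  /* written by the debugger */
volatile int MPIR_debug_state = MPIR_NULL;

/* The debugger plants a breakpoint here. While the starter is stopped
   inside this function, the debugger reads the proctable and attaches
   to the MPI processes. Declared void here for simplicity. */
void MPIR_Breakpoint(void)
{
}

/* Hypothetical starter logic: the ranks have been created and are
   assumed to be waiting at a barrier before or inside MPI_Init.
   Returns the number of ranks released from that barrier. */
int starter_launch(int nprocs)
{
    int released = 0;

    assert(nprocs >= 0);

    if (MPIR_being_debugged) {
        /* Tell the debugger the job exists, then stop so it can attach. */
        MPIR_debug_state = MPIR_DEBUG_SPAWNED;
        MPIR_Breakpoint();
    }

    /* Only after MPIR_Breakpoint() returns are the ranks allowed to
       proceed past the barrier (simulated here by counting them). */
    for (int rank = 0; rank < nprocs; rank++) {
        released++;
    }
    printf("released %d ranks\n", released);
    return released;
}
```

The point of the sketch is the ordering: no rank is released until the starter has returned from MPIR_Breakpoint, which is the guarantee I am asking about for the PMIx-based approach.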
Chris January - VP Engineering - Allinea Software Ltd.