[mpiwg-tools] Debugger spawn

Chris January chris.january at allinea.com
Tue Jul 19 03:27:08 CDT 2016


Hello Ralph,

On 18/07/16 20:00, Ralph Castain wrote:
> We’ve been chatting in the meetings about how to possibly use PMIx for obtaining proctable info and having the resource manager (or mpirun) launch debugger daemons. I have prototyped some code for PMIx that supports these operations (will commit it to PMIx for the 2.0 release, and it will be in OMPI master shortly), and written a sample debugger startup tool (see attached) that illustrates how it would be used.
>
> I think you will find it relatively simple. We can add/subtract/modify the returned proctable data as required.

Thank you for sending over the sample code. I have a couple of questions
that concern the case where the tool is actually starting the job
itself (e.g. running mpiexec -n ...):
1. How can the tool ensure that the job does not start executing 
(beyond, say, MPI_Init) before the tool has attached? In MPIR, if 
MPIR_being_debugged is set in the starter process, the MPI processes 
wait at a barrier before or inside MPI_Init until the starter process 
returns from the MPIR_Breakpoint function.
2. Let's say the resource manager has a command like SLURM's srun that 
can both make an allocation, and also start a job running. In this case, 
if the tool starts the job itself by running srun ..., it will be 
outside the resource manager's allocation. How, in that case, will the 
PMIX_tool interface know which job the tool wants to work with? How will 
the tool find the server PID it needs to pass?
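
For reference, the MPIR behaviour described in question 1 can be sketched
roughly as follows. This is a minimal, single-process illustration and not
code from any MPI implementation. MPIR_being_debugged, MPIR_debug_state,
MPIR_DEBUG_SPAWNED and MPIR_Breakpoint are the names used by the MPIR
interface; starter_launch, release_ranks and the sequence counters are
hypothetical stand-ins for the launcher's internals.

```c
#include <stdio.h>

/* Symbols defined by the MPIR interface in the starter process. */
volatile int MPIR_being_debugged = 0;  /* written by the tool before launch */
volatile int MPIR_debug_state = 0;     /* read by the tool at the breakpoint */
#define MPIR_DEBUG_SPAWNED 1

/* Hypothetical bookkeeping, used only to show the order of events. */
int event_seq = 0;
int breakpoint_seq = 0;  /* when the tool's breakpoint was hit */
int release_seq = 0;     /* when the ranks were let past the barrier */

/* The tool plants a breakpoint on this function. While the starter is
 * stopped here, the tool reads the proctable and attaches to the ranks.
 * The function "returns" once the tool continues the starter. */
void MPIR_Breakpoint(void)
{
    breakpoint_seq = ++event_seq;
}

/* Hypothetical stand-in for releasing the barrier the MPI processes
 * wait at before or inside MPI_Init. */
void release_ranks(void)
{
    release_seq = ++event_seq;
}

/* Hypothetical starter routine: spawn, notify the tool, then release. */
void starter_launch(void)
{
    /* ... the MPI processes are spawned here and block at the barrier ... */
    if (MPIR_being_debugged) {
        MPIR_debug_state = MPIR_DEBUG_SPAWNED;
        MPIR_Breakpoint();
    }
    release_ranks();  /* only reached after MPIR_Breakpoint returns */
    printf("breakpoint at step %d, release at step %d\n",
           breakpoint_seq, release_seq);
}
```

The point of the ordering is that the ranks cannot run past the barrier
until the tool has had its chance to attach, which is the guarantee I am
asking about for the PMIx case.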

Yours,
Chris January - VP Engineering - Allinea Software Ltd.


