[Mpi3-tools] Reviving old proposal: finding tools DLLs
John.DelSignore at roguewave.com
Mon Feb 13 10:26:35 CST 2012
Jeff Squyres wrote:
> I would like to revive the following old proposal:
Seems like a worthy activity.
> Digging a little in SVN, I found text supporting this proposal -- see
> section 1.5., "Locating Tool Interface Symbols", in the attached PDF
> (it was part of a larger "tools" chapter).
> Does anyone remember the state that we left this proposal in?
> 1. The proposal, as it reads, looks fine to me. But that's likely
> because I wrote it. :-) I don't remember if there was any substantive
> WG/Forum feedback on this proposal. Do you?
I don't think we ever gave it the scrutiny it needs. I think it's pretty close, though I do have a few concerns, which I outline below.
> 2. Or did we let it go because we concentrated on the MPIR document
> first, and wanted to get that done before advancing this one?
I think we did want to focus on the MPIR Process Acquisition Interface document and the MPIT document first, so message queue display and DLL finding took a backseat.
> 3. ...?
IIRC, I originally included the description of MPIR_dll_name in the MPIR Process Acquisition Interface document, but it was removed because, strictly speaking, the variable is not needed by MPI process acquisition; it is needed by MPI message queue display (MQD). At one point a year or two ago I had volunteered to write up the message queue display interface too, but I never found time to do that, so as far as I know MPIR_dll_name and the MQD interface are not "officially" described anywhere yet.
Here are my concerns with the current proposal...
The new scheme seems to do a reasonable job of separating 32-bit from 64-bit libraries, but it has no provisions for cross or heterogeneous debugging. When the debugger loads an MQD DLL for a target process, it wants to find a library that can be validly loaded into the debugger's address space and is appropriate for the target process. The proposal says, "For example, a tool can attempt to dynamically load each of the files in the array; the file that loads successfully is likely a good candidate to be used by the tool." It's the "likely a good candidate" part that concerns me.
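To make the concern concrete, here is a minimal C sketch of the load-and-try strategy the proposal describes, assuming a POSIX debugger that uses dlopen(). The helper name first_loadable_dll and any candidate paths passed to it are hypothetical, not part of the proposal.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: try each candidate DLL path in order and return the
 * index of the first one that dlopen() accepts, or -1 if none load.
 * On success the handle is stored in *handle_out (or closed if handle_out
 * is NULL). A successful load only shows that the file matches the
 * debugger's own platform; nothing here checks the target platform. */
static int first_loadable_dll(const char *const paths[], size_t count,
                              void **handle_out)
{
    assert(paths != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (paths[i] == NULL)
            continue;
        void *handle = dlopen(paths[i], RTLD_NOW | RTLD_LOCAL);
        if (handle != NULL) {
            if (handle_out != NULL)
                *handle_out = handle;
            else
                dlclose(handle);
            return (int)i;
        }
        /* dlerror() reports why this candidate was rejected. */
        const char *err = dlerror();
        fprintf(stderr, "skipping %s: %s\n", paths[i],
                err != NULL ? err : "unknown error");
    }
    return -1;
}
```

The first file that loads wins, and that is exactly where "likely a good candidate" comes from: the loop can tell a 64-bit debugger to skip a 32-bit library, but it has no way to tell whether the library it picked understands the target process.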
If we take the LANL Roadrunner machine as an extreme example, the debugger might be targeting a mixture of Linux-x86_64 and Linux-Cell processes, call this the "target platform" for a given target process. Also, various debugger processes may be running on Linux-x86_64 and Linux-Cell, call this the "debugger platform" for a given debugger process. Note that it is not necessarily the case that the "target platform" is the same as "debugger platform" when the MQD DLL is loaded.
To cover the cross or heterogeneous debugging cases, I think the debugger needs to search the list of DLLs looking for one that is appropriate for both the "debugger platform" and the "target platform". The current proposal allows the debugger to find one that is appropriate for the "debugger platform", but it may not be the right DLL for the "target platform". Perhaps we need a way to be more explicit about the debugger and target platforms for each entry in the vector. Do we care about solving this problem?
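One way to be more explicit would be to tag each entry with both platforms. The sketch below assumes a hypothetical struct mqd_dll_entry that carries a debugger-platform and a target-platform string per DLL; the proposal's array holds only file names, so these fields, the helper select_mqd_dll, and the file names in the usage note are illustrations, not proposed text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical vector entry: the two platform strings are an assumed
 * extension of the proposal, which only lists file names. */
struct mqd_dll_entry {
    const char *path;              /* DLL file name */
    const char *debugger_platform; /* platform the DLL can be loaded into */
    const char *target_platform;   /* platform of the MPI process it decodes */
};

/* Return the first entry that matches BOTH the debugger platform and the
 * target platform, or NULL if no such DLL is listed. */
static const struct mqd_dll_entry *
select_mqd_dll(const struct mqd_dll_entry *entries, size_t count,
               const char *debugger_platform, const char *target_platform)
{
    assert(entries != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(entries[i].debugger_platform, debugger_platform) == 0 &&
            strcmp(entries[i].target_platform, target_platform) == 0)
            return &entries[i];
    }
    return NULL;
}
```

With a Roadrunner-style table listing { "libmqd-x86_64.so", "Linux-x86_64", "Linux-x86_64" }, { "libmqd-x86_64-cell.so", "Linux-x86_64", "Linux-Cell" } and { "libmqd-cell.so", "Linux-Cell", "Linux-Cell" }, an x86_64 debugger process examining a Cell process would pick the second entry, even though the first one would also load successfully.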
Another issue I see with the current definition is where the DLL name variables live; they are currently defined to live in the MPI process itself. I was wondering whether there is any benefit to allowing (but not requiring) the DLL name variables to live in the MPI starter rather than in the MPI processes. Most implementations currently put MPIR_dll_name in the MPI process, but that may require the debugger to read the variable from each MPI process, which creates scalability issues if the debugger wants to load the DLL at a single point in its front-end process. I'd be interested to hear if others feel this is an issue.
Cheers, John D.