[mpiwg-tools] Per the QMPI/PMPI+1/whatever discussion yesterday

Tue Dec 9 07:31:46 CST 2014

This morning, I have a bunch of follow-on thoughts to the QMPI/PMPI+1/whatever tools WG face-to-face discussion yesterday.  We were all getting tired at the end, and I don't think we documented everything well.  Here are my thoughts this morning, in no particular order:

1. New names: QMPI_Tool_add / QMPI_Tool_get.  In general, using the "QMPI_" prefix should be sufficient for distinguishing any new functions we create here from both the PMPI_ world and from the MPI_T_ world.  E.g., even though the 2 names I propose here include the word "tool", I think the QMPI_ prefix is sufficient to distinguish them from the MPI_T_ tool genre.

2. Things we didn't document yesterday:

- Need to decide a free-memory model for QMPI_Tool_get (i.e., who owns the argv that QMPI_Tool_get returns, and how it gets freed).
- Need to specify that the strings passed in to QMPI_Tool_add are safe to be freed after QMPI_Tool_add returns (i.e., the MPI implementation must make its own copy of the string if it needs it later).
- Need Fortran bindings for the QMPI_* functions.
- In the union of fields passed to the tool-provided QMPI_multiplex function for each MPI function, some functions will need to pass an enum to indicate which MPI binding the intercept came from (e.g., C, mpif.h, use mpi, use mpi_f08).
- In all cases, the tool-provided QMPI_multiplex function will be passed the C versions of MPI handles.  I.e., if a Fortran program calls MPI_Send which ends up invoking a tool's QMPI_multiplex function, the QMPI_multiplex function will receive C versions of the communicator and datatype handles.
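
To make the two memory-model bullets above concrete, here is a minimal sketch in C of one possible answer.  Everything in it is an assumption for illustration: we have not agreed on any signatures, and QMPI_Tool_free, the fixed-size table, and the int return codes are hypothetical.  It shows QMPI_Tool_add copying the string it is given, and QMPI_Tool_get handing back a caller-owned argv that is released with a matching free function.

```c
/* Hypothetical sketch only: none of these signatures have been agreed on. */
#include <stdlib.h>
#include <string.h>

#define QMPI_SKETCH_MAX_TOOLS 8   /* arbitrary limit, for illustration */

static char *tool_names[QMPI_SKETCH_MAX_TOOLS];
static int   tool_count = 0;

/* Copy a string into freshly allocated storage (NULL on failure). */
static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *c = malloc(n);
    if (c != NULL) memcpy(c, s, n);
    return c;
}

/* Register a tool by name.  The implementation keeps its own copy, so
 * the caller may free or overwrite `name` as soon as this returns.
 * Returns 0 on success, -1 on failure. */
int QMPI_Tool_add(const char *name)
{
    if (name == NULL || tool_count >= QMPI_SKETCH_MAX_TOOLS) return -1;
    char *copy = copy_string(name);
    if (copy == NULL) return -1;
    tool_names[tool_count++] = copy;
    return 0;
}

/* Hypothetical companion: release an argv returned by QMPI_Tool_get. */
void QMPI_Tool_free(int argc, char **argv)
{
    if (argv == NULL) return;
    for (int i = 0; i < argc; ++i) free(argv[i]);
    free(argv);
}

/* One possible free-memory model: return a NULL-terminated deep copy
 * that the caller owns and later passes to QMPI_Tool_free. */
int QMPI_Tool_get(int *argc, char ***argv)
{
    char **out = malloc((size_t)(tool_count + 1) * sizeof *out);
    if (out == NULL) return -1;
    for (int i = 0; i < tool_count; ++i) {
        out[i] = copy_string(tool_names[i]);
        if (out[i] == NULL) { QMPI_Tool_free(i, out); return -1; }
    }
    out[tool_count] = NULL;
    *argc = tool_count;
    *argv = out;
    return 0;
}
```

The alternative model would be for QMPI_Tool_get to return pointers into implementation-owned storage that the caller must not free; the deep copy is used here only because it keeps ownership unambiguous.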

3. Random thought -- do we need QMPI_Control?  This would be analogous to MPI_Pcontrol -- it would be used for communicating directly with tools.  Its first argument could be a (const char*) that corresponds to one of the strings that you get back from QMPI_Tool_get (i.e., allowing an app to communicate with a specific tool).
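
A rough sketch in C of what that name-based dispatch might look like.  This is purely hypothetical: the (tool_name, level) signature, the callback type, and the "tracer" / "profiler" tool names are all made up for illustration, and a real proposal would presumably need something richer than a single int level.

```c
/* Hypothetical sketch only: QMPI_Control does not exist anywhere yet. */
#include <stddef.h>
#include <string.h>

/* Assumed per-tool callback that receives the control request. */
typedef int (*QMPI_Control_fn)(int level);

typedef struct {
    const char     *name;     /* same string QMPI_Tool_get would return */
    QMPI_Control_fn control;  /* where requests for this tool are sent  */
} QMPI_Tool_entry;

/* Two made-up tools that simply record the last level they were sent. */
static int tracer_level = 0;
static int tracer_control(int level) { tracer_level = level; return 0; }

static int profiler_level = 0;
static int profiler_control(int level) { profiler_level = level; return 0; }

static const QMPI_Tool_entry tools[] = {
    { "tracer",   tracer_control   },
    { "profiler", profiler_control },
};

/* Route a control request to the tool whose name matches tool_name.
 * Returns the tool's result, or -1 if no such tool is registered. */
int QMPI_Control(const char *tool_name, int level)
{
    if (tool_name == NULL) return -1;
    for (size_t i = 0; i < sizeof tools / sizeof tools[0]; ++i) {
        if (strcmp(tools[i].name, tool_name) == 0)
            return tools[i].control(level);
    }
    return -1;
}
```

With this shape, an app calling QMPI_Control("profiler", 2) reaches only the profiler; the tracer never sees the request.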

4. We haven't crisply laid out the goals for QMPI/PMPI+1/whatever.  That is -- to paraphrase Kathryn's sentiments from yesterday: how is this interface *better* than the current PMPI world?  I think the goals are:

- Avoid relying on linker tricks (although this is kinda weak -- linker tricks work just fine in many cases, and, indeed, we're not outlawing linker tricks like LD_PRELOAD)
- Solve all language-binding issues (e.g., allow tools to be written in C, regardless of which binding the application uses)
- *Eventually* allow supporting N tools simultaneously
  --> Although this is just the first step in supporting that eventual goal

And again paraphrasing Kathryn (who was channeling Adam): given that the plan is that MPI-4 won't support N simultaneous tools, are these goals enough "better" than the current PMPI to be worth doing?

I think we need a crisp, irrefutable answer for that question before going before the entire Forum.

-- 
Jeff Squyres
jsquyres at cisco.com