[mpiwg-tools] QMPI API (After 2020-10-08 Discussion)

Wesley Bland work at wesbland.com
Tue Oct 13 10:40:54 CDT 2020

> On Oct 13, 2020, at 4:19 AM, Joachim Protze <protze at itc.rwth-aachen.de> wrote:
> Hi Wesley,
> most of the functions are as expected, just the last part makes me
> wonder how this should work.
> Am 08.10.20 um 22:54 schrieb Wesley Bland via mpiwg-tools:
>> int QMPI_Register_tool_name(char * tool_name, void (*
>> init_function_ptr)(int tool_id));
>> tool_name - [IN] Character string - A unique string representing the
>> name of the tool
>> init_function_ptr - [IN] Function pointer - Pointer to a function that
>> MPI will call before MPI is initialized
> Since you start operating with function pointers, I'm wondering whether
> all the functions below should also be function pointers (with some
> lookup mechanism) rather than public interface functions.

I’m not sure what the benefit of that would be, and there are significant performance drawbacks to taking functions that could otherwise be inlined and forcing them to be function pointers. For some of these functions it may not make much difference, but for those in the critical path it makes a huge difference.
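To illustrate the trade-off being discussed, here is a minimal sketch of the two interface styles. All names (sketch_context, sketch_count_direct, sketch_lookup_count) are hypothetical and are not part of the QMPI proposal; they only show why a direct function can be inlined while a looked-up pointer generally cannot.

```c
#include <assert.h>

/* Hypothetical context with a single field, for illustration only. */
typedef struct { int tool_count; } sketch_context;

/* Direct interface: the body is visible at the call site,
 * so the compiler is free to inline it. */
static inline int sketch_count_direct(const sketch_context *ctx)
{
    return ctx->tool_count;
}

/* Pointer-based interface: the target is only known at run time,
 * so the call usually stays an indirect call. */
typedef int (*sketch_count_fn)(const sketch_context *);

int sketch_count_impl(const sketch_context *ctx)
{
    return ctx->tool_count;
}

/* Stand-in for "some lookup mechanism" that hands back the pointer. */
sketch_count_fn sketch_lookup_count(void)
{
    return sketch_count_impl;
}
```

Both styles return the same value; the difference is only in what the compiler can see at the call site.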

> Also, is the tool always guaranteed to find the QMPI_Register_tool_name
> function? I don't remember the name, so I didn't find documentation, but
> I think that some linkers support encapsulated name spaces for dynamic
> libraries.

This is a function that would be provided by MPI, like any other function, so I’m not sure what problem you’re describing.

>> tool_storage - [IN] Address - A pointer to an address where the tool has
>> data being stored for its particular instance
>> This function allows the tool to provide a storage object (tool_storage)
>> which lets the tool store information specific to its instance of itself
>> (in case there are multiple copies of a tool). The specific instance of
>> the tool is identified with the tool_id value and should be matched with
>> the one returned by the callback function that the tool registered with MPI.
> From my perspective, this structure holds all state which a particular
> instance of a tool has.

I agree, as long as what you mean is any state that the tool manages. None of the MPI-managed state goes in here. This is what we have been calling the context object up to now, but I’m proposing that we call it “storage” and call the thing that contains it “context”.

>> *Interception*
>> When the tool's interception function is called, it can get any
>> tool-specific information from the context object. The fields in the
>> context object can be retrieved with these functions:
>> QMPI_Context_get_storage(QMPI_Context context, int tool_id, void **storage);
>> context - [IN] Opaque handle - A handle to a context object containing
>> tool and function-specific information
>> tool_id - [IN] Integer - A unique identifier for the tool
>> storage - [OUT] Address - A pointer to an address where the tool has
>> data being stored for its particular instance
>> Get back the storage object the tool provided
>> to QMPI_Register_tool_storage that matches tool_id.
> Where does this tool_id come from? Is it passed in as an argument to the
> interception function? The tool does not know the id.

Good point. I forgot to give an example of what the interception functions look like. I believe the intention after the call last week was to have two additional arguments at the beginning of every MPI function's argument list. So, as an example, an MPI_SEND interception function would look like this:

int MPI_Send_example(QMPI_Context context, int tool_id, const void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm);

The reason the tool_id is included in the argument list and not embedded in the context object is that the context object does not need to change from tool to tool on the way down the stack. It can contain an array of all storage objects, indexed by the tool_id. The context object only changes if a tool calls a function other than the one it is already in (e.g., an MPI_Allreduce calls MPI_Send/MPI_Recv).
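A minimal sketch of that layout, assuming a fixed-size array of storage pointers inside the context. The names sketch_context, sketch_set_storage, sketch_get_storage and the SKETCH_MAX_TOOLS limit are hypothetical; the real QMPI_Context is an opaque handle and its contents are up to the implementation.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_TOOLS 8  /* hypothetical limit, for illustration only */

/* Hypothetical context layout: one storage slot per registered tool.
 * The same context can be passed down the stack unchanged, because
 * each tool selects its own slot with its tool_id. */
typedef struct {
    void *storage[SKETCH_MAX_TOOLS];
} sketch_context;

/* Record the storage pointer a tool registered for its instance. */
void sketch_set_storage(sketch_context *ctx, int tool_id, void *storage)
{
    assert(tool_id >= 0 && tool_id < SKETCH_MAX_TOOLS);
    ctx->storage[tool_id] = storage;
}

/* Same shape as the proposed QMPI_Context_get_storage:
 * look up the storage slot that matches tool_id.
 * Returns 0 on success, -1 if tool_id is out of range. */
int sketch_get_storage(const sketch_context *ctx, int tool_id, void **storage)
{
    if (tool_id < 0 || tool_id >= SKETCH_MAX_TOOLS)
        return -1;
    *storage = ctx->storage[tool_id];
    return 0;
}
```

With this shape, two instances of the same tool simply use different tool_id values against the same context.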

>> *Calling Additional QMPI Tools*
>> When the tool is done with its own interception, it should call this
>> function to determine the next tool in the QMPI stack (which the tool
>> itself is responsible for calling):
>> int QMPI_Get_function(int tool_id, enum QMPI_Functions_enum
>> function_enum, void (** function_ptr)(void), QMPI_Context *
>> next_tool_context);
>> tool_id - [IN] Integer - A unique identifier for the tool
>> function_enum - [IN] Enum value - An enumerated value specifying which
>> MPI function is being registered for interception.
>> function_ptr - [OUT] Function pointer - A pointer to a function that
>> should be called when an application calls the specified function.
>> next_tool_context - [OUT] Opaque handle - A handle to a context object
>> containing tool and function-specific information
>> This function also uses tool_id and function_enum to determine the
>> current tool's ID and the enum value for the function being queried. It
>> also provides a function pointer in a similar format to the registration
>> function. Finally, the function returns the context object of the next
>> tool in the call stack which should be used when calling the returned
>> function.
> Why does this function not need the current QMPI_Context context?
> I would store a pointer to the function-table in this context and
> implement QMPI_Get_function as a macro accessing this table.

That is one way to do it that would need the current context object. In my implementation, I’ve been keeping that function table in MPI's memory so there was no need to put it in the context object. The function table doesn’t change from function to function so I didn’t see a reason to make multiple copies of it.
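As a rough sketch of keeping a single table on the MPI side, the following assumes tools are stacked in increasing tool_id order and that the table is a plain static array. The enum values, sketch_register, sketch_get_function and the dummy interception function are all hypothetical stand-ins, not the proposed API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real enum and limits are not given here. */
enum sketch_function { SKETCH_SEND, SKETCH_RECV, SKETCH_NUM_FUNCTIONS };
#define SKETCH_MAX_TOOLS 4

typedef void (*sketch_generic_fn)(void);

/* One table owned by the library side: row = tool_id, column = function.
 * It does not depend on the current context, so no copy is needed there. */
static sketch_generic_fn sketch_table[SKETCH_MAX_TOOLS][SKETCH_NUM_FUNCTIONS];

/* Placeholder interception function used only to populate the table. */
void sketch_dummy_send(void) { }

/* Register an interception function for one tool and one MPI function. */
void sketch_register(int tool_id, enum sketch_function f, sketch_generic_fn fn)
{
    assert(tool_id >= 0 && tool_id < SKETCH_MAX_TOOLS);
    sketch_table[tool_id][f] = fn;
}

/* Find the next tool after tool_id that intercepts f.
 * Returns that tool's ID, or -1 (with *fn_out set to NULL) if no tool
 * remains; a real implementation would hand back the MPI library's own
 * function at that point. */
int sketch_get_function(int tool_id, enum sketch_function f,
                        sketch_generic_fn *fn_out)
{
    for (int next = tool_id + 1; next < SKETCH_MAX_TOOLS; next++) {
        if (sketch_table[next][f] != NULL) {
            *fn_out = sketch_table[next][f];
            return next;
        }
    }
    *fn_out = NULL;
    return -1;
}
```

The lookup needs only the caller's tool_id and the function enum, which matches the point that the current context is not required when the table lives in the library.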

> I assume the storage for next_tool_context is stack memory from the
> caller (the tool)?

Yes, I think that makes the most sense. So essentially, each tool holds the context object for the next tool in its own stack memory until it comes back up the stack and the memory is released. The alternative is for MPI to be responsible for allocating and garbage collecting these objects.
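The stack-memory arrangement could look roughly like the sketch below. Everything here is hypothetical: the context holds only a function identifier, the lookup is a stand-in that also hands back the next tool's ID as an extra out-parameter, and an int payload replaces the real MPI_Send arguments.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NUM_TOOLS 3   /* hypothetical number of stacked tools */
#define SKETCH_FN_SEND   1   /* hypothetical function identifier */

/* Hypothetical context carrying only function-specific information. */
typedef struct { int function; } sketch_context;

typedef int (*sketch_send_fn)(sketch_context *ctx, int tool_id, int payload);

/* Records which ID reached the bottom of the stack, for inspection. */
int sketch_bottom_id = -1;

int sketch_tool_send(sketch_context *ctx, int tool_id, int payload);

/* Stand-in for the MPI library's own send at the bottom of the stack. */
int sketch_base_send(sketch_context *ctx, int tool_id, int payload)
{
    assert(ctx != NULL && ctx->function == SKETCH_FN_SEND);
    sketch_bottom_id = tool_id;
    return payload;
}

/* Stand-in lookup: fills the caller-owned context and the next tool ID. */
int sketch_get_function(int tool_id, int function, sketch_send_fn *fn,
                        sketch_context *next_ctx, int *next_id)
{
    next_ctx->function = function;
    *next_id = tool_id + 1;
    *fn = (*next_id < SKETCH_NUM_TOOLS) ? sketch_tool_send : sketch_base_send;
    return 0;
}

/* A tool's interception function: the next tool's context lives in this
 * stack frame and is released automatically when the call returns. */
int sketch_tool_send(sketch_context *ctx, int tool_id, int payload)
{
    sketch_context next_ctx;      /* owned by this tool's stack frame */
    sketch_send_fn next = NULL;
    int next_id = -1;

    sketch_get_function(tool_id, ctx->function, &next, &next_ctx, &next_id);
    return next(&next_ctx, next_id, payload);
}
```

No allocation or garbage collection is needed on the library side in this arrangement, since each frame's context disappears as the call chain unwinds.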

> I would assume that this function also provides the next tool id, to
> pass it with the function call.

You’re right. I forgot to include that in this function signature. I’ll add that to my documentation.

> Alternatively, the tool ID could be a field in next_tool_context. Then
> we need a function to query the tool ID from the context. Also,
> QMPI_Context_get_storage does not need the tool_id as input in this case.

I’m not sure this works for the reason I described above with the MPI_Send_example function.


> Best
> Joachim
> -- 
> Dipl.-Inf. Joachim Protze
> IT Center
> Group: High Performance Computing
> Division: Computational Science and Engineering
> RWTH Aachen University
> Seffenter Weg 23
> D 52074  Aachen (Germany)
> Tel: +49 241 80- 24765
> Fax: +49 241 80-624765
> protze at itc.rwth-aachen.de
> www.itc.rwth-aachen.de
