[Mpi-forum] Question about MPI-4.1's MPI_Get_hw_resource_info()

Jeff Squyres (jsquyres) jsquyres at cisco.com
Wed Nov 15 20:09:44 CST 2023


I see that MPI_Get_hw_resource_info() was introduced in MPI-4.1 p445:13.

I'm a little confused by the description of this routine.  p445:30-32 says (the PDF won't copy-n-paste this section for some reason, so I'm copy-n-pasting from the corresponding LaTeX source):

This information is stored as (\mpiarg{key},\mpiarg{value}) pairs where each key is the name of a hardware resource type and its value is set to \infoval{true} if the calling \MPI/ process is restricted to a single instance of a hardware resource of that type and \infoval{false} otherwise.  The order in which the keys are stored in \mpiarg{hw\_info} is unspecified.  This procedure will return different information for \MPI/ processes that are restricted to different hardware resources. Otherwise, info objects with identical (\mpiarg{key}, \mpiarg{value}) pairs are returned.

  1.  I'm not quite sure what the "true" and "false" values mean.
     *   E.g., what -- precisely -- does "a single instance of a hardware resource of that type" mean?
     *   For example, my company makes a piece of hardware that can have thousands of virtual NICs on it, and those virtual NICs might even migrate around to different pieces of hardware (e.g., they can migrate between different fiber optic outputs on the same NIC).  MPI processes are assigned to a virtual NIC, not a hardware NIC.  Am I allowed to include a reference to these virtual NICs in the keys/values that are returned (since the Linux device name refers to a virtual entity, not necessarily a specific set of hardware)?  If so, how do I determine the true/false value to assign?
     *   The text states that the info keys/values are specific to the point in time when the call is made.  p446:11-12 even explicitly states that the process and/or its hardware restrictions may change over time.  So even if I grokked what "restricted to a single instance of a hardware resource of that type" is intended to mean, if things can change -- and they can -- what is the point of giving a true or false value to the user?
     *   Is the intent that keys will include a specific, unique reference to an instance of "hardware" (e.g., a PCI address)?  If so, then the value of "true" and "false" becomes even more nebulous (or meaningless).  E.g., if I list a key containing "cisco-nic-12bc83fde9" to indicate a specific NIC, what is the exact "hardware resource of that type", and/or how would an application know that "cisco-nic-12bc83fde9" and "cisco-nic-bbbbbbbbb" are of the same "hardware resource type"?
     *   I can imagine that there could be many different scenarios here; can someone provide some guidance on what exactly an implementation is supposed to do here?  This text seems to be... ambiguous.
  2.  The AtoI in p445:42-46 says that we should use URIs with a type of "openmpi://" or "hwloc://" or "pmix://" or "slurm://" or ...
     *   All of these are software models (hwloc's data does refer either to hardware or to software devices that correspond to some form of hardware, but even that is not always clear).
     *   The use of software models in the text is confusing, because the routine has "hw" in its name, strongly implying that there's supposed to be a direct tie-in to hardware.
     *   What is the intent here?
  3.  I'm not quite sure what the limitation of "This procedure will return different information for MPI processes that are restricted to different hardware resources" means.
     *   What if a) an MPI implementation returns an Info with a single key denoting the NIC, and b) the NIC is a generic Ethernet NIC (and it is the only NIC in the node)?
     *   On that NIC, from a fine-grained perspective, the MPI processes use different hardware resources, but from a coarse-grained perspective of the identification of "NIC", multiple MPI processes use the "same" NIC.
     *   Per the text, is an MPI implementation prohibited from returning the same value "blah://the_nic" in multiple MPI processes?  Or is an implementation required to return the same value "blah://the_nic" in all MPI processes on that node?  I really can't tell which way it's supposed to go.

In short, I find the text description of this function ambiguous enough that I could put anything I want in the info keys and corresponding values, and still be able to justify it with one of several different interpretations of the text on pages 445-446.

--
Jeff Squyres