[Mpi-forum] Question about MPI-4.1's MPI_Get_hw_resources_info()

Guillaume Mercier guillaume.mercier at u-bordeaux.fr
Wed Nov 15 23:09:14 CST 2023


Hello Jeff,

This procedure is the missing companion to the original hardware
communicators proposal introduced in MPI 4.0 (i.e., MPI_COMM_TYPE_HW_GUIDED).

On 16/11/2023 03:09, Jeff Squyres (jsquyres) via mpi-forum wrote:

>  1. I'm not quite sure what the "true" and "false" values mean.
>      1. E.g., what -- precisely -- does "a single instance of a hardware
>         resource of that type" mean?

If an MPI process is not bound to a single instance of this type of
resource, you will get "false". For example, imagine an MPI process bound
to a Package: you will get "true" for "package" but "false" for the
cores encompassed in this package. If the MPI process were instead
bound to a single core, you would get "true" for both Package and Core.
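
To make the Package/Core example concrete, here is a minimal sketch in
plain C of how the "true"/"false" value could be derived for one resource
type. It is only a toy model, not what any MPI implementation does: the
function names, the integer core numbering, and the instance_of_core
lookup table are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: cores are numbered 0..N-1 and
 * instance_of_core[c] is the id of the instance (of one resource type)
 * that contains core c. For the type "core" this is the identity map;
 * for the type "package" it maps each core to its package. */

/* Returns true if every core in the binding belongs to the same
 * instance of the resource type. An empty binding yields false. */
bool restricted_to_single_instance(const int *bound_cores, size_t n,
                                   const int *instance_of_core)
{
    if (n == 0)
        return false;

    int first = instance_of_core[bound_cores[0]];
    for (size_t i = 1; i < n; i++) {
        if (instance_of_core[bound_cores[i]] != first)
            return false;
    }
    return true;
}

/* Converts the result to a string in the style of the info values. */
const char *as_info_value(bool restricted)
{
    return restricted ? "true" : "false";
}
```

With four cores split across two packages (package map {0, 0, 1, 1}), a
process bound to cores 0 and 1 is restricted to a single package but not
to a single core, while a process bound to core 1 only is restricted for
both types.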

>      2. For example, my company makes a piece of hardware that can have
>         thousands of virtual NICs on it, and those virtual NICs might
>         even migrate around to different pieces of hardware (e.g., they
>         can migrate between different fiber optic outputs on the same
>         NIC).  MPI processes are assigned to a virtual NIC, not a
>         hardware NIC.  Am I allowed to include a reference to these
>         virtual NICs in the keys/values that are returned (since the
>         Linux device name refers to a virtual entity, not necessarily a
>         specific set of hardware)?  If so, how do I determine the
>         true/false value to assign?

Yes, you are, and you determine the value from the situation at the time
of the call.

>      3. The text states that the info keys/values are specific to the
>         point of time when the call is made.  p446:11-12 even explicitly
>         states that the process and/or its hardware restrictions may
>         change over time.  So even if I grokked what "restricted to a
>         single instance of a hardware resource of that type" is intended
>         to mean, if things can change -- and they can -- what is the
>         point of giving a true or false value to the user?

Once again, you get the situation at the time of the call. If the
binding of processes changes over time, or if processes are migrated,
you may want to know the new state of affairs.

>      4. Is the intent that keys will include a specific, unique
>         reference to an instance of "hardware" (e.g., a PCI address)? 
>         If so, then the value of "true" and "false" becomes even more
>         nebulous (or meaningless).  E.g., if I list a key containing
>         "cisco-nic-12bc83fde9" to indicate a specific NIC, what is the
>         exact "hardware resource of that type", and/or how would an
>         application know that "cisco-nic-12bc83fde9" and
>         "cisco-nic-bbbbbbbbb" are of the same "hardware resource type"?

The text reads "[...] if the calling
MPI process is restricted to a single instance of a hardware resource of 
that *type*".  So, to me, I would not list "cisco-nic-12bc83fde9" in the 
first place. Then, in case you have both "cisco-nic-12bc83fde9" and 
"cisco-nic-bbbbbbbbb", it is either the role of the software reporting
them to tell you that they are of the same type, or it's a decision that
the user has to make at the application level, because they know their
hardware and can assess the situation.


>      5. I can imagine that there could be many different scenarios here;
>         can someone provide some guidance on what exactly an
>         implementation is supposed to do here?  This text seems to be...
>         ambiguous.

It may well be, and it can be improved. Thanks for your input.
An MPI-implementation-agnostic version can be found here:
https://gitlab.inria.fr/hsplit/hsplit/-/blob/master/src/hsplit.c?ref_type=heads#L1315


>  2. The AtoI in p445:42-46 says that we should use URIs with a type of
>     "openmpi://" or "hwloc://" or "pmix://" or "openmpi://" or
>     "slurm://" or ...
>      1. All of these are software models (although hwloc's data refers
>         to either hardware or to software devices that correspond to
>         some form of hardware -- although that's not always clear, either).
>      2. The use of software models in the text is confusing, because the
>         routine has "hw" in its name, strongly implying that there's
>         supposed to be a direct tie-in to hardware.
>      3. What is the intent here?

The intent is to give more information about the source of the information.
So you get information tied to a provider, not just bare information,
since different providers may see or report the situation slightly differently.
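
Since the keys carry their provider as a URI scheme, an application can
separate the provider from the resource name before comparing keys. A
small sketch in plain C follows; the helper name split_hw_key and the
example key "hwloc://Package" are hypothetical, and the actual key names
are whatever the MPI implementation chooses to return.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Splits a key of the form "provider://resource" into its two parts.
 * Returns false if the "://" separator is missing, if either part is
 * empty, or if one of the output buffers is too small. */
bool split_hw_key(const char *key,
                  char *provider, size_t provider_size,
                  char *resource, size_t resource_size)
{
    const char *sep = strstr(key, "://");
    if (sep == NULL)
        return false;

    size_t provider_len = (size_t)(sep - key);
    const char *rest = sep + 3;          /* skip over "://" */
    size_t resource_len = strlen(rest);

    if (provider_len == 0 || resource_len == 0)
        return false;
    if (provider_len >= provider_size || resource_len >= resource_size)
        return false;

    memcpy(provider, key, provider_len);
    provider[provider_len] = '\0';
    memcpy(resource, rest, resource_len + 1);  /* copies the terminator too */
    return true;
}
```

For the hypothetical key "hwloc://Package" this yields the provider
"hwloc" and the resource "Package", so two keys naming the same resource
from different providers can be told apart.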

>  3. I'm not quite sure what the limitation of "This procedure will
>     return different information for MPI processes that are restricted
>     to different hardware resources" means.
>      1. What if a) an MPI implementation returns an Info with a single
>         key denoting the NIC, and b) the NIC is a generic Ethernet NIC
>         (there's only one NIC in the node).
>      2. On that NIC, from a fine-grained perspective, the MPI processes
>         use different hardware resources, but from a coarse-grained
>         perspective of the identification of "NIC", multiple MPI
>         processes use the "same" NIC.
>      3. Per the text, is an MPI implementation prohibited from returning
>         the same value "blah://the_nic" in multiple MPI processes?  Or
>         is an implementation *required* to return the same value
>         "blah://the_nic" in all MPI processes on that node?  I really
>         can't tell which way it's supposed to go.

I don't see a difference with the case where multiple MPI processes
are bound to the same Package (for instance). If processes are using the
same NIC, they are then using the same HW resource of that type.
"blah://the_nic" should be set to "true" for all these processes.

> In short, I find the text description of this function to be suitably 
> ambiguous such that I could put anything I want in the info keys and 
> corresponding values, and be able to justify it with one of a bunch of 
> different interpretations of the text on pages 445-446.

I beg to disagree with your interpretation. Can the description of the 
procedure be twisted enough so that you end up with something totally 
meaningless? Maybe. Can it be improved? Surely. But is it useful? 
Definitely.

I'll be happy to discuss this more with you.

Best,
Guillaume
