[Mpi-forum] Question about MPI-4.1's MPI_Get_hw_resources_info()

Guillaume Mercier guillaume.mercier at u-bordeaux.fr
Thu Nov 16 02:02:14 CST 2023


Hi Jeff,

Let me revise my first answer and be more specific on a couple
of points you raise in your message.

Remember that until MPI 4.1, there was no standard way to
provide a value for the "mpi_hw_resource_type" info key that guides
the splitting of communicators on a hardware basis
(i.e., with a call to MPI_Comm_split_type with MPI_COMM_TYPE_HW_GUIDED
as the input split_type value). MPI_Get_hw_resource_info fills
this gap and makes applications that rely on this mechanism more
portable than before.

On 16/11/2023 03:09, Jeff Squyres (jsquyres) via mpi-forum wrote:

>      2. For example, my company makes a piece of hardware that can have
>         thousands of virtual NICs on it, and those virtual NICs might
>         even migrate around to different pieces of hardware (e.g., they
>         can migrate between different fiber optic outputs on the same
>         NIC).  MPI processes are assigned to a virtual NIC, not a
>         hardware NIC.  Am I allowed to include a reference to these
>         virtual NICs in the keys/values that are returned (since the
>         Linux device name refers to a virtual entity, not necessarily a
>         specific set of hardware)?  If so, how do I determine the
>         true/false value to assign?

On second thought, since these virtual NICs are software "instances"
(for lack of a better word), I'm not sure that they should be
listed as keys in the resulting MPI_Info object. I'd like to discuss
this more with you.

>      3. The text states that the info keys/values are specific to the
>         point of time when the call is made.  p446:11-12 even explicitly
>         states that the process and/or its hardware restrictions may
>         change over time.  So even if I grokked what "restricted to a
>         single instance of a hardware resource of that type" is intended
>         to mean, if things can change -- and they can -- what is the
>         point of giving a true or false value to the user?

Things can change, but not systematically. I don't think that current
applications modify the binding of their MPI processes that often.
Therefore, in the majority of cases, the information you get from
the first call to the procedure is likely to remain valid until
the application ends.


>      4.
>         Is the intent that keys will include a specific, unique
>         reference to an instance of "hardware" (e.g., a PCI address)? 
>         If so, then the value of "true" and "false" becomes even more
>         nebulous (or meaningless).  E.g., if I list a key containing
>         "cisco-nic-12bc83fde9" to indicate a specific NIC, what is the
>         exact "hardware resource of that type", and/or how would an
>         application know that "cisco-nic-12bc83fde9" and
>         "cisco-nic-bbbbbbbbb" are of the same "hardware resource type"?

In this case, both guillaume://cisco-nic set to "true" AND
jeffS://cisco-nic-12bc83fde9 set to "true" seem acceptable to me.
It falls to the user to pick a provider and thus to decide which
information should actually be used.
MPI_Get_hw_resource_info "only" bridges the gap
between the application and the lower-level mechanisms that can be used
to retrieve this kind of information, so that the application does not
have to call these lower-level mechanisms directly.
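
To illustrate what "picking a provider" could look like on the
application side, here is a minimal sketch in C. It assumes that keys
have the form "provider://type", as in the two examples above. The
helper names split_hw_key and key_has_provider are hypothetical and not
part of MPI; the code only manipulates strings and makes no MPI calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of MPI): split a key of the assumed
 * form "provider://type" into its provider and type parts.
 * Returns 0 on success, -1 if the "://" separator is missing, if
 * either part is empty, or if a part does not fit in its buffer. */
int split_hw_key(const char *key, char *provider, size_t provider_size,
                 char *type, size_t type_size)
{
    assert(key != NULL && provider != NULL && type != NULL);

    const char *sep = strstr(key, "://");
    if (sep == NULL)
        return -1;

    size_t provider_len = (size_t)(sep - key);
    const char *type_start = sep + 3;          /* skip "://" */
    size_t type_len = strlen(type_start);

    if (provider_len == 0 || type_len == 0)
        return -1;
    if (provider_len >= provider_size || type_len >= type_size)
        return -1;

    memcpy(provider, key, provider_len);
    provider[provider_len] = '\0';
    memcpy(type, type_start, type_len + 1);    /* copies the final '\0' */
    return 0;
}

/* Hypothetical helper: returns 1 if the key was reported by the
 * provider the application has chosen to trust, 0 otherwise. */
int key_has_provider(const char *key, const char *wanted)
{
    char provider[64];
    char type[192];

    if (split_hw_key(key, provider, sizeof provider,
                     type, sizeof type) != 0)
        return 0;
    return strcmp(provider, wanted) == 0;
}
```

An application that trusts only one provider could apply
key_has_provider to each key found in the returned MPI_Info object and
ignore the keys reported by any other provider.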

>      5. I can imagine that there could be many different scenarios here;
>         can someone provide some guidance on what exactly an
>         implementation is supposed to do here?  This text seems to be...
>         ambiguous.

What you call ambiguous, I would call flexible ;)
But joking aside, the text can surely be improved, and I'd be more
than happy to take your input into account and come up with an
even better version for MPI 4.2 or 5.0.

>  2. The AtoI in p445:42-46 says that we should use URIs with a type of
>     "openmpi://" or "hwloc://" or "pmix://" or "openmpi://" or
>     "slurm://" or ...
>      1. All of these are software models (although hwloc's data refers
>         to either hardware or to software devices that correspond to
>         some form of hardware -- although that's not always clear, either).

The provider only indicates where the information comes from, as two
different sources might report slightly different things. I'll take your
previous "cisco-nic" example: hwloc might choose to report only a
"cisco-nic" type while Cisco's tool might report more precise
information. I think it would be detrimental to the user not to report
all the available information. As for software models, I would surely
qualify "openmp" as a software model but not the others.


>      2. The use of software models in the text is confusing, because the
>         routine has "hw" in its name, strongly implying that there's
>         supposed to be a direct tie-in to hardware.

Software models are only used as potential providers, nothing more.
Fundamentally, I don't see a difference between what hwloc does
(information reporting) and what this function does. Or maybe I
misunderstood your comment?

Cheers,
Guillaume