<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
OK, that's a fine intent. But all of my specific questions remain, e.g.:</div>
<ul data-editing-info="{&quot;orderedStyleType&quot;:1,&quot;unorderedStyleType&quot;:1}" style="list-style-type: disc;">
<li style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">What is the precise distinction between the "true" and "false" values of the info keys?</span></li><li style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">What is the technical benefit of providing the "true" and "false" values to the user/application in the info keys?</span></li><li style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">What items can be listed in these hardware info keys (e.g., what about virtual or software-only devices)?</span></li><li style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">What is the relationship between the software models listed as examples for the URI prefixes and the hardware that
they are supposed to represent?</span></li><li style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<span style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">What is the precise definition of when an implementation is required to provide the same info keys/values between
processes?</span></li></ul>
<hr style="display: inline-block; width: 98%;">
<span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"><b>From:</b> Guillaume Mercier<br>
<b>Sent:</b> Thursday, November 16, 2023 3:02 AM<br>
<b>To:</b> Jeff Squyres (jsquyres)<br>
<b>Cc:</b> mpi-forum@lists.mpi-forum.org<br>
<b>Subject:</b> Re: [Mpi-forum] Question about MPI-4.1's MPI_Get_hw_resources_info()
</span>
<div><span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"><br>
</span></div>
<div><span style="font-size: 11pt;">Hi Jeff,<br>
<br>
Let me revise my first answer and be more specific about a couple<br>
of the points you raise in your message.<br>
<br>
Remember that until MPI 4.1, there was no standard way to<br>
provide a value for the "mpi_hw_resource_type" info key that guides<br>
the hardware-based splitting of communicators<br>
(i.e., a call to MPI_Comm_split_type with MPI_COMM_TYPE_HW_GUIDED<br>
as the split_type argument). MPI_Get_hw_resource_info fills<br>
this gap and makes applications that rely on this mechanism more<br>
portable than before.<br>
<br>
On 16/11/2023 03:09, Jeff Squyres (jsquyres) via mpi-forum wrote:<br>
<br>
> 2. For example, my company makes a piece of hardware that can have<br>
> thousands of virtual NICs on it, and those virtual NICs might<br>
> even migrate around to different pieces of hardware (e.g., they<br>
> can migrate between different fiber optic outputs on the same<br>
> NIC). MPI processes are assigned to a virtual NIC, not a<br>
> hardware NIC. Am I allowed to include a reference to these<br>
> virtual NICs in the keys/values that are returned (since the<br>
> Linux device name refers to a virtual entity, not necessarily a<br>
> specific set of hardware)? If so, how do I determine the<br>
> true/false value to assign?<br>
<br>
On second thought, since these virtual NICs are software "instances"<br>
(for lack of a better word), I'm not sure that they should be<br>
listed as keys in the resulting MPI_Info object. I'd like to discuss<br>
this more with you.<br>
<br>
> 3. The text states that the info keys/values are specific to the<br>
> point of time when the call is made. p446:11-12 even explicitly<br>
> states that the process and/or its hardware restrictions may<br>
> change over time. So even if I grokked what "restricted to a<br>
> single instance of a hardware resource of that type" is intended<br>
> to mean, if things can change -- and they can -- what is the<br>
> point of giving a true or false value to the user?<br>
<br>
Things can change, but not systematically. I don't think that current<br>
applications modify the binding of their MPI processes that often.<br>
Therefore, in the majority of cases, the information returned by<br>
the first call to the procedure is likely to remain valid until<br>
the application ends.<br>
<br>
<br>
> 4.<br>
> Is the intent that keys will include a specific, unique<br>
> reference to an instance of "hardware" (e.g., a PCI address)?<br>
> If so, then the value of "true" and "false" becomes even more<br>
> nebulous (or meaningless). E.g., if I list a key containing<br>
> "cisco-nic-12bc83fde9" to indicate a specific NIC, what is the<br>
> exact "hardware resource of that type", and/or how would an<br>
> application know that "cisco-nic-12bc83fde9" and<br>
> "cisco-nic-bbbbbbbbb" are of the same "hardware resource type"?<br>
<br>
In this case, both guillaume://cisco-nic set to "true" AND<br>
jeffS://cisco-nic-12bc83fde9 set to "true" seem acceptable to me.<br>
It falls to the user to pick a provider, and thus to decide which<br>
information is actually used.<br>
MPI_Get_hw_resource_info "only" bridges the gap<br>
between the application and the lower-level mechanisms that can<br>
retrieve this kind of information, so that the application does not<br>
have to call these lower-level mechanisms directly.<br>
<br>
> 5. I can imagine that there could be many different scenarios here;<br>
> can someone provide some guidance on what exactly an<br>
> implementation is supposed to do here? This text seems to be...<br>
> ambiguous.<br>
<br>
What you call ambiguous, I would call flexible ;)<br>
But joking aside, the text can surely be improved, and I'd be more<br>
than happy to take your input into account and come up with an<br>
even better version for MPI 4.2 or 5.0.<br>
<br>
> 2. The AtoI in p445:42-46 says that we should use URIs with a type of<br>
> "openmpi://" or "hwloc://" or "pmix://" or "openmpi://" or<br>
> "slurm://" or ...<br>
> 1. All of these are software models (although hwloc's data refers<br>
> to either hardware or to software devices that correspond to<br>
> some form of hardware -- although that's not always clear, either).<br>
<br>
The provider only indicates where the information comes from, since two<br>
different sources might report slightly different things. Take your<br>
previous "cisco-nic" example: hwloc might choose to report only a<br>
"cisco-nic" type, while Cisco's own tool might report more precise<br>
information. I think it would be detrimental to the user not to report<br>
all of the available information. As for software models, I would certainly<br>
qualify "openmp" as a software model, but not the others.<br>
<br>
<br>
> 2. The use of software models in the text is confusing, because the<br>
> routine has "hw" in its name, strongly implying that there's<br>
> supposed to be a direct tie-in to hardware.<br>
<br>
Software models are only used as potential providers, nothing more.<br>
Fundamentally, I don't see a difference between what hwloc does<br>
(information reporting) and what this function does. Or maybe I<br>
misunderstood your comment?<br>
<br>
Cheers,<br>
Guillaume<br>
</span></div>
</body>
</html>