<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>

</head>

<body dir="ltr">

<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

Ok, that's a fine intent.  But all my specific questions remain -- e.g.,</div>

<ul data-editing-info='{"orderedStyleType":1,"unorderedStyleType":1}' style="list-style-type: disc;">

<li style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

<span style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">What is the precise distinction between the "true" and "false" values of the info keys?</span></li><li style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

<span style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">What is the technical benefit of providing the "true" and "false" values to the user/application in the info keys?</span></li><li style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

<span style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">What items can be listed in these hardware info keys?  (e.g., what about virtual or software-only devices)</span></li><li style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

<span style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">What is the relationship between the software models listed as examples for the URI prefixes and the hardware that

 they are supposed to represent?</span></li><li style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">

<span style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">What is the precise definition of when an implementation is required to provide the same info keys/values between

 processes?</span></li></ul>

<hr style="display: inline-block; width: 98%;">

<span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"><b>From:</b> Guillaume Mercier<br>

<b>Sent:</b> Thursday, November 16, 2023 3:02 AM<br>

<b>To:</b> Jeff Squyres (jsquyres)<br>

<b>Cc:</b> mpi-forum@lists.mpi-forum.org<br>

<b>Subject:</b> Re: [Mpi-forum] Question about MPI-4.1's MPI_Get_hw_resources_info()

</span>

<div><span style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"><br>

</span></div>

<div><span style="font-size: 11pt;">Hi Jeff,<br>

<br>

Let me revise my first answer and be more specific on a couple<br>

of points you raise in your message.<br>

<br>

Remember that until MPI 4.1, there was no standard way to<br>

provide a value to the "mpi_hw_resource_type" info key that can guide<br>

the splitting of communicators on a hardware basis<br>

(i.e., with a call to MPI_Comm_split_type with MPI_COMM_TYPE_HW_GUIDED<br>

as the input split_type value). MPI_Get_hw_resource_info fills<br>

this gap and makes applications that rely on this mechanism more<br>

portable than previously.<br>

<br>

On 16/11/2023 03:09, Jeff Squyres (jsquyres) via mpi-forum wrote:<br>

<br>

>      2. For example, my company makes a piece of hardware that can have<br>

>         thousands of virtual NICs on it, and those virtual NICs might<br>

>         even migrate around to different pieces of hardware (e.g., they<br>

>         can migrate between different fiber optic outputs on the same<br>

>         NIC).  MPI processes are assigned to a virtual NIC, not a<br>

>         hardware NIC.  Am I allowed to include a reference to these<br>

>         virtual NICs in the keys/values that are returned (since the<br>

>         Linux device name refers to a virtual entity, not necessarily a<br>

>         specific set of hardware)?  If so, how do I determine the<br>

>         true/false value to assign?<br>

<br>

On second thoughts, since these virtual NICs are software "instances"<br>

(for lack of a better word), I'm not sure that they should be<br>

listed as keys in the resulting MPI_Info object. I'd like to discuss<br>

this more with you.<br>

<br>

>      3. The text states that the info keys/values are specific to the<br>

>         point of time when the call is made.  p446:11-12 even explicitly<br>

>         states that the process and/or its hardware restrictions may<br>

>         change over time.  So even if I grokked what "restricted to a<br>

>         single instance of a hardware resource of that type" is intended<br>

>         to mean, if things can change -- and they can -- what is the<br>

>         point of giving a true or false value to the user?<br>

<br>

Things can change, but not systematically. I don't think that current<br>

applications modify the binding of their MPI processes that often.<br>

Therefore, in the majority of cases, the information you get after<br>

the first call to the procedure is likely to remain valid until<br>

the application's end.<br>

<br>

<br>

>      4.<br>

>         Is the intent that keys will include a specific, unique<br>

>         reference to an instance of "hardware" (e.g., a PCI address)?<br>

>         If so, then the value of "true" and "false" becomes even more<br>

>         nebulous (or meaningless).  E.g., if I list a key containing<br>

>         "cisco-nic-12bc83fde9" to indicate a specific NIC, what is the<br>

>         exact "hardware resource of that type", and/or how would an<br>

>         application know that "cisco-nic-12bc83fde9" and<br>

>         "cisco-nic-bbbbbbbbb" are of the same "hardware resource type"?<br>

<br>

In this case, both guillaume://cisco-nic set to "true" AND<br>

jeffS://cisco-nic-12bc83fde9 set to "true" seem acceptable to me.<br>

It falls to the user to pick a provider and thus to decide which<br>

information should actually be used.<br>

MPI_Get_hw_resource_info "only" fills the gap<br>

between the application and the lower-level mechanisms that can be used<br>

to retrieve this kind of information, without the application having to<br>

call these lower-level mechanisms directly.<br>
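
<br>

To make the "pick a provider" step concrete, here is a minimal sketch of<br>

how an application might filter the returned keys. It is only an<br>

illustration under stated assumptions: a plain Python dict stands in for<br>

the MPI_Info object, the helper name pick_resource_type is hypothetical,<br>

and the keys (including the "false" entry) are made up from the examples<br>

in this thread.<br>

<pre>
```python
def pick_resource_type(hw_info, provider):
    """Return the keys from one provider whose value is "true".

    hw_info  -- dict standing in for the returned MPI_Info object (assumption)
    provider -- URI scheme without "://", e.g. "jeffS" (hypothetical)
    """
    prefix = provider + "://"
    # Keep only keys reported by the chosen provider and marked "true".
    return sorted(key for key, value in hw_info.items()
                  if key.startswith(prefix) and value == "true")


# Hypothetical keys and values, modelled on the examples in this thread.
example_info = {
    "guillaume://cisco-nic": "true",
    "jeffS://cisco-nic-12bc83fde9": "true",
    "jeffS://cisco-nic-bbbbbbbbb": "false",
}

candidates = pick_resource_type(example_info, "jeffS")
print(candidates)
```
</pre>

A key selected this way would then be the value given to the<br>

"mpi_hw_resource_type" info key for the MPI_Comm_split_type call.<br>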

<br>

>      5. I can imagine that there could be many different scenarios here;<br>

>         can someone provide some guidance on what exactly an<br>

>         implementation is supposed to do here?  This text seems to be...<br>

>         ambiguous.<br>

<br>

What you call ambiguous, I would call flexible ;)<br>

But joking aside, the text can surely be improved and I'd be more<br>

than happy to take your input into account and come up with an<br>

even better version for MPI 4.2 or 5.0.<br>

<br>

>  2. The AtoI in p445:42-46 says that we should use URIs with a type of<br>

>     "openmpi://" or "hwloc://" or "pmix://" or "openmpi://" or<br>

>     "slurm://" or ...<br>

>      1. All of these are software models (although hwloc's data refers<br>

>         to either hardware or to software devices that correspond to<br>

>         some form of hardware -- although that's not always clear, either).<br>

<br>

The provider only indicates where the information comes from, as two<br>

different sources might report slightly different things. I'll take your<br>

previous "cisco-nic" example: hwloc might choose to report only a<br>

"cisco-nic" type while Cisco's tool might report more precise<br>

information. I think it would be detrimental to the user not to report<br>

all available information. As for software models, I would certainly<br>

qualify "openmp" as a software model but not the others.<br>

<br>

<br>

>      2. The use of software models in the text is confusing, because the<br>

>         routine has "hw" in its name, strongly implying that there's<br>

>         supposed to be a direct tie-in to hardware.<br>

<br>

Software models are only used as potential providers, nothing more.<br>

Fundamentally, I don't see the difference between what hwloc does<br>

(information reporting) and what this function does. Or maybe I didn't<br>

understand your comment correctly?<br>

<br>

Cheers,<br>

Guillaume<br>

</span></div>

</body>

</html>