<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">For those interested, I have opened a discussion on the PMIx Standard regarding how to describe resources in allocation-related requests: <a href="https://github.com/pmix/pmix-standard/issues/386" class="">https://github.com/pmix/pmix-standard/issues/386</a><div class=""><br class=""></div><div class="">Please feel free to chime in!</div><div class="">Ralph</div><div class=""><br class=""><div><br class=""><blockquote type="cite" class=""><div class="">On Jan 17, 2022, at 11:36 AM, Ralph Castain via mpiwg-sessions <<a href="mailto:mpiwg-sessions@lists.mpi-forum.org" class="">mpiwg-sessions@lists.mpi-forum.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html; charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">The difficulty lies in translating between the MPI concepts and something a scheduler/RM can understand. We don't schedule processes - we schedule resources and then allocate them to someone for their use. Thus, we wouldn't allocate "50 processes" to someone - we would allocate some number of processors, GPUs, memory, etc, and that someone would then do whatever they want with the allocated resources. Otherwise, we wind up with an MPI-centric scheduler, and that isn't viable - too many other programming paradigms out there.<div class=""><br class=""></div><div class="">Part of the problem we have had is that some portion of the community always associates a process with a physical "core" - I run into that all the time in Cray-land. However, the two bear no relation to each other except for what the programmer creates. 
Stepping aside from MPI, it isn't unusual for applications to run more processes than cores on a system - and we are increasingly seeing that in MPI as well.</div><div class=""><br class=""></div><div class="">Your Topologies WG might want to look at the PMIx locality info. We already tell you what node (and sub-node) location each process is on (e.g., distance from each NIC and GPU on the node), and that then extends upwards. In particular, we provide you with network-based location info (e.g., these NICs are all attached to the same switch, this switch is connected to that switch) to support hierarchical operations.</div><div class=""><br class=""></div><div class="">As for resource requests, PMIx has already defined that interface - PMIx_Allocation_request. You can ask for resources, request an extension on time for existing resources, or return resources via that interface. This is indeed a two-step process - once you have the resources, you are free to do with them whatever you like. You can spawn new processes on them (PMIx_Spawn), migrate existing processes to them, or whatever.</div><div class=""><br class=""></div><div class="">If you look at MPI_Comm_spawn under Open MPI, you'll find that all it does is translate the MPI arguments into the appropriate PMIx structures and call PMIx_Spawn. I suspect you'll want MPI to do something similar for the "allocate" request (just so you can define your own data structures) - or users can just call the PMIx function directly if they like. 
We currently see the latter happening fairly routinely as library APIs haven't kept up with the PMIx capabilities, and users want to utilize them.</div><div class=""><br class=""></div><div class="">HTH</div><div class="">Ralph</div><div class=""><br class=""></div><div class=""><br class=""><div class=""><br class=""><blockquote type="cite" class=""><div class="">On Jan 17, 2022, at 8:42 AM, Dan Holmes <<a href="mailto:danholmes@chi.scot" class="">danholmes@chi.scot</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html; charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Hi Ralph,<div class=""><br class=""></div><div class="">You are correct — the MPI community does treat MPI Process as the lowest level concept that is most similar to a HW/SW resource. This is, I believe, the primary reason that the FT efforts focus on the process-fail-stop fault model (by which is meant MPI process fail-stop). It also results in conceptual challenges when using MPI virtual topologies and trying to map a problem domain to a HW resource allocation using an abstract non-resource (MPI process) in a way that maximises the usage efficiency of the HW (such as memory and network links) in the resource allocation.</div><div class=""><br class=""></div><div class="">In the Topologies WG, we are figuring out that, by things like a pset named “<a href="hwloc://rack7" class="">hwloc://rack7</a>”, MPI means “a set containing all the MPI processes that are supported by OS processes that have access to execute on hardware in (some part of) rack 7” rather than “rack 7” itself. 
We can do a reverse mapping (necessarily imperfect) from an arbitrary set of MPI processes, via the OS processes that support those, to the set of cores that those OS processes are bound to by affinity settings and node boundaries, which gives us an idea of the HW/SW resources used by that set of MPI processes.</div><div class=""><br class=""></div><div class="">This conceptual difference is, perhaps, one of the (dis)advantages of using MPI in an application rather than using PMIx directly.</div><div class=""><br class=""></div><div class="">It is also one of the disappointments with the current MPI_COMM_SPAWN functionality in MPI — that purports to create new MPI processes but, by default, says nothing about where they might be located and/or whether additional resources will be allocated/assigned. A long time ago, in the Sessions WG, we looked at specifying a new API such as MPI_EXEC (and MPI_EXEC_MULTIPLE), which would create new MPI processes (and a pset name for them) without also creating a new intercommunicator. However, that suffered from the same conceptual absence — no reference to HW/SW resources. This is a gap that a dynamic user application must fill somehow.</div><div class=""><br class=""></div><div class="">Should the “may I have another 10 nodes?” question/negotiation be part of MPI (inclusive-)or part of something else (from the application’s point-of-view)? That functionality must exist somewhere for dynamic allocations to have any chance. It sounds like you’re subsuming that API into PMIx/PRRTE (even if the functionality is then delegated to the RM). Should MPI follow suit and present an API that delegates to PMIx? I think yes, otherwise users need to use PMIx (to get a new resource allocation) and MPI (to create MPI processes there). I can foresee all sorts of issues with that plan! 
Should the API in MPI be a copy&paste from the PMIx one, or are there some useful differences?</div><div class=""><br class=""></div><div class=""><div class=""><div dir="auto" style="caret-color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class="">Cheers,</div><div style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class="">Dan.</div><div style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class="">—</div><div style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class="">Dr Daniel Holmes PhD</div>Executive Director<br class="">Chief Technology Officer<br class=""><div style="caret-color: rgb(0, 0, 0); font-family: 
Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class="">CHI Ltd</div><div style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><a href="mailto:danholmes@chi.scot" class="">danholmes@chi.scot</a></div><div style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><br class=""></div></div><br class="Apple-interchange-newline">
</div>
<div class=""><br class=""><blockquote type="cite" class=""><div class="">On 17 Jan 2022, at 15:58, Ralph Castain via mpiwg-sessions <<a href="mailto:mpiwg-sessions@lists.mpi-forum.org" class="">mpiwg-sessions@lists.mpi-forum.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html; charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Hi Dan<div class=""><br class=""></div><div class="">I think we are in sync here. PMIx/PRRTE do support both grow/shrink operations, so that should fit. One difference perhaps lies in terminology - we look at "resource" in terms of hardware or software, not processes. So the app/user asks for additional resources (still working on how best to describe the type of resources required) and then can launch as many or few processes on those resources as they like - it is up to the app/user how best to utilize what they are given. I've noticed that the MPI discussions are often talking in terms of "processes" as "resources", which is a little confusing to a resource manager person like myself (we typically don't care what the app/user does with the resource).</div><div class=""><br class=""></div><div class="">The main difference in your description perhaps is that I see PRRTE as a "stop-gap" measure. Being retired, I don't have the time/energy to try and make it into a full production scheduler/resource manager - and it would put me in competition with the current environment providers I'm trying to assist.</div><div class=""><br class=""></div><div class="">What I am trying to do is:</div><div class=""><br class=""></div><div class="">(a) ensure that all the infrastructure to make such an environment work is present and functional. PMIx/PRRTE gives me access to both the client (the app/user) and the backend server (the scheduler/RM) so I can test/develop both sides of the interaction. 
I'm hoping that the efforts of this WG will contribute to that effort.</div><div class=""><br class=""></div><div class="">(b) work with the current environment providers to foster adoption of those methods. Much of this has been hindered by lack of customer demand - doesn't help when so many procurements ask for "MPI-2 without dynamics", but that's the reality right now.</div><div class=""><br class=""></div><div class="">I will play a bit with a scheduler approach I published (Lagrangian Receding Horizon scheduler) some 20 years ago that was a truly "dynamic" one, but that's more for my personal interest as opposed to any intent to push it into production (it provides me with yet another hook I can use for testing the infrastructure). However, if someone wanted to move forward the way you describe, I have no problem with it nor do I see anything in PRRTE that would preclude such an effort. I have seen at least one current vendor going a somewhat similar way, launching PRRTE as part of their allocation setup (i.e., when an allocation is created, they immediately start a PRRTE shim inside it to support PMIx-enabled applications).</div><div class=""><br class=""></div><div class="">It's a start!</div><div class=""><br class=""></div><div class="">Ralph</div><div class=""><br class=""><div class=""><br class=""><blockquote type="cite" class=""><div class="">On Jan 17, 2022, at 7:29 AM, Dan Holmes <<a href="mailto:danholmes@chi.scot" class="">danholmes@chi.scot</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html; charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Hi Ralph,<div class=""><br class=""></div><div class="">I agree with your suggestion to keep the prototype/temporary code encapsulated somewhere behind an interface that will not need to change as functionality is added in the components where it should have been in the 
first place. My APP/MPI separation hides this intent — there are several components to “MPI” including PMIx, which I completely elided from my sketch.</div><div class=""><br class=""></div><div class="">Ultimately, I see the scheduler as a PMIx/PRRTE thing that doesn’t just reserve a few extra processes to use as a dynamic pool, but goes further and reserves the entire machine to use as a dynamic pool. This is kind of what it does already (in concept, if not in practice), IMHO.</div><div class=""><br class=""></div><div class="">I see a scheduler as an application that doesn’t do useful work with the processes it is allocated — its goal is to give those processes away to other applications, whilst obeying restrictions like priority and batch queue ordering. All the processes it currently owns are, conceptually, in the dynamic pool.</div><div class=""><br class=""></div><div class="">So far, I’ve sketched growing an application, but that is at the expense of the dynamic pool or scheduler. The other side of that coin is a shrink of the dynamic pool or scheduler. We will need a mechanism to give back a proper subset (of any size) of the processes currently allocated/accessible. The scheduler will reserve all 1024 processes, then give 128 to one application and 256 to another application, etc.</div><div class=""><br class=""></div><div class="">Seeing both sides of this during an application growth transition encourages us to see both sides during an application shrink transition, i.e. to realise that it also is a scheduler growth transition. This is just double-entry bookkeeping, to use an accounting concept.</div><div class=""><br class=""></div><div class="">The PMIx/PRRTE code you are planning to write, which manages a dynamic pool, is a scheduler — it is the first of a class of schedulers that can support dynamic allocations properly. 
Personally, I would separate the “manage a pool” portion (scheduler application, uses PMIx to interact with other/user applications) from the “give some of mine/take some of theirs” portion (infrastructure/messaging/interaction functionality, belongs inside PMIx). Envision activating your dynamic pool/scheduler application at machine boot time with the instruction to reserve the entire machine and run until the machine is shut down.</div><div class=""><br class=""></div><div class="">I look forward to seeing the outcome!<br class=""><div class="">
<div class="">Cheers,</div><div class="">Dan.</div>
</div>
<div class=""><br class=""><blockquote type="cite" class=""><div class="">On 17 Jan 2022, at 14:50, Ralph Castain via mpiwg-sessions <<a href="mailto:mpiwg-sessions@lists.mpi-forum.org" class="">mpiwg-sessions@lists.mpi-forum.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html; charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Hi Dan<div class=""><br class=""></div><div class="">Lack of support for the MPI dynamic operations has indeed been problematic. For the last few years, people have been getting around that by using PRRTE as a "shim" to their native scheduler since it fully supports such operations. Caveat is that you still have to request an initial allocation that is as large as you expect to eventually need - but at least you can utilize the dynamic functions, validate the value they provide, etc. Hope is that the community can use those results to apply pressure to the HPC scheduler community to adapt their systems. If they don't....well, some of us are working from the other end (starting with flexible schedulers like Kubernetes) to teach those systems how to support HPC, so maybe we'll meet in the middle :-)</div><div class=""><br class=""></div><div class="">As for this project, I'd recommend going the PRRTE route until we get dynamic scheduling support in the main system. 
Reasoning is that:</div><div class=""><br class=""></div><div class="">(a) we can "hide" the mechanics for getting more resources from a particular scheduler in the shim, thus allowing the result to be somewhat more portable.</div><div class=""><br class=""></div><div class="">(b) the code in PMIx/PRRTE for getting the resources can remain in the library as the host environment adapts, so the app/library doesn't have to change once the environments do start to provide dynamic support</div><div class=""><br class=""></div><div class="">(c) there is a near-term need to support dynamic programming models (workflows, ML, etc) on HPC systems, and many of those people are using PRRTE as a shim so they can utilize the dynamic APIs in their respective models</div><div class=""><br class=""></div><div class="">As I mentioned in my prior note, I am already working on adding this "shim" support to PRRTE. I had to complete a prior commitment, but that is done now and I can get back to this effort. My hope is that I'll have something ready in the next few weeks. First stage is to have PRRTE "reserve" some of the original allocation for a "dynamic pool" that it will manage to meet resource requests from the apps (not really trying to "schedule" anything - just a "first come, first served" method). This will simply be a means of testing/demonstrating the functionality, but doesn't provide a truly dynamic environment.</div><div class=""><br class=""></div><div class="">My longer-term plan is to have PRRTE start with some initial allocation, and then as apps adjust their resource needs via PMIx calls, PRRTE will request new allocations and stitch them together transparently to the applications (or offer them as disjoint sets of resources, depending upon the request), return allocations that are no longer required, etc. 
I think that should be available by summer, at least in one or two environments.</div><div class=""><br class=""></div><div class="">Hopefully, getting real dynamic schedulers is only a few years away - but this should help bridge the gap.</div><div class=""><br class=""></div><div class="">HTH</div><div class="">Ralph</div><div class=""><br class=""><div class=""><br class=""><blockquote type="cite" class=""><div class="">On Jan 17, 2022, at 5:40 AM, Dan Holmes <<a href="mailto:danholmes@chi.scot" class="">danholmes@chi.scot</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html; charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Hi Ralph,<div class=""><br class=""></div><div class="">Thanks for this informative peek into the possibilities of PMIx support for these new features in MPI. I’ve finally had a chance to sit and read it properly and inwardly digest it.</div><div class=""><br class=""></div><div class="">This is actually more than I was hoping for, in terms of existing work to support dynamic sessions!</div><div class=""><br class=""></div><div class="">I was fearing that no HPC scheduler would support anything like dynamic allocations because that has been my experience so far on every single supercomputer I’ve had an opportunity to use. In general, the scheduler refuses to implement even the dynamic model that is already in MPI — MPI_COMM_SPAWN[_MULTIPLE]. In many cases, I’ve seen critical bug(s) in the MPI_COMM_CONNECT/MPI_COMM_ACCEPT mechanism that prevents its use. 
As for MPI_COMM_JOIN: forget it entirely!</div><div class=""><br class=""></div><div class="">My plan for a first-cut proof-of-concept implementation was going to be adding something to the batch queue, testing for when that completely separate job runs, and responding to the down-calls from the processes in that new job with information about the processes in the existing job that asked for additional processes. This will have ridiculously bad latency (time from request for more resources until new resources are available to use) but it would seem to be a viable implementation route to demonstrate the functionality.</div><div class=""><br class=""></div>APP 1: “am I first?” <- MPI 1: “yes, because envar peer_app is not set”<div class=""><br class=""><div class=""><div dir="auto" style="caret-color: rgb(0, 0, 0); word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""></div></div><div class="">APP 1: “can I have 8 more processes?” <- MPI 1: “let me check”</div><div class="">MPI 1: “enqueue batch job (set envar peer_app=APP1)” -> scheduler: “enqueue successful”</div><div class=""><br class=""></div><div class="">APP 2: “am I first?” <- MPI 2: “no, because envar peer_app=APP1”</div><div class="">APP 2: “list pset names” <- MPI 2: “<a href="mpi://world" class="">mpi://world</a>, <a href="mpi://self" class="">mpi://self</a>, <a href="app1://world" class="">app1://world</a>, <a href="app1://self" class="">app1://self</a>” <- MPI 1: <span style="caret-color: rgb(0, 0, 0);" class="">“</span><a href="mpi://world" class="">mpi://world</a><span style="caret-color: rgb(0, 0, 0);" class="">, </span><a href="mpi://self" class="">mpi://self</a>”</div><div class=""><br class=""></div><div class="">At that point, both applications can be notified of a resource change and can (hopefully) use each other’s resources.</div><div class=""><br class=""></div><div class="">In this way, I think the scheduler does not *have* to be aware of what is going on. 
It might react faster/more favourably if it was aware.</div><div class=""><br class=""></div><div class="">Does that sketch have any obvious fatal flaws?<br class=""><div class="">
<div class="">Cheers,</div><div class="">Dan.</div>
</div>
<div class=""><br class=""><blockquote type="cite" class=""><div class="">On 3 Jan 2022, at 19:44, Ralph Castain via mpiwg-sessions <<a href="mailto:mpiwg-sessions@lists.mpi-forum.org" class="">mpiwg-sessions@lists.mpi-forum.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html; charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Hello folks<div class=""><br class=""></div><div class="">I had a chance over the holidays to catch up on your docs regarding dynamic sessions - quite interesting. I believe there is support in PMIx for pretty much everything I saw being discussed. We have APIs by which an application can directly request allocation changes from the RM, and events by which the RM can notify (and subsequently negotiate with) an application regarding changes to its allocation. So each side has the ability to initiate the process, and then both sides negotiate to a common conclusion. We also have an API by which an application can "register" its willingness to accept RM-initiated "preemption" requests so the RM can incorporate that willingness in its pricing and planning procedures.</div><div class=""><br class=""></div><div class="">Unfortunately, while we have that infrastructure defined in the PMIx Standard and implemented in OpenPMIx, we have not yet seen the required backend support implemented in an RM. I have started working with some folks on integrating support into Slurm, but I do not know the timetable for public release of that work. SchedMD has been ambivalent towards accepting pull requests that extend its PMIx support, so this may well have to be released as a side-project.</div><div class=""><br class=""></div><div class="">I have previously approached Altair about adding support to PBS - nothing has happened yet. I suspect they are waiting for customer demand. 
I have no knowledge of any other RMs looking into it. As a gap-filling measure, I am adding simulated support in PRRTE so that anyone wanting to develop dynamic resource code can at least have a place where they can develop it and do a little testing. PRRTE doesn't include a scheduler, but I can simulate it by retaining some of the RM-allocated resources as part of a PRRTE-managed "pool".</div><div class=""><br class=""></div><div class="">Meantime, I have started a little personal project to add PMIx support to Kubernetes, hopefully giving it more capability to support HPC applications. The Kubeflow community has a degree of PMIx support, but I want to directly integrate it into Kubernetes itself, including the dynamic resource elements described above. I have no timetable for completing that work - as many of you may know, I am retired and so this is something to do in my spare time. If anyone is interested in tracking progress on this, please let me know.</div><div class=""><br class=""></div><div class="">Thus, I would encourage you to start prodding your favorite RM vendors as this may prove the critical timeline in making dynamic sessions a reality!</div><div class=""><br class=""></div><div class="">Also, if you identify any "gaps" in the PMIx support, please do let me know - I'd be happy to work with you to fill them. The current definitions were developed primarily to support workflow operations and the needs of the dynamic programming model communities (e.g., TensorFlow and Data Analytics). 
I think those are very similar to what you are identifying, but may perhaps need some tweaking.</div><div class=""><br class=""></div><div class="">Ralph</div><div class=""><br class=""><div class=""><br class=""><blockquote type="cite" class=""><div class="">On Jan 3, 2022, at 9:28 AM, Pritchard Jr., Howard via mpiwg-sessions <<a href="mailto:mpiwg-sessions@lists.mpi-forum.org" class="">mpiwg-sessions@lists.mpi-forum.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta charset="UTF-8" class=""><div class="WordSection1" style="page: WordSection1; caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;"><div style="margin: 0in; font-size: 12pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 11pt;" class="">Hello All,<o:p class=""></o:p></span></div><div style="margin: 0in; font-size: 12pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 11pt;" class=""><o:p class=""> </o:p></span></div><div style="margin: 0in; font-size: 12pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 11pt;" class="">Happy New Year!<o:p class=""></o:p></span></div><div style="margin: 0in; font-size: 12pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 11pt;" class=""><o:p class=""> </o:p></span></div><div style="margin: 0in; font-size: 12pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 11pt;" class="">Let’s try to meet today. 
Items on the agenda:<o:p class=""></o:p></span></div><div style="margin: 0in; font-size: 12pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 11pt;" class=""><o:p class=""> </o:p></span></div><ul type="disc" style="margin-bottom: 0in; margin-top: 0in;" class=""><li class="MsoListParagraph" style="margin: 0in; font-size: 12pt; font-family: Calibri, sans-serif;"><span style="font-size: 13.5pt; font-family: -webkit-standard;" class="">PR #629 (issue #511) -<span class="Apple-converted-space"> </span><a href="https://github.com/mpi-forum/mpi-issues/issues/511" style="color: rgb(5, 99, 193); text-decoration: underline;" class="">https://github.com/mpi-forum/mpi-issues/issues/511</a></span><span style="font-size: 11pt;" class=""><o:p class=""></o:p></span></li></ul><div style="margin: 0in 0in 0in 0.25in; font-size: 12pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 11pt;" class=""><o:p class=""> </o:p></span></div><ul type="disc" style="margin-bottom: 0in; margin-top: 0in;" class=""><li class="MsoListParagraph" style="margin: 0in; font-size: 12pt; font-family: Calibri, sans-serif;"><span style="font-size: 13.5pt; font-family: -webkit-standard;" class="">Pick up where we were on discussion of dynamic sessions requirements, see:</span><span style="font-size: 11pt;" class=""><o:p class=""></o:p></span><ul type="circle" style="margin-bottom: 0in; margin-top: 0in;" class=""><li class="MsoListParagraph" style="margin: 0in; font-size: 12pt; font-family: Calibri, sans-serif;"><span style="font-size: 11pt;" class=""><a href="https://miro.com/app/board/o9J_l_Rxe9Q=/" style="color: rgb(5, 99, 193); text-decoration: underline;" class="">https://miro.com/app/board/o9J_l_Rxe9Q=/</a><o:p class=""></o:p></span></li><li class="MsoListParagraph" style="margin: 0in; font-size: 12pt; font-family: Calibri, sans-serif;"><span style="font-size: 11pt;" class=""><a 
href="https://docs.google.com/document/d/1l7LQ8eeVOUW69TDVG9LjKJUuerfE3S3teaMFG5DOudM/edit#heading=h.voobxhw94rt3" style="color: rgb(5, 99, 193); text-decoration: underline;" class="">https://docs.google.com/document/d/1l7LQ8eeVOUW69TDVG9LjKJUuerfE3S3teaMFG5DOudM/edit#heading=h.voobxhw94rt3</a><o:p class=""></o:p></span></li><li class="MsoListParagraph" style="margin: 0in; font-size: 12pt; font-family: Calibri, sans-serif;"><span style="font-size: 11pt;" class=""><o:p class=""> </o:p></span></li></ul></li><li class="MsoListParagraph" style="margin: 0in; font-size: 12pt; font-family: Calibri, sans-serif;"><span style="font-size: 11pt;" class="">If my calendar calculation is right, we will be meeting with the FT WG today<o:p class=""></o:p></span></li></ul><div style="margin: 0in; font-size: 12pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 11pt;" class=""><o:p class=""> </o:p></span></div><div style="margin: 0in; font-size: 12pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 11pt;" class="">Thanks,<o:p class=""></o:p></span></div><div style="margin: 0in; font-size: 12pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 11pt;" class=""><o:p class=""> </o:p></span></div><div style="margin: 0in; font-size: 12pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 11pt;" class="">Howard<o:p class=""></o:p></span></div><div class=""><div class=""><div style="margin: 0in; font-size: 12pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 11pt; font-family: Arial, sans-serif; color: rgb(11, 26, 141);" class=""><br class="">—</span><span style="font-size: 11pt;" class=""><o:p class=""></o:p></span></div></div><div class=""><div style="margin: 0in; font-size: 12pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 11pt; font-family: Arial, sans-serif; color: rgb(11, 26, 141);" class=""><o:p class=""> </o:p></span></div><table class="MsoNormalTable" border="0" 
cellspacing="0" cellpadding="0" width="375" style="width: 281.25pt; border-collapse: collapse;"><tbody class=""><tr style="height: 104.45pt;" class=""><td width="78" valign="top" style="width: 58.4pt; padding: 0in 5.4pt; height: 104.45pt;" class=""><div style="margin: 0in; font-size: 12pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 9pt; font-family: Arial, sans-serif;" class=""><span id="cid:image001.png@01D8008C.AD1D0010" class=""><image001.png></span><o:p class=""></o:p></span></div></td><td width="297" valign="top" style="width: 223.1pt; padding: 0in 5.4pt; height: 104.45pt;" class=""><div style="margin: 0in; font-size: 12pt; font-family: Calibri, sans-serif;" class=""><b class=""><span style="font-family: Arial, sans-serif; color: rgb(11, 26, 140);" class="">Howard Pritchard<o:p class=""></o:p></span></b></div><div style="margin: 0in; font-size: 12pt; font-family: Calibri, sans-serif;" class=""><b class=""><span style="font-size: 10pt; font-family: Arial, sans-serif; color: rgb(84, 89, 97);" class="">Research Scientist<o:p class=""></o:p></span></b></div><div style="margin: 0in; font-size: 12pt; font-family: Calibri, sans-serif;" class=""><b class=""><span style="font-size: 10pt; font-family: Arial, sans-serif; color: rgb(84, 89, 97);" class="">HPC-ENV<o:p class=""></o:p></span></b></div><div style="margin: 0in; font-size: 12pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 9pt; font-family: Arial, sans-serif;" class=""><o:p class=""> </o:p></span></div><div style="margin: 0in; font-size: 12pt; font-family: Calibri, sans-serif;" class=""><b class=""><span style="font-size: 9pt; font-family: Arial, sans-serif; color: rgb(11, 26, 140);" class="">Los Alamos National Laboratory<o:p class=""></o:p></span></b></div><div style="margin: 0in; font-size: 12pt; font-family: Calibri, sans-serif;" class=""><b class=""><span style="font-size: 9pt; font-family: Arial, sans-serif; color: rgb(11, 26, 140);" class=""><a 
href="mailto:howardp@lanl.gov" style="color: rgb(5, 99, 193); text-decoration: underline;" class="">howardp@lanl.gov</a><o:p class=""></o:p></span></b></div><div style="margin: 0in; font-size: 12pt; font-family: Calibri, sans-serif;" class=""><b class=""><span style="font-size: 9pt; font-family: Arial, sans-serif; color: rgb(11, 26, 140);" class=""><o:p class=""> </o:p></span></b></div><div style="margin: 0in; font-size: 12pt; font-family: Calibri, sans-serif;" class=""><a href="https://www.instagram.com/losalamosnatlab/" style="color: rgb(5, 99, 193); text-decoration: underline;" class=""><b class=""><span style="font-size: 9pt; font-family: Arial, sans-serif; color: rgb(11, 26, 140); text-decoration: none;" class=""><span id="cid:image002.png@01D8008C.AD1D0010" class=""><image002.png></span></span></b></a><a href="https://twitter.com/LosAlamosNatLab" style="color: rgb(5, 99, 193); text-decoration: underline;" class=""><b class=""><span style="font-size: 9pt; font-family: Arial, sans-serif; color: rgb(11, 26, 140); text-decoration: none;" class=""><span id="cid:image003.png@01D8008C.AD1D0010" class=""><image003.png></span></span></b></a><a href="https://www.linkedin.com/company/los-alamos-national-laboratory/" style="color: rgb(5, 99, 193); text-decoration: underline;" class=""><b class=""><span style="font-size: 9pt; font-family: Arial, sans-serif; color: rgb(11, 26, 140); text-decoration: none;" class=""><span id="cid:image004.png@01D8008C.AD1D0010" class=""><image004.png></span></span></b></a><a href="https://www.facebook.com/LosAlamosNationalLab/" style="color: rgb(5, 99, 193); text-decoration: underline;" class=""><b class=""><span style="font-size: 9pt; font-family: Arial, sans-serif; color: rgb(11, 26, 140); text-decoration: none;" class=""><span id="cid:image005.png@01D8008C.AD1D0010" class=""><image005.png></span></span></b></a><b class=""><span style="font-size: 9pt; font-family: Arial, sans-serif; color: rgb(11, 26, 140);" class=""><o:p 
class=""></o:p></span></b></div></td></tr></tbody></table><div style="margin: 0in; font-size: 12pt; font-family: Calibri, sans-serif;" class=""><span style="font-size: 11pt; font-family: Arial, sans-serif;" class=""><o:p class=""> </o:p></span></div></div></div><div style="margin: 0in; font-size: 12pt; font-family: Calibri, sans-serif;" class=""><o:p class=""> </o:p></div></div><span style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; float: none; display: inline !important;" class="">_______________________________________________</span><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><span style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; float: none; display: inline !important;" class="">mpiwg-sessions mailing list</span><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><a href="mailto:mpiwg-sessions@lists.mpi-forum.org" style="color: rgb(5, 99, 193); text-decoration: 
underline; font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px;" class="">mpiwg-sessions@lists.mpi-forum.org</a><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><a href="https://lists.mpi-forum.org/mailman/listinfo/mpiwg-sessions" style="color: rgb(5, 99, 193); text-decoration: underline; font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px;" class="">https://lists.mpi-forum.org/mailman/listinfo/mpiwg-sessions</a></div></blockquote></div><br class=""></div></div></div></blockquote></div><br class=""></div></div></div></div></blockquote></div><br class=""></div></div></div></blockquote></div><br class=""></div></div></div></blockquote></div><br class=""></div></div></div></blockquote></div><br class=""></div></div></div></blockquote></div><br class=""></div></body></html>