[mpiwg-sessions] Meet today
rhc at open-mpi.org
Mon Jan 3 13:44:21 CST 2022
I had a chance over the holidays to catch up on your docs regarding dynamic sessions - quite interesting. I believe there is support in PMIx for pretty much everything I saw being discussed. We have APIs by which an application can directly request allocation changes from the RM, and events by which the RM can notify (and subsequently negotiate) an application regarding changes to its allocation. So each side has the ability to initiate the process, and then both sides negotiate to a common conclusion. We also have an API by which an application can "register" its willingness to accept RM-initiated "preemption" requests so the RM can incorporate that willingness in its pricing and planning procedures.
Unfortunately, while we have that infrastructure defined in the PMIx Standard and implemented in OpenPMIx, we have not yet seen the required backend support implemented in an RM. I have started working with some folks on integrating support into Slurm, but I do not know the timetable for public release of that work. SchedMD has been ambivalent towards accepting pull requests that extend its PMIx support, so this may well have to be released as a side-project.
I have previously approached Altair about adding support to PBS - nothing has happened yet. I suspect they are waiting for customer demand. I have no knowledge of any other RMs looking into it. As a gap-filling measure, I am adding simulated support in PRRTE so that anyone wanting to develop dynamic resource code can at least have a place where they can develop it and do a little testing. PRRTE doesn't include a scheduler, but I can simulate it by retaining some of the RM-allocated resources as part of a PRRTE-managed "pool".
Meantime, I have started a little personal project to add PMIx support to Kubernetes, hopefully giving it more capability to support HPC applications. The Kubeflow community has a degree of PMIx support, but I want to directly integrate it to Kubernetes itself, including the dynamic resource elements described above. I have no timetable for completing that work - as many of you may know, I am retired and so this is something to do in my spare time. If anyone is interested on tracking progress on this, please let me know.
Thus, I would encourage you to start prodding your favorite RM vendors as this may prove the critical timeline in making dynamic sessions a reality!
Also, if you identify any "gaps" in the PMIx support, please do let me know - I'd be happy to work with you to fill them. The current definitions were developed primarily to support workflow operations and the needs of the dynamic programming model communities (e.g., TensorFlow and Data Analytics). I think those are very similar to what you are identifying, but may perhaps need some tweaking.
On Jan 3, 2022, at 9:28 AM, Pritchard Jr., Howard via mpiwg-sessions <mpiwg-sessions at lists.mpi-forum.org <mailto:mpiwg-sessions at lists.mpi-forum.org> > wrote:
Happy New Year!
Let’s try to meet today. Items on the agenda:
* PR #629 (issue #511) - https://github.com/mpi-forum/mpi-issues/issues/511 <https://github.com/mpi-forum/mpi-issues/issues/511>
* Pick up where we were on discussion of dynamic sessions requirements, see:
* https://miro.com/app/board/o9J_l_Rxe9Q=/ <https://miro.com/app/board/o9J_l_Rxe9Q=/>
* https://docs.google.com/document/d/1l7LQ8eeVOUW69TDVG9LjKJUuerfE3S3teaMFG5DOudM/edit#heading=h.voobxhw94rt3 <https://docs.google.com/document/d/1l7LQ8eeVOUW69TDVG9LjKJUuerfE3S3teaMFG5DOudM/edit#heading=h.voobxhw94rt3>
* If my calendar calculation is right, we will be meeting with the FT WG today
Los Alamos National Laboratory
howardp at lanl.gov <mailto:howardp at lanl.gov>
<image002.png> <https://www.instagram.com/losalamosnatlab/> <image003.png> <https://twitter.com/LosAlamosNatLab> <image004.png> <https://www.linkedin.com/company/los-alamos-national-laboratory/> <image005.png> <https://www.facebook.com/LosAlamosNationalLab/>
mpiwg-sessions mailing list
mpiwg-sessions at lists.mpi-forum.org <mailto:mpiwg-sessions at lists.mpi-forum.org>
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the mpiwg-sessions