[mpiwg-sessions] notes from today's call
rhc at open-mpi.org
Thu Oct 19 12:43:51 CDT 2017
In thinking about this more, I realized that these new attributes won’t solve the problem raised by Martin. They focus on giving the application the ability to define the nspace name, which doesn’t help with the issue of specifying procs from different clusters in the PMIx_Connect call.
After circling around again with the RM folks, we resolved a few things:
1. They really cannot support user-defined nspaces, as the nspace directly correlates to their assignment of a “jobid” to the operation. Building correlation tables to map a user definition to the RM’s identifier would be overly burdensome.
2. They are willing to standardize on prepending the cluster ID string to the nspace when referencing remote clusters. We tentatively agreed on using a colon ‘:’ as the delimiter. So when referencing a proc in nspace “bar” on cluster “foo”, you would provide an nspace of “foo:bar”. If no cluster ID is provided, then all parties will assume the nspace refers to the local cluster.
3. For the issue of having multiple parallel PMIx_Connect operations spanning identical procs, we agreed to define an attribute PMIX_CONNECT_ID (string) whereby the application can provide its own unique “tag” for each operation. When provided, the RM and PMIx libraries will use this tag to distinguish the operations. Note that all procs participating in a given connect operation must provide the same tag.
I will update the RFC accordingly, and provide a couple of new macros in pmix_common.h to make insertion and parsing of the cluster ID to/from the nspace easier.
> On Oct 18, 2017, at 8:51 PM, rhc at open-mpi.org wrote:
> Hello all
> I followed up on my AR from the meeting to check with the RMs on how they handle unique identifiers for procs on different clusters. As we had surmised, they provide a string name for each cluster, and they agreed that adding that to the PMIx nspace would be a reasonable path forward.
> I have accordingly updated the PMIx RFC (https://github.com/pmix/RFCs/pull/3) to include three new attributes:
> * PMIX_CONNECT_ID_MODIFIER_PREPEND: modify the nspace returned by the host by prepending the given modifier to the nspace string. This allows the application to "tag" the connected group in a recognizable fashion.
> * PMIX_CONNECT_ID_MODIFIER_APPEND: modify the nspace returned by the host by appending the given modifier to the nspace string. This allows the application to "tag" the connected group in a recognizable fashion.
> * PMIX_CONNECT_ID_REQUEST: request that the given identifier be used as the assigned nspace for the connected group. The "required" flag in the directive can be used to indicate that this identifier is required (i.e., the host RM must use it for the group, returning an error if it is already in use) as opposed to requested (i.e., the host RM can substitute its own unique identifier if the specified one is already in use).
> I also added the PMIX_CLUSTER_ID attribute to the list of information to be provided by the RM at process start - you can see the list here:
> https://github.com/pmix/pmix/wiki/2.8-Pmix-Server-Data-Requirements
>> On Oct 16, 2017, at 11:48 AM, Pritchard Jr., Howard <howardp at lanl.gov> wrote:
>> Hi Folks,
>> Notes from today’s call are on the wiki:
>> https://github.com/mpiwg-sessions/sessions-issues/wiki/2017-10-16-webex
>> Howard Pritchard
>> B Schedule
>> Los Alamos National Laboratory
>> mpiwg-sessions mailing list
>> mpiwg-sessions at lists.mpi-forum.org