[mpiwg-rma] shared-like access within a node with non-shared windows

Jeff Hammond jeff.science at gmail.com
Sat Oct 12 14:49:25 CDT 2013

Pavan told me that (in MPICH) MPI_Win_allocate is way better than
MPI_Win_create because the former allocates the shared memory
business.  It was implied that the latter requires more work within
the node. (I thought mmap could do the same magic on existing
allocations, but that's not really the point here.)

But within a node, what's even better than a window allocated with
MPI_Win_allocate is a window allocated with MPI_Win_allocate_shared,
since the latter permits load-store.  Then I wondered if it would be
possible to have both (1) direct load-store access within a node and
(2) scalable metadata for windows spanning many nodes.

I can get (1) but not (2) by using MPI_Win_allocate_shared and then
dropping a second window for the internode part on top of these using
MPI_Win_create.  Of course, I can get (2) but not (1) using

I propose that it be possible to get (1) and (2) by allowing
MPI_Win_shared_query to return pointers to shared memory within a node
even when the window was not allocated with MPI_Win_allocate_shared.
When the input argument rank to MPI_Win_shared_query corresponds to
memory that is not accessible by load-store, the out arguments size
and baseptr are 0 and NULL, respectively.

The non-scalable use of this feature would be to loop over all ranks
in the group associated with the window and test for baseptr!=NULL,
while the scalable use would presumably utilize MPI_Comm_split_type,
MPI_Comm_group and MPI_Group_translate_ranks to determine the list of
ranks corresponding to the node, hence the ones that might permit
direct access.

Comments are appreciated.


