[mpiwg-rma] shared-like access within a node with non-shared windows

Jeff Hammond jeff.science at gmail.com
Fri Oct 18 14:39:24 CDT 2013

Yes, as I said, that's all I can do right now.  But MPI_WIN_CREATE is
not scalable.  And it requires two windows instead of one.

Brian, Pavan and Xin all seem to agree that this is straightforward to
implement as an optional feature.  We just need to figure out how to
extend the use of MPI_WIN_SHARED_QUERY to enable it.


On Fri, Oct 18, 2013 at 2:35 PM, Jim Dinan <james.dinan at gmail.com> wrote:
> Jeff,
> Sorry, I haven't read the whole thread closely, so please ignore me if this
> is nonsense.  Can you get what you want by doing MPI_Win_allocate_shared()
> to create an intranode window, and then pass the buffer allocated by
> MPI_Win_allocate_shared to MPI_Win_create() to create an internode window?
>  ~Jim.
> On Sat, Oct 12, 2013 at 3:49 PM, Jeff Hammond <jeff.science at gmail.com>
> wrote:
>> Pavan told me that (in MPICH) MPI_Win_allocate is way better than
>> MPI_Win_create because the former allocates the shared memory
>> business.  It was implied that the latter requires more work within
>> the node. (I thought mmap could do the same magic on existing
>> allocations, but that's not really the point here.)
>> But within a node, what's even better than a window allocated with
>> MPI_Win_allocate is a window allocated with MPI_Win_allocate_shared,
>> since the latter permits load-store.  Then I wondered if it would be
>> possible to have both (1) direct load-store access within a node and
>> (2) scalable metadata for windows spanning many nodes.
>> I can get (1) but not (2) by using MPI_Win_allocate_shared and then
>> dropping a second window for the internode part on top of these using
>> MPI_Win_create.  Of course, I can get (2) but not (1) using
>> MPI_Win_allocate.
>> I propose that it be possible to get (1) and (2) by allowing
>> MPI_Win_shared_query to return pointers to shared memory within a node.
>> When the input argument rank to MPI_Win_shared_query corresponds to
>> memory that is not accessible by load-store, the out arguments size
>> and baseptr are 0 and NULL, respectively.
>> The non-scalable use of this feature would be to loop over all ranks
>> in the group associated with the window and test for baseptr!=NULL,
>> while the scalable use would presumably utilize MPI_Comm_split_type,
>> MPI_Comm_group and MPI_Group_translate_ranks to determine the list of
>> ranks corresponding to the node, hence the ones that might permit
>> direct access.
>> Comments are appreciated.
>> Jeff
>> --
>> Jeff Hammond
>> jeff.science at gmail.com
>> _______________________________________________
>> mpiwg-rma mailing list
>> mpiwg-rma at lists.mpi-forum.org
>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-rma

Jeff Hammond
jeff.science at gmail.com
