[mpiwg-rma] MPI_WIN_CREATE for intercommunicators
Jeff Hammond
jeff.science at gmail.com
Wed Jan 22 15:28:45 CST 2014
The other way to solve your issue is to use two windows. I'll write
it up in detail later.
In any case, I'm not totally opposed to your use case. I just don't
know about the cost-benefit relative to many other features.
Jeff
On Wed, Jan 22, 2014 at 4:48 AM, Thomas Jahns <jahns at dkrz.de> wrote:
> Hello Jeff,
>
> On 01/21/14 20:03, Jeff Hammond wrote:
>> It makes my brain hurt just to think about this.
>
> I certainly intended no harm, however mild ;-).
>
>> First, one should recognize that RMA communication occurs using the
>> window object, not the communicator as in p2p and collectives. The
>> reason for a comm object in WIN_CREATE et al. is to enable MPI to
>> communicate between participating processes in order to create the
>> window. For example, many implementations will do an ALLGATHER inside
>> of WIN_CREATE. That operation needs a comm object.
>
> So there is no problem here with using an intercommunicator?
>
>> The second purpose of the comm is to generate the group of the window
>> in order to denote which ranks know about the window. It bothers me a
>> negligible amount that the exact meaning of the rank arguments for
>> RMA is not defined explicitly, but one can reasonably assume that
>> those are ranks on the group of the window. I suppose one could talk
>> about local and remote groups for a window but again, brain hurt.
>
> I think what's defined in lines 20-25 of page 420 of MPI 3.0 already makes
> perfect sense if comm were an intercommunicator since it's specified in terms of
> what would happen for point-to-point communications. So no problem either.
>
>> I do not believe that the communicator has any purpose beyond this and
>> thus do not know what the consequences of allowing intercomms would
>> be. Perhaps we can assume - only for the sake of argument since it is
>> not required to have these semantics - that WIN_CREATE is like
>> ALLGATHER and use the semantics of ALLGATHER on intercomms to define
>> the semantics of win objects created using intercomms.
>
> For e.g. MPI_Win_post one needs a group based on the communicator originally
> passed to MPI_Win_create, but that's already solved for intracommunicators and
> is no new problem.
>
>> In any case, I don't see any real value in this, honestly. Efficiency
>> arguments assume implementation details not specified by the standard.
>> Making a promise to not communicate with some ranks has no effect on
>> some implementations. Of course we know ones where it does, but
>> that's not germane.
>
> Still it's something I cannot express currently.
>
>> Honestly, I think any sane implementer will just MPI_COMM_TEST_INTER
>> -> INTERCOMM_MERGE inside of WIN_CREATE (et al.), do everything as
>> before and then free the temp intracomm at the end. Then I would
>> never, ever think about this issue again until someone paid me a huge
>> sum of money to optimize for the intercomm case. Thus, the result of
>> changing the standard will be almost nothing other than people will be
>> able to avoid doing the following explicitly:
>
> For the above scenario the original communicator needs to be retained, e.g. for
> use in a later MPI_Group_translate_ranks, but otherwise you are right, unless
> MPI_Win_create were to be significantly less expensive with fewer potential paths.
>
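>> int MPE_Win_create(void *base, MPI_Aint size, int disp_unit,
>>                    MPI_Info info, MPI_Comm comm, MPI_Win *win)
>> {
>>     int is_intercomm, rc;
>>     MPI_Comm intracomm;
>>     /* Is comm an intercommunicator? */
>>     MPI_Comm_test_inter(comm, &is_intercomm);
>>     if (is_intercomm) {
>>         /* Merge both groups into a temporary intracommunicator. */
>>         MPI_Intercomm_merge(comm, 0, &intracomm);
>>     } else {
>>         intracomm = comm;
>>     }
>>     /* Propagate the result of the window creation to the caller. */
>>     rc = MPI_Win_create(base, size, disp_unit, info, intracomm, win);
>>     /* The temporary intracommunicator is not needed once the window exists. */
>>     if (is_intercomm)
>>         MPI_Comm_free(&intracomm);
>>     return rc;
>> }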
>>
>> In short, I don't see any value in trying to define the perverse
>> semantics of intercomm-based windows to avoid the ~12 lines of code
>> above.
>
> In my case, I have a client-server setup with point-to-point communication
> happening via an intercommunicator. To add bulk communication via RMA, the
> current state of affairs means I have to use an additional communicator and
> translate rank IDs. But sure, it can be done with something similar to the
> above, it's just more code on the user side.
>
>> In the event we try to act on this proposal, I suppose we use the very
>> limited text regarding send-recv on intercomms as our guide: "A target
>> process is addressed by its rank in the remote group, both for sends
>> and for receives." [pg. 258]
>
> That's what seemed intuitive to me too (see above).
>
> Regards, Thomas
> --
> Thomas Jahns
> DKRZ GmbH, Department: Application software
>
> Deutsches Klimarechenzentrum
> Bundesstraße 45a
> D-20146 Hamburg
>
> Phone: +49-40-460094-151
> Fax: +49-40-460094-270
> Email: Thomas Jahns <jahns at dkrz.de>
>
--
Jeff Hammond
jeff.science at gmail.com