[mpiwg-rma] MPI_WIN_CREATE for intercommunicators
Thomas Jahns
jahns at dkrz.de
Wed Jan 22 04:48:51 CST 2014
Hello Jeff,
On 01/21/14 20:03, Jeff Hammond wrote:
> It makes my brain hurt just to think about this.
I certainly intended no harm, however mild ;-).
> First, one should recognize that RMA communication occurs using the
> window object, not the communicator as in p2p and collectives. The
> reason for a comm object in WIN_CREATE et al. is to enable MPI to
> communicate between participating processes in order to create the
> window. For example, many implementations will do an ALLGATHER inside
> of WIN_CREATE. That operation needs a comm object.
So there is no problem here with using an intercommunicator?
> The second purpose of the comm is to generate the group of the window
> in order to denote which ranks know about the window. It bothers me a
> negligible amount that the exact definition of the rank arguments for
> RMA is not defined explicitly, but one can reasonably assume that
> those are ranks on the group of the window. I suppose one could talk
> about local and remote groups for a window but again, brain hurt.
I think what's defined in lines 20-25 of page 420 of MPI 3.0 already makes
perfect sense if comm were an intercommunicator since it's specified in terms of
what would happen for point-to-point communications. So no problem either.
> I do not believe that the communicator has any purpose beyond this and
> thus do not know what the consequences of allowing intercomms would
> be. Perhaps we can use - only for the sake of argument since it is
> not required to have these semantics - that WIN_CREATE is like
> ALLGATHER and use the semantics of ALLGATHER on intercomms to define
> the semantics of win objects created using intercomms.
For MPI_Win_post, for example, one needs a group based on the communicator
originally passed to MPI_Win_create, but that is already solved for
intracommunicators and poses no new problem.
> In any case, I don't see any real value in this, honestly. Efficiency
> arguments assume implementation details not specified by the standard.
> Making a promise to not communicate with some ranks has no effect on
> some implementations. Of course we know ones where it does, but
> that's not germane.
Still it's something I cannot express currently.
> Honestly, I think any sane implementer will just MPI_COMM_TEST_INTER
> -> INTERCOMM_MERGE inside of WIN_CREATE (et al.), do everything as
> before and then free the temp intracomm at the end. Then I would
> never, ever think about this issue again until someone paid me a huge
> sum of money to optimize for the intercomm case. Thus, the result of
> changing the standard will be almost nothing other than people will be
> able to avoid doing the following explicitly:
For the above scenario the original communicator needs to be retained, e.g. for
use in a later MPI_Group_translate_ranks, but otherwise you are right, unless
MPI_Win_create were to be significantly less expensive with fewer potential paths.
> int MPE_Win_create(void *base, MPI_Aint size, int disp_unit,
>                    MPI_Info info, MPI_Comm comm, MPI_Win *win)
> {
>     int is_intercomm;
>     MPI_Comm intracomm;
>     MPI_Comm_test_inter(comm, &is_intercomm);
>     if (is_intercomm)
>         MPI_Intercomm_merge(comm, 0, &intracomm);
>     else
>         intracomm = comm;
>     MPI_Win_create(base, size, disp_unit, info, intracomm, win);
>     if (is_intercomm)
>         MPI_Comm_free(&intracomm);
>     return MPI_SUCCESS;
> }
>
> In short, I don't see any value in trying to define the perverse
> semantics of intercomm-based windows to avoid the ~12 lines of code
> above.
In my case, I have a client-server setup with point-to-point communication
happening via an intercommunicator. To add bulk communication via RMA, the
current state of affairs means I have to use an additional communicator and
translate ranks between the two. But sure, it can be done with something similar
to the above; it's just more code on the user side.
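To illustrate the kind of rank translation meant here, a minimal sketch in plain
C (no MPI calls) follows. The names merged_rank and enum side are hypothetical.
Unlike the wrapper quoted above, which passes high = 0 on both sides and so gets
an arbitrary ordering, the sketch assumes one group passed high = 0 and the
other high = 1 to MPI_Intercomm_merge, so that the "low" group is ordered first.
It further assumes that ranks keep their relative order within each group,
which the standard text does not spell out as far as I can tell. Real code
would more safely call MPI_Group_translate_ranks on the groups of the two
communicators.

```c
#include <assert.h>

/* Side of the intercommunicator a process belongs to: the "low" side passed
 * high = 0 to MPI_Intercomm_merge, the "high" side passed high = 1. */
enum side { SIDE_LOW = 0, SIDE_HIGH = 1 };

/* Hypothetical helper: map a rank within one group of the intercommunicator
 * to the corresponding rank in the merged intracommunicator.
 *
 * ASSUMPTION (not guaranteed by this sketch): the merged communicator lists
 * all low-group ranks first, then all high-group ranks, each group in its
 * original order.
 *
 * Returns -1 if group_rank is not a valid rank of the given group. */
int merged_rank(enum side s, int group_rank, int low_size, int high_size)
{
    assert(low_size >= 0 && high_size >= 0);

    if (group_rank < 0)
        return -1;
    if (s == SIDE_LOW)
        return group_rank < low_size ? group_rank : -1;
    return group_rank < high_size ? low_size + group_rank : -1;
}
```

With 4 servers on the low side and 3 clients on the high side, client 0 would
then be addressed as rank 4 in an RMA call on the merged window, whereas the
point-to-point code keeps addressing it as rank 0 of the remote group.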
> In the event we try to act on this proposal, I suppose we use the very
> limited text regarding send-recv on intercomms as our guide: "A target
> process is addressed by its rank in the remote group, both for sends
> and for receives." [pg. 258]
That's what seemed intuitive to me too (see above).
Regards, Thomas
--
Thomas Jahns
DKRZ GmbH, Department: Application software
Deutsches Klimarechenzentrum
Bundesstraße 45a
D-20146 Hamburg
Phone: +49-40-460094-151
Fax: +49-40-460094-270
Email: Thomas Jahns <jahns at dkrz.de>