[mpiwg-rma] MPI_WIN_CREATE for intercommunicators

Thomas Jahns jahns at dkrz.de
Wed Jan 22 04:48:51 CST 2014

Hello Jeff,

On 01/21/14 20:03, Jeff Hammond wrote:
> It makes my brain hurt just to think about this.

I certainly intended no harm, however mild ;-).

> First, one should recognize that RMA communication occurs using the
> window object, not the communicator as in p2p and collectives.  The
> reason for a comm object in WIN_CREATE et al. is to enable MPI to
> communicate between participating processes in order to create the
> window.  For example, many implementations will do an ALLGATHER inside
> of WIN_CREATE.  That operation needs a comm object.

So there is no problem here with using an intercommunicator?

> The second purpose of the comm is to generate the group of the window
> in order to denote which ranks know about the window.  It bothers me a
> negligible amount that the exact definition of the rank arguments for
> RMA is not defined explicitly, but one can reasonably assume that
> those are ranks on the group of the window.  I suppose one could talk
> about local and remote groups for a window but again, brain hurt.

I think what's defined in lines 20-25 of page 420 of MPI 3.0 already makes
perfect sense if comm were an intercommunicator since it's specified in terms of
what would happen for point-to-point communications. So no problem either.

> I do not believe that the communicator has any purpose beyond this and
> thus do not know what the consequences of allowing intercomms would
> be.  Perhaps we can use - only for the sake of argument since it is
> not required to have these semantics - that WIN_CREATE is like
> ALLGATHER and use the semantics of ALLGATHER on intercomms to define
> the semantics of win objects created using intercomms.

For e.g. MPI_Win_post, one needs a group derived from the communicator
originally passed to MPI_Win_create, but that is already solved for
intracommunicators and poses no new problem.

> In any case, I don't see any real value in this, honestly.  Efficiency
> arguments assume implementation details not specified by the standard.
>  Making a promise to not communicate with some ranks has no effect on
> some implementations.  Of course we know ones where it does, but
> that's not germane.

Still it's something I cannot express currently.

> Honestly, I think any sane implementer will just MPI_COMM_TEST_INTER
> -> INTERCOMM_MERGE inside of WIN_CREATE (et al.), do everything as
> before and then free the temp intracomm at the end.  Then I would
> never, ever think about this issue again until someone paid me a huge
> sum of money to optimize for the intercomm case.  Thus, the result of
> changing the standard will be almost nothing other than people will be
> able to avoid doing the following explicitly:

For the above scenario, the original communicator needs to be retained so that
it can be used for a later MPI_Group_translate_ranks, but otherwise you are
right, unless MPI_Win_create were to be significantly less expensive with fewer
potential paths.

> int MPE_Win_create(void *base, MPI_Aint size, int disp_unit, MPI_Info
> info, MPI_Comm comm, MPI_Win *win)
> {
>   int is_intercomm;
>   MPI_Comm intracomm;
>   MPI_Comm_test_inter(comm, &is_intercomm);
>   if (is_intercomm)
>     MPI_Intercomm_merge(comm, 0, &intracomm);
>   else
>     intracomm = comm;
>   MPI_Win_create(base, size, disp_unit, info, intracomm, win);
>   if (is_intercomm)
>     MPI_Comm_free(&intracomm);
>   return MPI_SUCCESS;
> }
> In short, I don't see any value in trying to define the perverse
> semantics of intercomm-based windows to avoid the ~12 lines of code
> above.

In my case, I have a client-server setup with point-to-point communication
happening via an intercommunicator. When adding bulk communication via RMA, the
current state of affairs means I have to use an additional communicator and
translate IDs. But sure, it can be done with something similar to the above;
it's just more code on the user side.

> In the event we try to act on this proposal, I suppose we use the very
> limited text regarding send-recv on intercomms as our guide: "A target
> process is addressed by its rank in the remote group, both for sends
> and for receives." [pg. 258]

That's what seemed intuitive to me too (see above).

Regards, Thomas
Thomas Jahns
DKRZ GmbH, Department: Application software

Deutsches Klimarechenzentrum
Bundesstraße 45a
D-20146 Hamburg

Phone: +49-40-460094-151
Fax: +49-40-460094-270
Email: Thomas Jahns <jahns at dkrz.de>
