[mpiwg-rma] MPI_WIN_CREATE for intercommunicators

Jim Dinan james.dinan at gmail.com
Tue Feb 11 09:59:52 CST 2014

Hi Thomas,

This sounds like a reasonable proposal to me; it should be possible to
create a sane definition for windows on intercommunicators.  The RMA
interface has focused on intracommunicators, so it could require several
additions, e.g. to make it possible to query both local and remote groups
from the window.  I think the main argument against adding this would be
the increased complexity in the standard and in implementations.  Given
that there is a workaround (flattening the communicator to make the window,
and using post/start/complete/wait to restrict the set of communication
peers in active target), we would need to motivate such a change.

I captured your proposal in a ticket,
https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/412, so we remember to
discuss this further in the RMA working group.


On Wed, Jan 22, 2014 at 5:48 AM, Thomas Jahns <jahns at dkrz.de> wrote:

> Hello Jeff,
> On 01/21/14 20:03, Jeff Hammond wrote:
> > It makes my brain hurt just to think about this.
> I certainly intended no harm, however mild ;-).
> > First, one should recognize that RMA communication occurs using the
> > window object, not the communicator as in p2p and collectives.  The
> > reason for a comm object in WIN_CREATE et al. is to enable MPI to
> > communicate between participating processes in order to create the
> > window.  For example, many implementations will do an ALLGATHER inside
> > of WIN_CREATE.  That operation needs a comm object.
> So there is no problem here with using an intercommunicator?
> > The second purpose of the comm is to generate the group of the window
> > in order to denote which ranks know about the window.  It bothers me a
> > negligible amount that the exact definition of the rank arguments for
> > RMA is not defined explicitly, but one can reasonably assume that
> > those are ranks on the group of the window.  I suppose one could talk
> > about local and remote groups for a window but again, brain hurt.
> I think what's defined in lines 20-25 of page 420 of MPI 3.0 already makes
> perfect sense if comm were an intercommunicator, since it's specified in
> terms of what would happen for point-to-point communications. So no problem
> either.
> > I do not believe that the communicator has any purpose beyond this and
> > thus do not know what the consequences of allowing intercomms would
> > be.  Perhaps we can use - only for the sake of argument since it is
> > not required to have these semantics - that WIN_CREATE is like
> > ALLGATHER and use the semantics of ALLGATHER on intercomms to define
> > the semantics of win objects created using intercomms.
> For e.g. MPI_Win_post one needs a group based on the communicator
> originally passed to MPI_Win_create, but that's already solved for
> intracommunicators and is no new problem.
> > In any case, I don't see any real value in this, honestly.  Efficiency
> > arguments assume implementation details not specified by the standard.
> >  Making a promise to not communicate with some ranks has no effect on
> > some implementations.  Of course we know ones where it does, but
> > that's not germane.
> Still it's something I cannot express currently.
> > Honestly, I think any sane implementer will just MPI_COMM_TEST_INTER
> > -> INTERCOMM_MERGE inside of WIN_CREATE (et al.), do everything as
> > before and then free the temp intracomm at the end.  Then I would
> > never, ever think about this issue again until someone paid me a huge
> > sum of money to optimize for the intercomm case.  Thus, the result of
> > changing the standard will be almost nothing other than people will be
> > able to avoid doing the following explicitly:
> For the above scenario the original communicator needs to be retained,
> e.g. for use in a later MPI_Group_translate_ranks, but otherwise you are
> right, unless MPI_Win_create were to be significantly less expensive with
> fewer potential paths.
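> > #include <mpi.h>
> >
> > /* Create a window on comm, merging it first if it is an intercomm. */
> > int MPE_Win_create(void *base, MPI_Aint size, int disp_unit,
> >                    MPI_Info info, MPI_Comm comm, MPI_Win *win)
> > {
> >   int is_intercomm;
> >   MPI_Comm intracomm;
> >   MPI_Comm_test_inter(comm, &is_intercomm);
> >   if (is_intercomm)
> >     /* flatten local and remote groups into one intracommunicator */
> >     MPI_Intercomm_merge(comm, 0, &intracomm);
> >   else
> >     intracomm = comm;
> >   MPI_Win_create(base, size, disp_unit, info, intracomm, win);
> >   /* the temporary communicator is not needed once the window exists */
> >   if (is_intercomm)
> >     MPI_Comm_free(&intracomm);
> >   return MPI_SUCCESS;
> > }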
> >
> > In short, I don't see any value in trying to define the perverse
> > semantics of intercomm-based windows to avoid the ~12 lines of code
> > above.
> In my case, where I have a client-server setup with point-to-point
> communication happening via an intercommunicator, adding bulk communication
> via RMA under the current state of affairs means I have to use an
> additional communicator and translate IDs. But sure, it can be done with
> something similar to the above; it's just more code on the user side.
> > In the event we try to act on this proposal, I suppose we use the very
> > limited text regarding send-recv on intercomms as our guide: "A target
> > process is addressed by its rank in the remote group, both for sends
> > and for receives." [pg. 258]
> That's what seemed intuitive to me too (see above).
> Regards, Thomas
> --
> Thomas Jahns
> DKRZ GmbH, Department: Application software
> Deutsches Klimarechenzentrum
> Bundesstraße 45a
> D-20146 Hamburg
> Phone: +49-40-460094-151
> Fax: +49-40-460094-270
> Email: Thomas Jahns <jahns at dkrz.de>