[mpiwg-rma] MPI_WIN_CREATE for intercommunicators

Tue Jan 21 13:03:08 CST 2014

It makes my brain hurt just to think about this.

First, one should recognize that RMA communication occurs using the
window object, not the communicator as in p2p and collectives.  The
reason for a comm object in WIN_CREATE et al. is to enable MPI to
communicate between participating processes in order to create the
window.  For example, many implementations will do an ALLGATHER inside
of WIN_CREATE.  That operation needs a comm object.
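To make that first purpose concrete, here is a minimal sketch in plain C, with no MPI calls, of the kind of table an implementation might build by running an ALLGATHER over the communicator inside WIN_CREATE. The win_desc struct and target_addr function are hypothetical and not part of any real MPI library. The only rule taken from the standard is that the target address is the target's window base plus target_disp times its disp_unit. Note that the table is indexed by rank in the group of the window.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-process window descriptor; an implementation might
 * exchange something like this with an allgather inside MPI_Win_create. */
typedef struct {
    uintptr_t base;      /* address of the exposed memory */
    size_t    size;      /* size of the exposed memory, in bytes */
    int       disp_unit; /* displacement unit, in bytes */
} win_desc;

/* table[r] is assumed to hold the descriptor gathered from rank r of the
 * group of the window. Returns the target address of a displacement using
 * target_addr = window_base + target_disp * disp_unit. */
uintptr_t target_addr(const win_desc *table, int n,
                      int target_rank, size_t target_disp)
{
    assert(target_rank >= 0 && target_rank < n);
    size_t offset = target_disp * (size_t)table[target_rank].disp_unit;
    assert(offset <= table[target_rank].size); /* stay inside the window */
    return table[target_rank].base + offset;
}
```

For example, with rank 0 exposing base 1000 and disp_unit 8, displacement 3 on rank 0 resolves to address 1024.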

The second purpose of the comm is to generate the group of the window,
which denotes the ranks that know about the window.  It bothers me a
negligible amount that the rank arguments for RMA are not defined
explicitly, but one can reasonably assume that they are ranks in the
group of the window.  I suppose one could talk about local and remote
groups for a window, but again, brain hurt.

I do not believe that the communicator has any purpose beyond this, and
thus I do not know what the consequences of allowing intercomms would
be.  Perhaps we can assume, only for the sake of argument since it is
not required to have these semantics, that WIN_CREATE is like
ALLGATHER, and use the semantics of ALLGATHER on intercomms to define
the semantics of win objects created using intercomms.
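A minimal sketch of what that would mean, again in plain C with no MPI calls: under ALLGATHER-on-intercommunicator semantics, a process in one group receives one contribution from every process in the other group, ordered by rank in that remote group, and nothing is exchanged within a group. The remote_table function and the use of one int per process (say, a window size) are hypothetical simplifications for illustration.

```c
#include <assert.h>

/* Hypothetical model of an intercommunicator allgather between group A
 * (n_a processes) and group B (n_b processes). contrib_a[i] is what rank i
 * of A contributes, contrib_b[j] what rank j of B contributes. A process in
 * group A (local_is_a != 0) receives only B's contributions, and vice versa.
 * out must have room for the remote group size; returns that size. */
int remote_table(const int *contrib_a, int n_a,
                 const int *contrib_b, int n_b,
                 int local_is_a, int *out)
{
    assert(n_a > 0 && n_b > 0); /* both groups of an intercomm are nonempty */
    const int *remote = local_is_a ? contrib_b : contrib_a;
    int n_remote      = local_is_a ? n_b : n_a;
    for (int r = 0; r < n_remote; r++)
        out[r] = remote[r]; /* out[r] comes from rank r of the remote group */
    return n_remote;
}
```

Under this reading, an RMA target rank would naturally index the remote group, which is the same convention the point-to-point text uses.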

In any case, I don't see any real value in this, honestly.  Efficiency
arguments assume implementation details not specified by the standard.
Making a promise to not communicate with some ranks has no effect on
some implementations.  Of course we know ones where it does, but
that's not germane.
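For reference, here is the counting behind the efficiency argument in the quoted message, as a small C sketch. It assumes ordered (origin, target) pairs and excludes a process targeting itself; both are choices made for this illustration, and the function names are hypothetical. Whether fewer potential pairs saves anything depends entirely on whether an implementation allocates per-pair resources at all.

```c
#include <assert.h>

/* Ordered (origin, target) pairs, origin != target, that remain possible
 * when RMA may only cross between two groups of sizes n and m. */
long long cross_group_pairs(long long n, long long m)
{
    assert(n > 0 && m > 0);
    return 2 * n * m; /* n*m pairs from A to B plus m*n pairs from B to A */
}

/* The same count for an intracommunicator containing all n + m processes. */
long long intra_pairs(long long n, long long m)
{
    assert(n > 0 && m > 0);
    long long p = n + m;
    return p * (p - 1);
}
```

With N = M = 4, cross_group_pairs gives 32 and intra_pairs gives 56, so the intercommunicator rules out the 24 ordered pairs that stay inside a group.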

Honestly, I think any sane implementer will just call MPI_COMM_TEST_INTER
-> INTERCOMM_MERGE inside of WIN_CREATE (et al.), do everything as
before, and then free the temporary intracomm at the end.  Then I would
never, ever think about this issue again until someone paid me a huge
sum of money to optimize for the intercomm case.  Thus, the only
practical effect of changing the standard would be that people no
longer have to do the following explicitly:

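#include <mpi.h>

/* Like MPI_Win_create, but also accepts an intercommunicator, which is
   merged into a temporary intracommunicator; RMA target ranks then refer
   to ranks in the merged group. */
int MPE_Win_create(void *base, MPI_Aint size, int disp_unit,
                   MPI_Info info, MPI_Comm comm, MPI_Win *win)
{
  int is_intercomm, rc;
  MPI_Comm intracomm = comm;

  MPI_Comm_test_inter(comm, &is_intercomm);
  if (is_intercomm)
    /* high = 0 on every process, so the merged rank order is arbitrary */
    MPI_Intercomm_merge(comm, 0, &intracomm);
  rc = MPI_Win_create(base, size, disp_unit, info, intracomm, win);
  if (is_intercomm)
    MPI_Comm_free(&intracomm); /* free the temporary intracomm */
  return rc;
}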

In short, I don't see any value in trying to define the perverse
semantics of intercomm-based windows to avoid the ~12 lines of code
above.

In the event we try to act on this proposal, I suppose we would use the very
limited text regarding send-recv on intercomms as our guide: "A target
process is addressed by its rank in the remote group, both for sends
and for receives." [pg. 258]
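If RMA on an intercomm-based window followed that rule, an emulation on top of INTERCOMM_MERGE would have to translate a remote-group rank into a rank in the merged intracomm. The wrapper above passes high = 0 on every process, in which case the standard leaves the order of the merged group arbitrary, so a portable emulation would use MPI_GROUP_TRANSLATE_RANKS, as the quoted message suggests. The plain-C sketch below instead assumes one group passes high = 0 and the other high = 1, so the low group is ordered first, and further assumes each group keeps its local rank order. The helper name is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: maps a target rank in the remote group of an
 * intercommunicator to its rank in the intracommunicator produced by
 * MPI_Intercomm_merge, ASSUMING the low group (high = 0) is ordered first
 * and both groups keep their local rank order. */
int remote_to_merged_rank(int remote_rank, int remote_size,
                          int local_is_low, int low_size)
{
    assert(remote_rank >= 0 && remote_rank < remote_size);
    assert(low_size > 0);
    /* If we are in the low group, the remote group starts after it;
     * otherwise the remote group is the low group and starts at 0. */
    return local_is_low ? low_size + remote_rank : remote_rank;
}
```

With a low group of 3 processes and a high group of 2, remote rank 1 seen from the low group becomes merged rank 4, while remote rank 2 seen from the high group stays rank 2.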

Best,

Jeff

On Tue, Jan 21, 2014 at 12:28 PM, Thomas Jahns <jahns at dkrz.de> wrote:
> Hello,
>
> should this be the wrong mailing list[1], I apologize in advance and hope for
> directions to the correct place to pose my question.
>
> From the wording of the MPI 2.2 and 3.0 standards I can see (first
> paragraph of 11.2.1 in MPI 2.2, procedure specification of
> MPI_WIN_CREATE in MPI 3.0) that MPI_WIN_CREATE is only allowed for
> intracommunicators.
>
> I was wondering why this restriction is there, when
>
> 1. in cases where an intercommunicator is clearer for point-to-point
>    communication, RMA communication would gain equally in clarity
>    with an intercommunicator (from my POV at least),
>
> 2. the required resources could be reduced, because the number of
>    potential communication paths would be much lower: an
>    intercommunicator not only specifies two groups that communicate
>    with each other but also implies no communication (at least via the
>    intercommunicator) within each group. I.e. when it is already known
>    that any two processes within one of the two groups of an
>    intercommunicator will not issue RMA calls to each other, the
>    corresponding resources need not be reserved. Intracommunicator RMA,
>    on the other hand, leaves every communication pair possible.
>
>    Clearly, with two groups of processes of sizes N and M, the number of
>    potential pairs is N * M for a corresponding intercommunicator but
>    on the order of (N + M)**2 for an intracommunicator, and
>
> 3. the semantics appropriate for an intercommunicator should be easy
>    to emulate with MPI_INTERCOMM_MERGE and MPI_GROUP_TRANSLATE_RANKS.
>
> Since allowing intercommunicators in MPI_WIN_CREATE would consequently
>
> 1. be aesthetically pleasing,
> 2. offer increased potential for efficient execution,
> 3. seem to have an easy proof-of-concept implementation strategy and
> 4. form a pattern arising naturally from the rest of the standard in my eyes,
>
> I'm clearly missing some ambiguity or technical/definitional difficulty
> here, but I can't figure out which. Why else would the standard
> explicitly restrict MPI_WIN_CREATE to intracommunicators if the above
> were the whole story?
>
> Kind regards,
> Thomas Jahns
>
> [1] I've done some searching and came up with either
> <mpiwg-rma at lists.mpi-forum.org> or <mpi-comment at lists.mpi-forum.org>
> --
> Thomas Jahns
> DKRZ GmbH, Department: Application software
>
> Deutsches Klimarechenzentrum
> Bundesstraße 45a
> D-20146 Hamburg
>
> Phone: +49-40-460094-151
> Fax: +49-40-460094-270
> Email: Thomas Jahns <jahns at dkrz.de>
>
>
> _______________________________________________
> mpiwg-rma mailing list
> mpiwg-rma at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-rma

-- 
Jeff Hammond
jeff.science at gmail.com