[mpiwg-rma] FW: [Mpi3-rma] [EXTERNAL] Re: MPI-3 UNIFIED model updates

Barrett, Brian W bwbarre at sandia.gov
Wed Aug 28 15:47:48 CDT 2013


On 8/28/13 3:37 AM, "Pavan Balaji" <balaji at mcs.anl.gov> wrote:

>On 08/27/2013 10:35 PM, Barrett, Brian W wrote:
>> Perhaps implementation-defined is the wrong word.  I'm not sure I agree
>> with Keith's "undefined" fright, but perhaps platform-defined?  There's a
>> big difference between a replacement for MPI_WAIT and a memory barrier
>> in a loop.
>
>The comparison here should be a platform-specific memory barrier in a
>loop vs. MPI_WIN_SYNC in a loop.
>
>My concern with "platform-defined" is that we are assuming that there is
>exactly one way of doing this for a given platform, which is known to
>the MPI implementation and to the user.  That is mandating something on
>the implementation and makes me uncomfortable.

I see that concern, and agree with it.

>"Implementation-defined" is better, in that the implementation is free
>to do whatever it wants.  The user can do the same thing if she knows
>what the implementation does.  But practically, there's likely no
>difference between implementation-defined and undefined.

My problem with implementation-defined is exactly your problem with
platform-defined.  But since I think both are better than "undefined", I'm
not sure I care all that much :).

>Can you also clarify which of the two items below is the real concern
>here (or both?):
>
>1. Is it the cost of the MPI_WIN_SYNC function call?  If yes, how about
>we deprecate/remove PMPI_WIN_SYNC?  That way, MPI_WIN_SYNC can be a
>macro and truly a no-op on some platforms, so the concern that it might
>be expensive will go away?

That's certainly one concern, but the minor one.
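Pavan's first option can be sketched as a header-level macro. This is a minimal sketch, assuming C11 `<stdatomic.h>`; the names `SKETCH_WIN_SYNC` and `read_after_sync` are hypothetical and are not part of MPI or of any implementation. The idea is that a macro, unlike a function that must also exist as `PMPI_WIN_SYNC` for the profiling interface, can expand to nothing more than a compiler barrier on x86, assuming (as this thread does) that x86 only needs the compiler restrained for the read-after-Recv pattern.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical macro, NOT part of MPI: what MPI_WIN_SYNC might expand to
 * if no PMPI_WIN_SYNC symbol had to exist. */
#if defined(__x86_64__) || defined(__i386__)
/* Assumption: x86 (TSO) does not reorder loads with other loads, so for the
 * read-after-Recv pattern only the compiler must be stopped. A signal fence
 * is a compiler-only barrier and emits no instruction. */
#define SKETCH_WIN_SYNC() atomic_signal_fence(memory_order_seq_cst)
#else
/* Weakly ordered CPUs (e.g. POWER, ARM): emit a real hardware fence. */
#define SKETCH_WIN_SYNC() atomic_thread_fence(memory_order_seq_cst)
#endif

/* Hypothetical helper: synchronize, then read a window location, as the
 * target would after its Recv completes. */
int read_after_sync(const volatile int *loc)
{
    SKETCH_WIN_SYNC();
    return *loc;
}
```

With no `PMPI_` entry point to preserve, the x86 expansion costs nothing at run time, which addresses the call-overhead concern but not the "extra call in the source" concern.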

>2. Is it the fact that I have to do some additional call even on
>platforms where it is not needed (e.g., x86)?  So basically you want to
>do this:
>
>P0:
>Win_lock_all
>Put(X)
>Flush
>Send
>
>P1:
>Win_lock_all
>Recv
>/* No Win_sync here */
>read X

I believe your example is correct on x86.  I think you need either a
memory barrier or WIN_SYNC on other platforms.  Further, I believe
that:

X = 0
Win_lock_all
Recv
do { compiler_fence(); } while (0 == X);

will eventually complete on any platform that supports UNIFIED, because the
public and private copies are eventually consistent.  This gives us the
behavior needed to implement SHMEM on MPI-3.
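The spin-on-completion-word pattern can be modeled on a single node with C11 atomics and POSIX threads. This is an analogy under stated assumptions, not MPI code: a hypothetical `origin` thread plays P0's Put followed by Flush/Send, `payload` stands in for X, and `flag` stands in for the completion word. The relaxed load in the loop is re-issued on every iteration (the job `compiler_fence()` does in the pseudocode), and the acquire fence after the loop is the memory barrier a weakly ordered platform needs before reading the payload; on x86-64, compilers typically emit no instruction for it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared state standing in for window locations. */
static int payload;        /* data delivered by the "Put" */
static atomic_int flag;    /* completion word the target spins on */

static void *origin(void *arg)
{
    payload = *(int *)arg;  /* the "Put" */
    /* Release store: payload becomes visible no later than flag. */
    atomic_store_explicit(&flag, 1, memory_order_release);
    return NULL;
}

/* Spin until the completion word changes, then read the payload. */
int spin_then_read(int value)
{
    pthread_t t;
    payload = 0;
    atomic_store_explicit(&flag, 0, memory_order_relaxed);
    int rc = pthread_create(&t, NULL, origin, &value);
    assert(rc == 0);

    /* Each relaxed load is a fresh read, so the loop ends once the store
     * propagates (the "eventually consistent" property relied on above). */
    while (atomic_load_explicit(&flag, memory_order_relaxed) == 0)
        ; /* spin */

    /* Acquire fence: the barrier (WIN_SYNC analogue) that orders the
     * payload read after the flag read on weakly ordered CPUs. */
    atomic_thread_fence(memory_order_acquire);
    int seen = payload;

    rc = pthread_join(t, NULL);
    assert(rc == 0);
    return seen;
}
```

Dropping the fence would leave the flag read and the payload read unordered under the C11 model, which corresponds to omitting the memory barrier on POWER or ARM.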


>>> Also, I think we are ignoring the point Jim raised: does OpenSHMEM
>>> really need these semantics?
>>
>> I think this is a grey area of OpenSHMEM and my belief is that it really
>> does need those semantics.
>
>Perhaps this is just an oversight in OpenSHMEM and can be fixed?

I don't think it's an oversight.  Remember that OpenSHMEM, while a great
leap forward from SGI's man pages, is still a fairly loose specification.
There are definitely codes in the wild that spin on a completion word (or
poll and move on to other things, where shmem_wait wouldn't work), and
they're considered correct.
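The "poll and move on" variant, where the target checks the completion word without blocking and does other work between checks, can be sketched the same way. Again this is an analogy assuming C11 atomics and POSIX threads, with threads standing in for PEs; the names `writer`, `completion_arrived`, and `poll_and_work` are hypothetical. Here an acquire load on each poll supplies the ordering instead of a separate fence.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static long data_word;         /* stands in for the delivered data */
static atomic_int done_word;   /* stands in for the completion word */

static void *writer(void *arg)
{
    data_word = *(long *)arg;
    atomic_store_explicit(&done_word, 1, memory_order_release);
    return NULL;
}

/* Non-blocking check: true once data_word is safe to read. The acquire
 * load orders the later data read after this check. */
static bool completion_arrived(void)
{
    return atomic_load_explicit(&done_word, memory_order_acquire) != 0;
}

/* Poll while doing unrelated work; returns the delivered value and stores
 * the number of work units done before completion in *work_done. */
long poll_and_work(long value, long *work_done)
{
    pthread_t t;
    long work = 0;
    data_word = 0;
    atomic_store_explicit(&done_word, 0, memory_order_relaxed);
    int rc = pthread_create(&t, NULL, writer, &value);
    assert(rc == 0);

    while (!completion_arrived())
        work++;   /* stand-in for useful computation between polls */

    long seen = data_word;
    rc = pthread_join(t, NULL);
    assert(rc == 0);
    *work_done = work;
    return seen;
}
```

A blocking wait cannot express this loop, because the caller keeps control between checks.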

Brian

--
  Brian W. Barrett
  Scalable System Software Group
  Sandia National Laboratories
