[mpiwg-rma] FW: [Mpi3-rma] [EXTERNAL] Re: MPI-3 UNIFIED model updates
Barrett, Brian W
bwbarre at sandia.gov
Wed Aug 28 15:47:48 CDT 2013
On 8/28/13 3:37 AM, "Pavan Balaji" <balaji at mcs.anl.gov> wrote:
>On 08/27/2013 10:35 PM, Barrett, Brian W wrote:
>> Perhaps implementation-defined is the wrong word. I'm not sure I agree
>> with Keith's undefined fright, but perhaps platform-defined? There's a
>> big difference between a replacement for MPI_WAIT and a memory barrier
>> in a loop.
>
>The comparison here should be a platform-specific memory barrier in a
>loop vs. MPI_WIN_SYNC in a loop.
>
>My concern with "platform-defined" is that we are assuming that there is
>exactly one way of doing this for a given platform, which is known to
>the MPI implementation and to the user. That is mandating something on
>the implementation and makes me uncomfortable.
I see that concern, and agree with it.
>"Implementation-defined" is better, in that the implementation is free
>to do whatever it wants. The user can do the same thing if she knows
>what the implementation does. But practically, there's likely no
>difference between implementation-defined and undefined.
My problem with implementation-defined is exactly your problem with
platform-defined. But since I think both are better than "undefined", I'm
not sure I care all that much :).
>Can you also clarify which of the two items below is the real concern
>here (or both?):
>
>1. Is it the cost of the MPI_WIN_SYNC function call? If yes, how about
>we deprecate/remove PMPI_WIN_SYNC? That way, MPI_WIN_SYNC can be a
>macro and truly a no-op on some platforms, so the concern that it might
>be expensive will go away?
That's certainly one concern, but the minor concern.
>2. Is it the fact that I have to do some additional call even on
>platforms where it is not needed (e.g., x86)? So basically you want to
>do this:
>
>P0:
>Win_lock_all
>Put(X)
>Flush
>Send
>
>P1:
>Win_lock_all
>Recv
>/* No Win_sync here */
>read X
I believe on x86 that your example is correct. I think you either need a
memory barrier or WIN_SYNC on other platforms. And, further, I believe
that:
X = 0
Win_lock_all
Recv
do { compiler_fence(); } while (0 == X);
will eventually complete on any platform that supports UNIFIED, as the
public and private copies are eventually consistent. This gives us the
behaviors necessary for implementing SHMEM on MPI-3.
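
For concreteness, here is a minimal sketch of that polling loop in plain C.
It is not MPI code and makes several assumptions: compiler_fence() is not
defined anywhere in this thread, so it is mapped here to C11
atomic_signal_fence, which only constrains the compiler and emits no
hardware barrier. poll_until_nonzero and its max_spins bound are
hypothetical additions so that the loop can give up instead of spinning
forever. In a real program X would live in window memory under the UNIFIED
model, and the usual compiler-specific caveats about re-reading a plain
int across a fence apply.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumption: a compiler-only fence. It asks the compiler not to reorder
 * or cache memory accesses across this point; it emits no CPU barrier. */
static inline void compiler_fence(void)
{
    atomic_signal_fence(memory_order_seq_cst);
}

/* Hypothetical helper: poll *x up to max_spins times.
 * Returns the number of polls that saw zero before a nonzero value was
 * observed, or -1 if every poll saw zero. */
static long poll_until_nonzero(const int *x, long max_spins)
{
    assert(x != NULL);
    for (long spins = 0; spins < max_spins; ++spins) {
        compiler_fence();   /* same role as compiler_fence() in the loop above */
        if (*x != 0)
            return spins;
    }
    return -1;              /* still zero: caller may do other work and retry */
}
```

A caller that wants the blocking behavior of the loop above can retry until
the result is not -1. A caller that polls and moves on to other things can
use a small max_spins and treat -1 as "not yet".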
>>> Also, I think we are ignoring the point Jim raised: does OpenSHMEM
>>> really need these semantics?
>>
>> I think this is a grey area of OpenSHMEM and my belief is that it really
>> does need those semantics.
>
>Perhaps this is just an oversight in OpenSHMEM and can be fixed?
I don't think it's an oversight. Remember that OpenSHMEM, while a great
leap forward from SGI's man pages, is still a fairly loose specification.
There are definitely codes that spin on a completion word (or poll and
move on to other things, where shmem_wait wouldn't work) in the wild, and
they're considered correct codes.
Brian
--
Brian W. Barrett
Scalable System Software Group
Sandia National Laboratories