[mpiwg-rma] Short question on the ccNUMA memory reality

Tue Aug 5 12:37:59 CDT 2014

Dave,

thank you for this helpful answer. 

My question was related to

> The sentence p436:45-46 
> "The order in which data is written is not
> specified unless further synchronization is used." 

Your citation helps because it states the usual expectation:
> 1. Each CPU will always perceive its own memory accesses as occurring
> in program order.
But with the MPI Standard, nothing should be assumed merely because it is usual.

The text at p436:43-48 could be modified to read:
Advice to users. 
If accesses in the RMA unified model are not synchronized (with
locks or flushes, see Section 11.5.3), load and store operations might observe changes
to the memory while they are in progress. The order in which data is written is not
specified unless further synchronization is used. This might lead to inconsistent views
on memory and programs that assume that a transfer is complete by only checking
parts of the message are erroneous. 
NEW: The only consistent view is that each process will always
perceive its own memory accesses as occurring in program order.
(End of advice to users.)
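
To illustrate the erroneous pattern named in the advice (deciding that a
transfer is complete by checking only part of the message), here is a
minimal sketch. It is an analogy, not MPI code: two POSIX threads stand in
for two processes on a shared memory window, and a C11 release/acquire
pair stands in for the "further synchronization". The names message,
ready, and run_message_passing are hypothetical and chosen only for this
example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <string.h>

#define MSG_WORDS 8

static int message[MSG_WORDS];  /* plain (non-atomic) payload */
static atomic_int ready;        /* completion flag, the only synchronized word */

static void *producer(void *arg)
{
    (void)arg;
    for (int i = 0; i < MSG_WORDS; i++)
        message[i] = 42;
    /* Release store: the payload stores above are ordered before the flag.
     * With a plain store here, the write order would be unspecified. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg)
{
    int *sum = arg;
    /* Acquire load pairs with the release store. Polling a payload word
     * such as message[MSG_WORDS - 1] instead would be the erroneous
     * "check only part of the message" pattern. */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;
    *sum = 0;
    for (int i = 0; i < MSG_WORDS; i++)
        *sum += message[i];
    return NULL;
}

/* Runs one producer and one consumer; returns the sum the consumer saw. */
int run_message_passing(void)
{
    pthread_t p, c;
    int sum = -1;

    memset(message, 0, sizeof message);
    atomic_store(&ready, 0);
    assert(pthread_create(&c, NULL, consumer, &sum) == 0);
    assert(pthread_create(&p, NULL, producer, NULL) == 0);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return sum;
}
```

With the release/acquire pair in place, run_message_passing() returns
8 * 42 = 336. Which MPI calls give the equivalent guarantee on a shared
memory window is exactly what the Standard has to state; the sketch only
shows why some such synchronization is needed.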

But one can argue that everyone already expects this.
In my opinion, on MPI shared memory windows it is better
to expect only the semantics that are written in black and white
in the MPI Standard.

Best regards
Rolf

----- Original Message -----
> From: "Dave Goodell (dgoodell)" <dgoodell at cisco.com>
> To: "MPI WG Remote Memory Access working group" <mpiwg-rma at lists.mpi-forum.org>
> Cc: "Bill Long" <longb at cray.com>
> Sent: Tuesday, August 5, 2014 6:10:01 PM
> Subject: Re: [mpiwg-rma] Short question on the ccNUMA memory reality
> 
> On Aug 5, 2014, at 2:53 AM, "Balaji, Pavan" <balaji at anl.gov> wrote:
> 
> > 
> > On Aug 5, 2014, at 2:33 AM, Rolf Rabenseifner
> > <rabenseifner at hlrs.de> wrote:
> > 
> >> 1. Question (sequential consistency on one location):
> >> ------------
> >> 
> >> Do I understand correctly that in the following pattern
> >> on a shared memory or a ccNUMA shared memory
> >> 
> >> rank 0     rank 1
> >>        print x
> >> x=val_1    print x
> >> x=val_2    print x
> >>        print x
> >> 
> >> the print statements can print only in the following
> >> sequence
> >> - sometimes the previous value
> >> - sometimes val_1
> >> - and after some time val_2, and it then keeps printing val_2
> >> 
> >> and that it can never be that a sequence with val_2 before val_1
> >> can be produced, i.e.,
> >> old_val
> >> val_2
> >> val_1
> >> val_2
> >> is impossible.
> > 
> > When there are dependencies between statements, the compiler and
> > architecture don’t reorder them.
> 
> Rolf, for a bit of clarification on this, I use Paul McKenney's
> theoretical "Ordering-Hostile Architecture" [1] as my guideline for
> what an arbitrary architecture might choose to do.  ***This is
> separate from any language/compiler (non-)guarantees.***
> 
> [1]
> http://kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook-e1.pdf
> 
> """
> C.6.1 Ordering-Hostile Architecture
> A number of ordering-hostile computer systems have been produced over
> the decades, but the nature of the hostility has always been
> extremely subtle, and understanding it has required detailed
> knowledge of the specific hardware. Rather than picking on a
> specific hardware vendor, and as a presumably attractive alternative
> to dragging the reader through detailed technical specifications,
> let us instead design a mythical but maximally
> memory-ordering-hostile computer architecture.
> This hardware must obey the following ordering constraints [McK05a,
> McK05b]:
> 
> 1. Each CPU will always perceive its own memory accesses as occurring
> in program order.
> 
> 2. CPUs will reorder a given operation with a store only if the two
> operations are referencing different locations.
> 
> 3. All of a given CPU’s loads preceding a read memory barrier
> (smp_rmb()) will be perceived by all CPUs to precede any loads
> following that read memory barrier.
> 
> 4. All of a given CPU’s stores preceding a write memory barrier
> (smp_wmb()) will be perceived by all CPUs to precede any stores
> following that write memory barrier.
> 
> 5. All of a given CPU’s accesses (loads and stores) preceding a full
> memory barrier (smp_mb()) will be perceived by all CPUs to precede
> any accesses following that memory barrier.
> """
> 
> Point 2 is particularly relevant here.
> 
> >> Also other values are impossible, e.g., some bit or byte-mix
> >> from val_1 and val_2.
> > 
> > That is possible, unless you use some form of atomicity.
> 
> Right, though such "weird" patterns are most likely to occur due to
> some compiler behavior, not at the architectural level, assuming
> that "x" is word-sized or smaller and naturally aligned.
> 
> -Dave
> 
> _______________________________________________
> mpiwg-rma mailing list
> mpiwg-rma at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-rma
> 
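
On question 1 in the quoted thread, the single-location pattern can be
sketched as follows. This is a minimal sketch under stated assumptions,
not MPI code: two POSIX threads stand in for rank 0 and rank 1, and x is
a C11 atomic accessed with relaxed ordering. For an atomic object, C11
guarantees a single modification order and coherent reads, so the reader
may miss val_1 entirely but can never see val_1 after val_2, and it never
sees a bit or byte mix. With a plain int x, the same program would
contain a data race and C11 would guarantee nothing. The names
count_regressions, writer, and reader are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Values are numbered in the order in which they are written. */
enum { OLD_VAL = 0, VAL_1 = 1, VAL_2 = 2 };

static atomic_int x;

/* "rank 0": x = val_1; x = val_2 */
static void *writer(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, VAL_1, memory_order_relaxed);
    atomic_store_explicit(&x, VAL_2, memory_order_relaxed);
    return NULL;
}

/* "rank 1": repeatedly reads x and counts every step backwards,
 * for example val_1 observed after val_2. */
static void *reader(void *arg)
{
    int *regressions = arg;
    int prev = OLD_VAL;

    *regressions = 0;
    while (prev != VAL_2) {
        int cur = atomic_load_explicit(&x, memory_order_relaxed);
        if (cur < prev)
            (*regressions)++;
        prev = cur;
    }
    return NULL;
}

/* Runs one writer and one reader; returns the number of regressions seen. */
int count_regressions(void)
{
    pthread_t w, r;
    int regressions = -1;

    atomic_store(&x, OLD_VAL);
    assert(pthread_create(&r, NULL, reader, &regressions) == 0);
    assert(pthread_create(&w, NULL, writer, NULL) == 0);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return regressions;
}
```

Under the C11 rules, count_regressions() returns 0 on every run. Whether
the same holds for ordinary loads and stores on an MPI shared memory
window is the open point of this thread.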

-- 
Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)