[mpiwg-rma] Short question on the ccNUMA memory reality

Tue Aug 5 11:10:01 CDT 2014

On Aug 5, 2014, at 2:53 AM, "Balaji, Pavan" <balaji at anl.gov> wrote:

> 
> On Aug 5, 2014, at 2:33 AM, Rolf Rabenseifner <rabenseifner at hlrs.de> wrote:
> 
>> 1. Question (sequential consistency on one location):
>> ------------
>> 
>> Do I understand correctly that in the following pattern
>> on shared memory or ccNUMA shared memory
>> 
>> rank 0     rank 1
>>        print x
>> x=val_1    print x
>> x=val_2    print x
>>        print x
>> 
>> the print statements can print only in the following
>> sequence
>> - sometimes the previous value
>> - sometimes val_1
>> - and after some time val_2, after which it keeps printing val_2
>> 
>> and that it can never be that a sequence with val_2 before val_1 
>> can be produced, i.e.,
>> old_val
>> val_2
>> val_1
>> val_2
>> is impossible.
> 
> When there are dependencies between statements, the compiler and architecture don’t reorder them.

Rolf, for a bit of clarification on this, I use Paul McKenney's theoretical "Ordering-Hostile Architecture" [1] as my guideline for what an arbitrary architecture might choose to do.  ***This is separate from any language/compiler (non-)guarantees.***

[1] http://kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook-e1.pdf

"""
C.6.1 Ordering-Hostile Architecture
A number of ordering-hostile computer systems have been produced over the decades, but the nature of the hostility has always been extremely subtle, and understanding it has required detailed knowledge of the specific hardware. Rather than picking on a specific hardware vendor, and as a presumably attractive alternative to dragging the reader through detailed technical specifications, let us instead design a mythical but maximally memory-ordering-hostile computer architecture.
This hardware must obey the following ordering constraints [McK05a, McK05b]:

1. Each CPU will always perceive its own memory accesses as occurring in program order.

2. CPUs will reorder a given operation with a store only if the two operations are referencing different locations.

3. All of a given CPU’s loads preceding a read memory barrier (smp_rmb()) will be perceived by all CPUs to precede any loads following that read memory barrier.

4. All of a given CPU’s stores preceding a write memory barrier (smp_wmb()) will be perceived by all CPUs to precede any stores following that write memory barrier.

5. All of a given CPU’s accesses (loads and stores) preceding a full memory barrier (smp_mb()) will be perceived by all CPUs to precede any accesses following that memory barrier.
"""

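Constraints 3 to 5 are what make ordering across *different* locations possible. As a rough userspace illustration (not kernel code), here is a minimal sketch using C11 fences and POSIX threads. The names payload, ready and message_passing() are made up for this example, and the mapping of the C11 release/acquire fences onto smp_wmb()/smp_rmb() is only approximate (the C11 fences order somewhat more than the kernel barriers do).

```c
#include <pthread.h>
#include <stdatomic.h>

static int payload;        /* plain data, published through the flag */
static atomic_int ready;   /* hypothetical flag, zero-initialized */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    /* Keep the payload store ahead of the flag store (roughly smp_wmb()). */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
    return NULL;
}

static void *consumer(void *arg)
{
    int *seen = arg;
    while (!atomic_load_explicit(&ready, memory_order_relaxed))
        ;  /* spin until the flag is set */
    /* Keep the flag load ahead of the payload load (roughly smp_rmb()). */
    atomic_thread_fence(memory_order_acquire);
    *seen = payload;
    return NULL;
}

/* Runs one producer/consumer pair and returns what the consumer read. */
int message_passing(void)
{
    pthread_t p, c;
    int seen = -1;
    payload = 0;
    atomic_store(&ready, 0);
    pthread_create(&c, NULL, consumer, &seen);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

Without the two fences, the two locations could in principle be reordered on an ordering-hostile machine, and the consumer might read a stale payload.
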
Point 2 is particularly relevant here.
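
To make the single-location case concrete, here is a minimal sketch of your pattern with C11 relaxed atomics and POSIX threads. The names and the encoding OLD_VAL < VAL_1 < VAL_2 are my own assumptions for illustration. Using atomic_int takes the compiler out of the picture, so what remains is the per-location ordering: the reader may see any prefix of old_val, val_1, val_2, but should never step backward.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical encoding: a larger number means a later value of x. */
enum { OLD_VAL = 0, VAL_1 = 1, VAL_2 = 2 };

static atomic_int x;

/* "rank 0": two stores to the same location, in program order. */
static void *writer(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, VAL_1, memory_order_relaxed);
    atomic_store_explicit(&x, VAL_2, memory_order_relaxed);
    return NULL;
}

/* "rank 1": keep reading x and count every step back to an older value. */
static void *reader(void *arg)
{
    int *backward = arg;
    int prev = OLD_VAL;
    int extra = 1000;  /* reads to perform after VAL_2 was first seen */
    while (extra > 0) {
        int cur = atomic_load_explicit(&x, memory_order_relaxed);
        if (cur < prev)
            ++*backward;
        prev = cur;
        if (prev == VAL_2)
            --extra;
    }
    return NULL;
}

/* Runs one writer/reader pair; returns the number of backward steps. */
int count_backward_steps(void)
{
    pthread_t w, r;
    int backward = 0;
    atomic_store(&x, OLD_VAL);
    pthread_create(&r, NULL, reader, &backward);
    pthread_create(&w, NULL, writer, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return backward;
}
```

The sequence old_val, val_2, val_1, val_2 from your question would show up here as a nonzero return value. Note that no fence is needed, because only one location is involved.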

>> Also other values are impossible, e.g., some bit or byte-mix
>> from val_1 and val_2.
> 
> That is possible, unless you use some form of atomicity.

Right, though such "weird" patterns are most likely to occur due to some compiler behavior, not at the architectural level, assuming that "x" is word-sized or smaller and naturally aligned.
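
If you want to rule out the mixed values at the language level as well, an atomic type is the usual tool. A minimal sketch, again with C11 atomics and POSIX threads; the two bit patterns and all names are assumptions chosen so that any mix of the two values would be easy to spot.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Two patterns that differ in every bit, so a torn value is detectable. */
#define PATTERN_A UINT64_C(0x0000000000000000)
#define PATTERN_B UINT64_C(0xFFFFFFFFFFFFFFFF)

static _Atomic uint64_t x64;
static atomic_int done;

/* Writer: alternate between the two patterns many times. */
static void *toggler(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; ++i)
        atomic_store_explicit(&x64, (i & 1) ? PATTERN_A : PATTERN_B,
                              memory_order_relaxed);
    atomic_store(&done, 1);
    return NULL;
}

/* Reader: count every value that is neither pattern. */
static void *checker(void *arg)
{
    long *mixed = arg;
    while (!atomic_load(&done)) {
        uint64_t v = atomic_load_explicit(&x64, memory_order_relaxed);
        if (v != PATTERN_A && v != PATTERN_B)
            ++*mixed;
    }
    return NULL;
}

/* Runs one toggler/checker pair; returns the number of mixed values seen. */
long count_mixed_values(void)
{
    pthread_t t, c;
    long mixed = 0;
    atomic_store(&done, 0);
    atomic_store(&x64, PATTERN_A);
    pthread_create(&c, NULL, checker, &mixed);
    pthread_create(&t, NULL, toggler, NULL);
    pthread_join(t, NULL);
    pthread_join(c, NULL);
    return mixed;
}
```

With a plain (non-atomic) uint64_t this would be a data race, and the compiler would be free to split or merge the accesses, which is the compiler-level effect I mean above.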

-Dave