[mpiwg-rma] Short question on the ccNUMA memory reality

Tue Aug 5 02:53:06 CDT 2014

On Aug 5, 2014, at 2:33 AM, Rolf Rabenseifner <rabenseifner at hlrs.de> wrote:
> Dear expert on ccNUMA,

Huh?

> 1. Question (sequential consistency on one location):
> ------------
> 
> Do I understand correctly that in the following pattern
> on a shared memory or a ccNUMA shared memory
> 
> rank 0     rank 1
>         print x
> x=val_1    print x
> x=val_2    print x
>         print x
> 
> the print statements can print only in the following
> sequence 
> - sometimes the previous value
> - sometimes val_1
> - and after some time val_2, and from then on it keeps printing val_2
> 
> and that it can never be that a sequence with val_2 before val_1 
> can be produced, i.e.,
> old_val
> val_2
> val_1
> val_2
> is impossible.

Correct.  When there are dependencies between statements, as with two stores to the same location, neither the compiler nor the architecture reorders them.

> Also other values are impossible, e.g., some bit or byte-mix
> from val_1 and val_2.

That is possible, unless you use some form of atomicity.

> 2. Question:
> -----------
> What is the largest size for which memory operations are atomic,
> i.e., that we do not see a bit or byte-mix from val_1 and val_2?
> Is it 1, 4, 8, 16 bytes or can it be a total struct that fits 
> into a cacheline? 

It depends on the architecture, but all architectures I’m aware of provide atomic loads and stores up to sizeof(void*), assuming many things, including that the variables are volatile and correctly aligned in memory.  Of course, this is not portable.  To be portable, you should use atomic operations, e.g., with OpenPA.
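
As a rough illustration of the portable route, here is a minimal sketch that uses C11 <stdatomic.h> with POSIX threads rather than OpenPA (a substitution on my part; OpenPA offers comparable load/store primitives).  The names OLD_VAL, VAL_1, VAL_2, writer and run_reader_check are made up for this example.  One thread stores val_1 and then val_2 into an atomic 64-bit x, while the other polls x and checks that it only ever sees a whole value and never steps backwards from val_2 to val_1.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical values; the bit patterns make a byte mix easy to spot. */
#define OLD_VAL 0x0000000000000000ULL
#define VAL_1   0x1111111111111111ULL
#define VAL_2   0x2222222222222222ULL

static _Atomic uint64_t x = OLD_VAL;

/* "rank 0": two atomic stores to the same location, in program order. */
static void *writer(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, VAL_1, memory_order_relaxed);
    atomic_store_explicit(&x, VAL_2, memory_order_relaxed);
    return NULL;
}

/* "rank 1": polls x until VAL_2 shows up.
 * Returns 0 if every value read was whole and never went backwards,
 * and 1 if a mixed value or a VAL_2 -> VAL_1 step was observed. */
int run_reader_check(void)
{
    pthread_t t;
    int stage = 0;              /* 0 = OLD_VAL, 1 = VAL_1, 2 = VAL_2 */
    int bad = 0;

    atomic_store(&x, OLD_VAL);
    int rc = pthread_create(&t, NULL, writer, NULL);
    assert(rc == 0);

    while (stage < 2) {
        uint64_t v = atomic_load_explicit(&x, memory_order_relaxed);
        int seen = (v == OLD_VAL) ? 0
                 : (v == VAL_1)   ? 1
                 : (v == VAL_2)   ? 2 : -1;
        if (seen < 0 || seen < stage) {   /* torn value or out of order */
            bad = 1;
            break;
        }
        stage = seen;
    }

    pthread_join(t, NULL);
    return bad;
}
```

Calling run_reader_check() from a small main() (built with something like cc -std=c11 -pthread) should return 0.  Even relaxed atomics keep the accesses indivisible and keep operations on a single location in a consistent order; what they do not give you is any ordering between different locations.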

> 3. Question (about two updates):
> -----------
> 
> rank 0       rank 1
> x=xold
> y=yold
> ---- necessary synchronizations -----           
>           print x (which shows xold)
>           print y (which shows yold)
> ---- necessary synchronizations -----           
> x=xnew    
> y=ynew    
>           print x
>           print y
>           after some time
>           print x
>           print y
> 
> Possible results are
> - xold,yold  xold,yold  xnew,ynew             
> - xold,yold  xnew,yold  xnew,ynew             
> - xold,yold  xold,ynew  xnew,ynew 
> i.e., the y=ynew can arrive at another process
>       faster than the x=xnew, although the storing 
>       process issues the stores in the sequence
>       x=xnew, y=ynew.
> - xold,yold  xnew,ynew  xnew,ynew  
> 
> The assignments should represent the store instructions,
> and not the source code (because the compiler may modify
> the sequence of instructions compared to the source code)
> 
> Do I understand correctly, that the sequence of two 
> store instructions to two different locations in one process 
> may be visible at another process in a different sequence?

Correct.  The compiler/hardware can reorder operations that they perceive as unrelated.
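
If the second process must never see ynew paired with xold, the two stores need an explicit ordering.  The following is a sketch of one way to get that, assuming C11 atomics and POSIX threads; the names rank0, rank1_observe and the XOLD/XNEW/YOLD/YNEW constants are invented for the example.  The writer publishes y with a release store, and the reader reads y with an acquire load before it touches x.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical values standing in for xold/xnew/yold/ynew. */
enum { XOLD = 1, XNEW = 2, YOLD = 10, YNEW = 20 };

static int x = XOLD;            /* ordinary data                    */
static _Atomic int y = YOLD;    /* atomic variable that publishes x */

/* "rank 0": store x, then publish y with release semantics, so the
 * store to x cannot become visible after the store to y. */
static void *rank0(void *arg)
{
    (void)arg;
    x = XNEW;
    atomic_store_explicit(&y, YNEW, memory_order_release);
    return NULL;
}

/* "rank 1": waits until it sees y == YNEW with an acquire load, then
 * reads x.  Returns the value of x observed at that point. */
int rank1_observe(void)
{
    pthread_t t;

    x = XOLD;
    atomic_store(&y, YOLD);
    int rc = pthread_create(&t, NULL, rank0, NULL);
    assert(rc == 0);

    while (atomic_load_explicit(&y, memory_order_acquire) != YNEW) {
        /* spin until the release store becomes visible */
    }
    int seen_x = x;             /* ordered after the acquire load */

    pthread_join(t, NULL);
    return seen_x;
}
```

With the release/acquire pair, rank1_observe() returns XNEW, which rules out the "xold,ynew" outcome from the list above.  Without it (plain stores and loads on both variables), that outcome stays possible, and the program would also have a data race in C11 terms.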

--
Pavan Balaji
http://www.mcs.anl.gov/~balaji