[mpiwg-rma] Short question on the ccNUMA memory reality
Balaji, Pavan
balaji at anl.gov
Tue Aug 5 02:53:06 CDT 2014
On Aug 5, 2014, at 2:33 AM, Rolf Rabenseifner <rabenseifner at hlrs.de> wrote:
> Dear expert on ccNUMA,
Huh?
> 1. Question (sequential consistency on one location):
> ------------
>
> Do I understand correctly that in the following pattern
> on a shared memory or a ccNUMA shared memory
>
>    rank 0         rank 1
>                   print x
>    x=val_1        print x
>    x=val_2        print x
>                   print x
>
> the print statements can print only in the following
> sequence
> - some times the previous value
> - some times val_1
> - and after some time val_2, and from then on it keeps printing val_2
>
> and that it can never be that a sequence with val_2 before val_1
> can be produced, i.e.,
> old_val
> val_2
> val_1
> val_2
> is impossible.
When there are dependencies between statements, as with these two stores to the same location, neither the compiler nor the architecture reorders them.
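To make the single-location case concrete, here is a minimal sketch. It assumes two POSIX threads stand in for the two ranks and that x is a C11 atomic; with a plain int the concurrent accesses would be a data race in C, and the compiler could legally keep x in a register. All names (writer, reader, run_coherence_check) are made up for illustration. Even with memory_order_relaxed, C11 guarantees coherence per location, so the reader should never observe val_2 followed by val_1.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical values, ordered so that "going backwards" is easy to detect. */
enum { OLD_VAL = 0, VAL_1 = 1, VAL_2 = 2 };

static atomic_int x = OLD_VAL;

/* "rank 0": two stores to the same location, in program order. */
static void *writer(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, VAL_1, memory_order_relaxed);
    atomic_store_explicit(&x, VAL_2, memory_order_relaxed);
    return NULL;
}

/* "rank 1": polls x until it sees VAL_2 and records whether the
 * observed values ever went backwards (e.g. VAL_2 then VAL_1). */
static void *reader(void *arg)
{
    int *monotonic = arg;
    int prev = OLD_VAL;
    *monotonic = 1;
    for (;;) {
        int cur = atomic_load_explicit(&x, memory_order_relaxed);
        if (cur < prev)
            *monotonic = 0;
        prev = cur;
        if (cur == VAL_2)
            break;
    }
    return NULL;
}

/* Runs one writer and one reader; returns 1 if the reader's
 * observations were old_val*, val_1*, val_2 in that order. */
int run_coherence_check(void)
{
    pthread_t w, r;
    int monotonic = 0;
    int rc;

    atomic_store(&x, OLD_VAL);
    rc = pthread_create(&r, NULL, reader, &monotonic);
    assert(rc == 0);
    rc = pthread_create(&w, NULL, writer, NULL);
    assert(rc == 0);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return monotonic;
}
```

The reader may skip val_1 entirely (it can go straight from the old value to val_2), which matches the "some times" wording in the question. What it should not do is step back.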
> Also other values are impossible, e.g., some bit or byte-mix
> from val_1 and val_2.
That is possible, unless you use some form of atomicity.
> 2. Question:
> -----------
> What is the largest size that the memory operations are atomic,
> i.e., that we do not see a bit or byte-mix from val_1 and val_2?
> Is it 1, 4, 8, 16 bytes or can it be a total struct that fits
> into a cacheline?
It depends on the architecture, but all architectures I’m aware of give atomicity of loads and stores up to sizeof(void*), assuming many things, including that the variables are volatile, correctly aligned in memory, etc. Of course, this is not portable. To be portable, you should use atomic operations, e.g., with OpenPA.
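As a rough illustration of the portable route, the sketch below uses C11 <stdatomic.h> instead of OpenPA (a substitution on my part; OpenPA's API is not shown in this thread). The names shared_word, publish, observe and pointer_atomics_always_lock_free are hypothetical. A pointer-sized atomic object is stored and loaded as a whole, so a reader cannot see a byte-mix of two values, and ATOMIC_POINTER_LOCK_FREE reports whether the platform does this without a lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* A pointer-sized word shared between processes/threads (hypothetical name).
 * The _Atomic qualifier makes the compiler pick suitable alignment and
 * instructions, instead of relying on "volatile plus luck". */
static _Atomic uintptr_t shared_word;

/* Store the whole word atomically: no reader can observe a partial update. */
void publish(uintptr_t v)
{
    atomic_store_explicit(&shared_word, v, memory_order_relaxed);
}

/* Load the whole word atomically: the result is some value that was stored. */
uintptr_t observe(void)
{
    return atomic_load_explicit(&shared_word, memory_order_relaxed);
}

/* ATOMIC_POINTER_LOCK_FREE is 2 when atomic pointer types are always
 * lock-free, 1 when sometimes, 0 when never (C11 7.17.5). */
int pointer_atomics_always_lock_free(void)
{
    return ATOMIC_POINTER_LOCK_FREE == 2;
}
```

Relaxed ordering is enough here because the only property under discussion is that a single value is never torn. Ordering relative to other variables is a separate matter.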
> 3. Question (about two updates):
> -----------
>
>    rank 0         rank 1
>    x=xold
>    y=yold
>    ---- necessary synchronizations -----
>                   print x (which shows xold)
>                   print y (which shows yold)
>    ---- necessary synchronizations -----
>    x=xnew
>    y=ynew
>                   print x
>                   print y
>                   after some time
>                   print x
>                   print y
>
> Possible results are
> - xold,yold xold,yold xnew,ynew
> - xold,yold xnew,yold xnew,ynew
> - xold,yold xold,ynew xnew,ynew
> i.e., the y=ynew can arrive at another process
> faster than the x=xnew, although the storing
> process issues the stores in the sequence
> x=xnew, y=ynew.
> - xold,yold xnew,ynew xnew,ynew
>
> The assignments should represent the store instructions,
> and not the source code (because the compiler may modify
> sequence of instructions compared to the source code)
>
> Do I understand correctly, that the sequence of two
> store instructions to two different locations in one process
> may be visible at another process in a different sequence?
Correct. The compiler/hardware can reorder operations that they perceive as unrelated.
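If the order of the two stores matters to the reader, it has to be requested explicitly. Below is a minimal sketch, again assuming C11 atomics and two POSIX threads in place of the two ranks, with hypothetical names and values. The store to y uses release ordering and the load of y uses acquire ordering, so once the reader has seen ynew it should also see xnew; the "xold,ynew" outcome from the question is ruled out for this pairing.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for xold/xnew/yold/ynew. */
enum { XOLD = 10, XNEW = 11, YOLD = 20, YNEW = 21 };

static atomic_int x = XOLD;
static atomic_int y = YOLD;

/* "rank 0": x=xnew; y=ynew, with the second store as a release. */
static void *rank0(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, XNEW, memory_order_relaxed);
    /* Release: the earlier store to x may not be reordered after this. */
    atomic_store_explicit(&y, YNEW, memory_order_release);
    return NULL;
}

/* "rank 1": waits for ynew with acquire loads, then reads x. */
static void *rank1(void *arg)
{
    int *x_seen = arg;
    /* Acquire: the later load of x may not be reordered before this. */
    while (atomic_load_explicit(&y, memory_order_acquire) != YNEW)
        ; /* spin until the new y is visible */
    *x_seen = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

/* Returns the value of x that rank 1 read after it observed ynew. */
int x_seen_after_ynew(void)
{
    pthread_t t0, t1;
    int x_seen = -1;
    int rc;

    atomic_store(&x, XOLD);
    atomic_store(&y, YOLD);
    rc = pthread_create(&t1, NULL, rank1, &x_seen);
    assert(rc == 0);
    rc = pthread_create(&t0, NULL, rank0, NULL);
    assert(rc == 0);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return x_seen;
}
```

With both accesses to y weakened to memory_order_relaxed, or with plain variables, nothing forbids the reader from seeing ynew together with xold, which is the reordering described above.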
--
Pavan Balaji ✉️
http://www.mcs.anl.gov/~balaji