[mpiwg-rma] Short question on the ccNUMA memory reality

Tue Aug 5 02:33:19 CDT 2014

Dear ccNUMA experts,

I have three questions, which are hopefully trivial:

1. Question (sequential consistency on one location):
------------

Do I understand correctly that in the following pattern
on a shared memory or a ccNUMA shared memory

rank 0     rank 1
           print x
x=val_1    print x
x=val_2    print x
           print x

the print statements can only print values in the following
order:
 - first, zero or more times, the previous value
 - then, zero or more times, val_1
 - and after some time val_2, after which every print shows val_2

and that a sequence with val_2 before val_1 can never
be produced, i.e.,
  old_val
  val_2
  val_1
  val_2
is impossible.

Other values are also impossible, e.g., some bit or byte mix
of val_1 and val_2.
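
To make the expected rule precise, here is a minimal sketch in C of a
checker for the sequence that rank 1 prints. It only encodes my
assumption (values appear in the order old_val, val_1, val_2 and never
go backwards); it says nothing about what real hardware does. The
function name is hypothetical, and the three values are assumed to be
pairwise distinct.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if the printed values have the
 * form old_val ... val_1 ... val_2 ..., where each group may be empty,
 * and false if a value reappears after a later one or if any other
 * value (e.g. a bit or byte mix) shows up.
 * Assumes old_val, val_1 and val_2 are pairwise distinct. */
bool is_coherent_sequence(const long *seen, size_t n,
                          long old_val, long val_1, long val_2)
{
    int stage = 0;                  /* 0 = old_val, 1 = val_1, 2 = val_2 */
    for (size_t i = 0; i < n; i++) {
        int s;
        if (seen[i] == old_val)
            s = 0;
        else if (seen[i] == val_1)
            s = 1;
        else if (seen[i] == val_2)
            s = 2;
        else
            return false;           /* neither old_val, val_1 nor val_2 */
        if (s < stage)
            return false;           /* went back to an earlier value */
        stage = s;
    }
    return true;
}
```

With old_val=0, val_1=1, val_2=2, the sequence 0,1,1,2 is accepted and
the sequence 0,2,1,2 from above is rejected.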

2. Question:
-----------
What is the largest size for which memory operations are atomic,
i.e., for which we never see a bit or byte mix of val_1 and val_2?
Is it 1, 4, 8, or 16 bytes, or can it be an entire struct that fits
into a cacheline?
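
As a rough starting point, the following C11 sketch asks the C
implementation which of its standard integer atomic types are always
lock-free. This is only what the compiler promises for atomic objects;
it is not a statement about plain stores or about a struct that fills
a cacheline. The function name is hypothetical; the ATOMIC_*_LOCK_FREE
macros are the standard ones from <stdatomic.h>, where the value 2
means "always lock-free".

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical helper: returns the largest size in bytes among
 * char, short, int, long and long long whose atomic version is
 * reported as always lock-free, or 0 if none is. */
size_t largest_always_lock_free_size(void)
{
    static const struct { int lock_free; size_t size; } kinds[] = {
        { ATOMIC_CHAR_LOCK_FREE,  sizeof(char)      },
        { ATOMIC_SHORT_LOCK_FREE, sizeof(short)     },
        { ATOMIC_INT_LOCK_FREE,   sizeof(int)       },
        { ATOMIC_LONG_LOCK_FREE,  sizeof(long)      },
        { ATOMIC_LLONG_LOCK_FREE, sizeof(long long) },
    };
    size_t best = 0;
    for (size_t i = 0; i < sizeof kinds / sizeof kinds[0]; i++) {
        /* 2 = always lock-free, 1 = sometimes, 0 = never */
        if (kinds[i].lock_free == 2 && kinds[i].size > best)
            best = kinds[i].size;
    }
    return best;
}
```

The result depends on the compiler and the target, so I would treat it
as a lower bound to check against the processor documentation.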

3. Question (about two updates):
-----------

rank 0       rank 1
x=xold
y=yold
---- necessary synchronizations -----           
             print x (which shows xold)
             print y (which shows yold)
---- necessary synchronizations -----           
x=xnew    
y=ynew    
             print x
             print y
             after some time
             print x
             print y

Possible results are
 - xold,yold  xold,yold  xnew,ynew             
 - xold,yold  xnew,yold  xnew,ynew             
 - xold,yold  xold,ynew  xnew,ynew 
   i.e., the y=ynew can arrive at another process
         faster than the x=xnew, although the storing 
         process issues the stores in the sequence
         x=xnew, y=ynew.
 - xold,yold  xnew,ynew  xnew,ynew  

The assignments represent the store instructions,
not the source code (because the compiler may reorder
instructions relative to the source code).

Do I understand correctly that two store instructions
to two different locations, issued in one process,
may become visible at another process in a different order?
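
For comparison, here is a minimal sketch of how the order could be
enforced in C11, with two POSIX threads standing in for the two ranks
(that substitution is my assumption; MPI shared memory windows may need
different synchronization). The store to y uses release ordering and
the load of y uses acquire ordering, so once the reader sees ynew it
must also see xnew. All names are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>

enum { XOLD = 1, XNEW = 2, YOLD = 10, YNEW = 20 };

static int x = XOLD;            /* ordinary, non-atomic location */
static atomic_int y = YOLD;     /* location used to publish x */

static void *writer(void *arg)
{
    (void)arg;
    x = XNEW;
    /* release: the store to x cannot become visible after this store */
    atomic_store_explicit(&y, YNEW, memory_order_release);
    return NULL;
}

static void *reader(void *arg)
{
    int *seen_x = arg;
    /* acquire: once YNEW is observed, the earlier store to x is visible */
    while (atomic_load_explicit(&y, memory_order_acquire) != YNEW)
        ;                       /* spin until y=ynew has arrived */
    *seen_x = x;
    return NULL;
}

/* Runs one writer and one reader; returns the x value the reader saw
 * after observing y == YNEW, or -1 if a thread could not be created. */
int run_message_passing(void)
{
    pthread_t w, r;
    int seen_x = 0;

    x = XOLD;
    atomic_store(&y, YOLD);
    if (pthread_create(&w, NULL, writer, NULL) != 0)
        return -1;
    if (pthread_create(&r, NULL, reader, &seen_x) != 0) {
        pthread_join(w, NULL);
        return -1;
    }
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen_x;
}
```

Under the C11 memory model run_message_passing() returns XNEW. With a
relaxed store and a relaxed load on y, and x made atomic as well, the
result xold together with ynew would be allowed, which is the case I
am asking about.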

I ask all these questions to understand which memory model
can be defined for MPI shared memory windows.

Best regards
Rolf

-- 
Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)