[mpiwg-rma] Short question on the ccNUMA memory reality

Tue Aug 5 03:19:46 CDT 2014

Pavan,

please can you look at the more detailed follow-up
question at the end of question 1.

Thank you for your answers to questions 2 and 3.

Rolf

----- Original Message -----
> From: "Pavan Balaji" <balaji at anl.gov>
> To: "MPI WG Remote Memory Access working group" <mpiwg-rma at lists.mpi-forum.org>
> Cc: "Bill Long" <longb at cray.com>
> Sent: Tuesday, August 5, 2014 9:53:06 AM
> Subject: Re: [mpiwg-rma] Short question on the ccNUMA memory reality
> 
> 
> On Aug 5, 2014, at 2:33 AM, Rolf Rabenseifner <rabenseifner at hlrs.de>
> wrote:
> > Dear expert on ccNUMA,
> 
> Huh?
> 
> > 1. Question (sequential consistency on one location):
> > ------------
> > 
> > Do I understand correctly that in the following pattern
> > on a shared memory or a ccNUMA shared memory
> > 
> > rank 0     rank 1
> >            print x
> > x=val_1    print x
> > x=val_2    print x
> >            print x
> > 
> > the print statements can only print values in the following
> > order:
> > - first, possibly the previous value
> > - then, possibly val_1
> > - and eventually val_2, after which only val_2 is printed
> > 
> > and that a sequence with val_2 before val_1 can never be
> > produced, i.e., the output
> > old_val
> > val_2
> > val_1
> > val_2
> > is impossible.
> 
> When there are dependencies between statements, the compiler and
> architecture don’t reorder them.
> 
> > Also, other values, e.g., a bit- or byte-wise mix of val_1
> > and val_2, are impossible.
> 
> That is possible, unless you use some form of atomicity.

Assuming the stores are atomic (the automatic atomicity from your
answer to question 2) and that the compiler did not reorder them
(i.e., the assignments are the store instructions, not source code
lines), is it then guaranteed that the other process cannot see a
different order? That is, it cannot see val_2 for a short time
before it sees val_1, so it is impossible that the other process
would see
> > old_val
> > val_2
> > val_1
> > val_2
Is that correct?
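A minimal C11 sketch of this single-location case, assuming POSIX threads in one process as stand-ins for the two ranks (the question concerns MPI processes on a ccNUMA node, which the C standard does not describe directly). The names `writer`, `reader`, `run_coherence_demo` and the values `OLD_VAL`, `VAL_1`, `VAL_2` are hypothetical. In C11 terms, all stores to one atomic object form a single modification order, and successive loads by one thread can never move backwards in that order, even with `memory_order_relaxed`. So a trace with val_2 followed by val_1 is excluded by the language rules; a test run can only fail to find a counterexample, it cannot prove the guarantee.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical values standing in for old_val, val_1, val_2 from the mail.
 * They are numbered in the order the writer stores them. */
enum { OLD_VAL = 0, VAL_1 = 1, VAL_2 = 2 };
#define N_READS 100000

static atomic_int x = OLD_VAL;   /* the single shared location */
static int trace[N_READS];       /* what "rank 1" observed, in program order */

/* "rank 0": two stores to the same location, in program order. */
static void *writer(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, VAL_1, memory_order_relaxed);
    atomic_store_explicit(&x, VAL_2, memory_order_relaxed);
    return NULL;
}

/* "rank 1": repeatedly "print x", here recorded into trace[]. */
static void *reader(void *arg)
{
    (void)arg;
    for (size_t i = 0; i < N_READS; i++)
        trace[i] = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

/* Returns 1 if the trace never goes backwards (e.g. VAL_2 then VAL_1). */
static int trace_is_monotonic(void)
{
    for (size_t i = 1; i < N_READS; i++)
        if (trace[i] < trace[i - 1])
            return 0;
    return 1;
}

/* Runs one experiment: 1 = coherent order observed, 0 = violation,
 * -1 = thread creation failed. */
int run_coherence_demo(void)
{
    pthread_t w, r;
    atomic_store(&x, OLD_VAL);
    if (pthread_create(&r, NULL, reader, NULL) != 0)
        return -1;
    if (pthread_create(&w, NULL, writer, NULL) != 0) {
        pthread_join(r, NULL);
        return -1;
    }
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return trace_is_monotonic();
}
```

Declaring `x` as `atomic_int` also rules out the torn (bit- or byte-mixed) values discussed above; with a plain `int`, the same program would contain a data race and have undefined behavior in C11.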

> > 2. Question:
> > -----------
> > What is the largest size for which memory operations are atomic,
> > i.e., for which we do not see a bit- or byte-wise mix of val_1 and val_2?
> > Is it 1, 4, 8, or 16 bytes, or can it be a whole struct that fits
> > into a cache line?
> 
> It depends on the architecture, but all architectures I’m aware of
> give atomicity of loads and stores up to sizeof(void*), assuming
> many things, including that the variables are volatile, correctly
> aligned in memory, etc.  Of course, this is not portable.  To be
> portable, you should use atomic operations, e.g., with OpenPA.
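OpenPA is the library suggested above; C11 `<stdatomic.h>` is a different, standardized alternative, used here only as an illustration. A minimal sketch, assuming a C11 compiler: the `ATOMIC_*_LOCK_FREE` macros report per type whether atomic operations are never (0), sometimes (1), or always (2) lock-free, which roughly tells you up to which size the implementation relies on indivisible hardware operations. The table and `print_lock_free_table` are hypothetical names. Larger `_Atomic` types, such as a whole struct, are still atomic in C11 but may be implemented with a lock (with GCC, possibly requiring linking libatomic), so a cache-line-sized struct is not portably tear-free at the hardware level.

```c
#include <stdatomic.h>
#include <stdio.h>

/* One row per standard type: its size and the C11 lock-free level
 * (0 = never, 1 = sometimes, 2 = always lock-free). */
struct lock_free_info {
    const char *type;
    size_t size;
    int level;
};

static const struct lock_free_info lock_free_table[] = {
    { "char",      sizeof(char),      ATOMIC_CHAR_LOCK_FREE },
    { "short",     sizeof(short),     ATOMIC_SHORT_LOCK_FREE },
    { "int",       sizeof(int),       ATOMIC_INT_LOCK_FREE },
    { "long",      sizeof(long),      ATOMIC_LONG_LOCK_FREE },
    { "long long", sizeof(long long), ATOMIC_LLONG_LOCK_FREE },
    { "void *",    sizeof(void *),    ATOMIC_POINTER_LOCK_FREE },
};

/* Prints the table; returns how many rows are always lock-free. */
int print_lock_free_table(void)
{
    int always = 0;
    size_t n = sizeof lock_free_table / sizeof lock_free_table[0];
    for (size_t i = 0; i < n; i++) {
        const struct lock_free_info *e = &lock_free_table[i];
        printf("%-10s %2zu bytes  lock-free level %d\n",
               e->type, e->size, e->level);
        if (e->level == 2)
            always++;
    }
    return always;
}
```

The result is a property of the compiler and target, not of the program, which is exactly the portability concern raised in the answer.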
> 
> > 3. Question (about two updates):
> > -----------
> > 
> > rank 0       rank 1
> > x=xold
> > y=yold
> > ---- necessary synchronizations -----
> >              print x (which shows xold)
> >              print y (which shows yold)
> > ---- necessary synchronizations -----
> > x=xnew
> > y=ynew
> >              print x
> >              print y
> >              after some time
> >              print x
> >              print y
> > 
> > Possible results are
> > - xold,yold  xold,yold  xnew,ynew
> > - xold,yold  xnew,yold  xnew,ynew
> > - xold,yold  xold,ynew  xnew,ynew
> > i.e., y=ynew can become visible to another process
> >       before x=xnew, although the storing
> >       process issues the stores in the order
> >       x=xnew, y=ynew.
> > - xold,yold  xnew,ynew  xnew,ynew
> > 
> > The assignments are meant to represent the store instructions,
> > not the source code (because the compiler may change the
> > order of instructions relative to the source code).
> > 
> > Do I understand correctly that two store instructions
> > to two different locations in one process may become
> > visible to another process in a different order?
> 
> Correct.  The compiler/hardware can reorder operations that they
> perceive as unrelated.
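A minimal C11 sketch of how this reordering can be prevented, assuming POSIX threads in one process stand in for the two ranks; `XOLD`/`XNEW`/`YOLD`/`YNEW`, `writer`, `reader` and `run_ordering_demo` are hypothetical names. The writer stores `x` and then publishes `y` with a release store; a reader whose acquire load of `y` returns ynew is then guaranteed to see xnew. Note that the reader loads `y` before `x`, the reverse of the print order in the question: the guarantee only constrains reads issued after the acquire load, so with `x` read first the result xold,ynew would still be allowed.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical values standing in for xold/xnew/yold/ynew. */
enum { XOLD = 10, XNEW = 11, YOLD = 20, YNEW = 21 };

static atomic_int x = XOLD, y = YOLD;
static int saw_ynew_with_xold;   /* set if the forbidden outcome is seen */

/* "rank 0": x=xnew, then y=ynew. The release store on y orders the
 * earlier store to x before it for any thread that acquires y. */
static void *writer(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, XNEW, memory_order_relaxed);
    atomic_store_explicit(&y, YNEW, memory_order_release);
    return NULL;
}

/* "rank 1": acquire-load y first, then x. Seeing ynew implies seeing xnew. */
static void *reader(void *arg)
{
    (void)arg;
    int yv = atomic_load_explicit(&y, memory_order_acquire);
    int xv = atomic_load_explicit(&x, memory_order_relaxed);
    if (yv == YNEW && xv == XOLD)
        saw_ynew_with_xold = 1;
    return NULL;
}

/* Repeats the experiment: 1 = ordering always respected,
 * 0 = xold,ynew observed, -1 = thread creation failed. */
int run_ordering_demo(int iterations)
{
    saw_ynew_with_xold = 0;
    for (int i = 0; i < iterations; i++) {
        pthread_t w, r;
        atomic_store(&x, XOLD);
        atomic_store(&y, YOLD);
        if (pthread_create(&r, NULL, reader, NULL) != 0)
            return -1;
        if (pthread_create(&w, NULL, writer, NULL) != 0) {
            pthread_join(r, NULL);
            return -1;
        }
        pthread_join(w, NULL);
        pthread_join(r, NULL);
        if (saw_ynew_with_xold)
            return 0;
    }
    return 1;
}
```

If both stores and both loads used `memory_order_relaxed`, C11 would permit the xold,ynew outcome, matching the answer above. Whether it is actually observed then depends on the compiler and hardware; it is more likely on weakly ordered architectures such as ARM or POWER than on x86, which does not reorder stores with other stores.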
> 
> --
> Pavan Balaji  ✉️
> http://www.mcs.anl.gov/~balaji
> 
> _______________________________________________
> mpiwg-rma mailing list
> mpiwg-rma at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-rma

-- 
Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)