[Mpi3-rma] Memory barriers in MPI_WIN_LOCK_ALL mode

Tue Oct 30 07:28:45 CDT 2012

Hi Pavan,

> Consider the following program:
>
> P0:
> 	MPI_WIN_LOCK_ALL();
> 	MPI_PUT(1, X, P1); /* Write 1 to variable X on P1 */
Assuming an MPI_WIN_FLUSH(P1) here (or MPI_WIN_FLUSH_ALL).
> 	/* wave hand to P1 */
>
> P1:
> 	MPI_WIN_LOCK_ALL();
> 	/* wait for P0 to wave hand */
> 	MPI_GET(X, P1);  /* local operation */
>
> If the MPI_GET on P1 does a local load operation internally, it is not  
> memory consistent without a memory barrier.  Does this mean that I need  
> to always do a memory barrier on all local GET operations to get the  
> right value?
Well, if the put came from shared memory, then the flush would most
likely do a memory barrier (depending on the consistency model of the
architecture; e.g., under TSO (x86) it would need an mfence). If the put
came from remote memory, then there should not be a problem, because the
flush must block until the data is visible to the CPU (assuming the
unified memory model; in the separate model, you'd need an additional
MPI_WIN_SYNC).
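
To make the shared-memory case concrete, here is a minimal sketch in
plain C11 with pthreads. It is not MPI code and not how any particular
MPI implementation works; it only models the pattern under the
assumption that the put is a plain store and the local get is a plain
load. The names x, hand, p0_thread, p1_thread and run_handoff are
hypothetical and chosen for illustration only.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-ins: x plays the window variable X on P1,
 * hand plays the "wave hand" notification. */
static int x = 0;
static atomic_int hand = 0;

/* Plays P0: the put, then the barrier a flush would likely issue. */
static void *p0_thread(void *arg)
{
    (void)arg;
    x = 1;                                      /* MPI_PUT(1, X, P1) modeled as a store */
    atomic_thread_fence(memory_order_seq_cst);  /* assumed barrier inside the flush */
    atomic_store_explicit(&hand, 1, memory_order_relaxed);  /* wave hand to P1 */
    return NULL;
}

/* Plays P1: wait for the hand wave, then do the local load. */
static void *p1_thread(void *arg)
{
    int *seen = arg;
    while (atomic_load_explicit(&hand, memory_order_relaxed) == 0)
        ;                                       /* wait for P0 to wave hand */
    atomic_thread_fence(memory_order_acquire);  /* barrier before the local load */
    *seen = x;                                  /* MPI_GET(X, P1) modeled as a load */
    return NULL;
}

/* Runs both roles once and returns the value P1 observed, or -1 on error. */
int run_handoff(void)
{
    pthread_t t0, t1;
    int seen = -1;

    x = 0;
    atomic_store(&hand, 0);
    if (pthread_create(&t1, NULL, p1_thread, &seen) != 0)
        return -1;
    if (pthread_create(&t0, NULL, p0_thread, NULL) != 0) {
        /* Release P1 from its wait loop so the join below can finish. */
        atomic_store(&hand, 1);
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return seen;
}
```

In this model the fence on the P0 side corresponds to what the flush
would have to do, and the acquire fence on the P1 side is the barrier
the question is about. Without the pair of fences, the C11 memory model
does not order the load of x after the store, regardless of what the
hardware happens to do.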

> Note that this inefficiency does not go away by replacing the waving of  
> the hand with an MPI_BARRIER for example, as it does not know which  
> window, if any, the synchronization is for.
Yes.

> Note that this is likely only a theoretical exercise, since most (all?)  
> compilers will do a memory barrier anyway if they see a function call  
> (MPI_GET in this case).  But is MPI assuming that that's going to be the  
> case for efficient execution?
Really? Which compiler puts a memory barrier before function calls? This
sounds rather inefficient to me (in the days of fastcall).

Best,
  Torsten

-- 
### qreharg rug ebs fv crryF ------------- http://www.unixer.de/ -----
Torsten Hoefler           | Assistant Professor
Dept. of Computer Science | ETH Zürich
Universitätsstrasse 6     | Zurich-8092, Switzerland
CAB E 64.1                | Phone: +41 76 309 79 29