[Mpi3-rma] Memory barriers in MPI_WIN_LOCK_ALL mode

Tue Oct 30 08:48:47 CDT 2012

On 10/30/2012 07:28 AM, Torsten Hoefler wrote:
>> If the MPI_GET on P1 does a local load operation internally, it is not
>> memory consistent without a memory barrier.  Does this mean that I need
>> to always do a memory barrier on all local GET operations to get the
>> right value?
> Well, if the put came from shared memory, then the flush would most
> likely do a memory barrier (depending on the consistency model of the
> architecture, i.e., in TSO (x86) it would need an mfence). If the put
> came from remote memory then there should not be a problem because the
> flush must block until the data is visible to the CPU (assuming the
> unified memory model; in the separate model, you'd need an additional
> MPI_WIN_SYNC).

Actually, the flush only needs to block until the data is visible to other 
MPI RMA operations, not to the CPU (i.e., not to local load/store 
operations).  I don't think the target process can guarantee that the PUT 
is visible to a load/store without an additional memory barrier.  Since the 
MPI standard does not say that the user must call MPI_WIN_SYNC in this case, 
I'm asking whether the MPI implementation needs to issue a memory barrier 
internally.
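A minimal C11 sketch of the shared-memory case, assuming an implementation whose intranode PUT is a plain store into the target's memory and whose FLUSH is a full fence. The names shm_window, origin, target and run_put_then_load are hypothetical (not MPI or MPICH APIs), and a relaxed atomic flag stands in for whatever notification (e.g. an MPI_Barrier) tells the target that the PUT has completed. In the C11 memory model the origin-side fence alone is not enough: the target's plain load is only guaranteed to see the new value after an acquire barrier on the target side, roughly the role MPI_WIN_SYNC plays in the unified model.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical shared-memory window: a plain payload word (like a window
 * buffer) plus a flag standing in for the completion notification. */
typedef struct {
    int payload;        /* target memory, accessed with plain loads/stores */
    int value_to_put;   /* value the origin will write */
    atomic_int ready;   /* set by the origin after its "flush" */
} shm_window;

/* Origin: hypothetical intranode PUT (a plain store) followed by a
 * hypothetical FLUSH (a full fence; GCC and Clang typically emit mfence
 * for this on x86), then the notification. */
static void *origin(void *arg) {
    shm_window *win = arg;
    win->payload = win->value_to_put;                       /* "PUT"   */
    atomic_thread_fence(memory_order_seq_cst);              /* "FLUSH" */
    atomic_store_explicit(&win->ready, 1, memory_order_relaxed);
    return NULL;
}

/* Target: wait for the notification, then issue an acquire fence (roughly
 * what MPI_WIN_SYNC provides in the unified model) before the plain load.
 * Without this fence the load below would be a data race in C11. */
static int target(shm_window *win) {
    while (atomic_load_explicit(&win->ready, memory_order_relaxed) == 0)
        ;                                                   /* spin */
    atomic_thread_fence(memory_order_acquire);              /* "WIN_SYNC" */
    return win->payload;                                    /* local load */
}

/* Runs the origin in a new thread and the target in the caller; returns the
 * value the target observed, or -1 if the thread could not be created. */
int run_put_then_load(int value) {
    shm_window win;
    win.payload = 0;
    win.value_to_put = value;
    atomic_init(&win.ready, 0);

    pthread_t t;
    if (pthread_create(&t, NULL, origin, &win) != 0)
        return -1;
    int seen = target(&win);
    pthread_join(t, NULL);
    return seen;
}
```

On x86 the acquire fence typically compiles to no instruction (TSO already keeps loads in order) and only constrains the compiler, while on weaker architectures such as POWER or ARM it becomes a real barrier instruction; that is where the cost of an implicit barrier inside MPI_GET would most likely show up.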

>> Note that this is likely only a theoretical exercise, since most (all?)
>> compilers will do a memory barrier anyway if they see a function call
>> (MPI_GET in this case).  But is MPI assuming that that's going to be the
>> case for efficient execution?
> Really? Which compiler puts a memory barrier before function calls? This
> sounds rather inefficient to me (in the days of fastcall).

Sorry, I meant that the compiler will not reorder memory operations across 
a call to a function it cannot see into (such as MPI_GET), not that it 
inserts a memory barrier.

  -- Pavan

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji