[Mpi3-rma] Memory barriers in MPI_WIN_LOCK_ALL mode
balaji at mcs.anl.gov
Tue Oct 30 08:48:47 CDT 2012
On 10/30/2012 07:28 AM, Torsten Hoefler wrote:
>> If the MPI_GET on P1 does a local load operation internally, it is not
>> memory consistent without a memory barrier. Does this mean that I need
>> to always do a memory barrier on all local GET operations to get the
>> right value?
> Well, if the put came from shared memory, then the flush would most
> likely do a memory barrier (depending on the consistency model of the
> architecture, i.e., in TSO (x86) it would need an mfence). If the put
> came from remote memory then there should not be a problem because the
> flush must block until the data is visible to the CPU (assuming the
> unified memory model, if you were in separate, you'd need an additional
Actually, the flush only needs to block until the data is visible to
another MPI RMA operation, not to the CPU (that is, not to load/store).
I don't think the target process can guarantee that the PUT is visible
to a load/store without an additional memory barrier. Since the MPI
standard does not require the user to call WIN_SYNC in this case, I'm
asking whether the MPI implementation needs to do a memory barrier
internally.
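
To make the concern concrete, here is a minimal sketch of the ordering
problem using C11 atomics as an analogy. It is not MPI code: publish()
stands in for "PUT followed by flush", try_consume() stands in for a GET
that is implemented as a local load, and all names (payload, ready,
publish, try_consume) are hypothetical. The point is only that the
reading side needs its own barrier before the plain load.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins: payload plays the role of window memory,
 * ready plays the role of "the flush has completed". */
static int payload;
static atomic_bool ready;

/* Writer side: order the data store before the flag store. */
static void publish(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Reader side: without the acquire fence, a weakly ordered machine
 * (or the compiler) may perform the load of payload before the load
 * of ready, and return a stale value. */
static bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return true;
}
```

In this analogy, the question above is whether the acquire fence belongs
inside the MPI implementation or has to be supplied by the user.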
>> Note that this is likely only a theoretical exercise, since most (all?)
>> compilers will do a memory barrier anyway if they see a function call
>> (MPI_GET in this case). But is MPI assuming that that's going to be the
>> case for efficient execution?
> Really? Which compiler puts a memory barrier before function calls? This
> sounds rather inefficient to me (in the days of fastcall).
Sorry, I meant that the compiler will not reorder operations across
function calls, not that it inserts a memory barrier.
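
The distinction can be sketched in C11, assuming <stdatomic.h> is
available. atomic_signal_fence only constrains the compiler, which is
roughly the effect an opaque function call has on accesses to global
memory. atomic_thread_fence also orders the accesses in hardware, and
with sequentially consistent ordering it typically compiles to a fence
instruction such as mfence on x86. The variable and function names are
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

static int data_word;
static int flag_word;

/* Compiler barrier only: the compiler may not move the two stores
 * across the fence, but no fence instruction is emitted, so the
 * hardware ordering is whatever the architecture provides. */
static void store_with_compiler_barrier(int value)
{
    data_word = value;
    atomic_signal_fence(memory_order_seq_cst);
    flag_word = 1;
}

/* Memory barrier: constrains the compiler and also orders the stores
 * as observed by other threads. */
static void store_with_memory_barrier(int value)
{
    data_word = value;
    atomic_thread_fence(memory_order_seq_cst);
    flag_word = 2;
}
```

Both functions give the same result when called from a single thread.
They differ only in what another thread or process is guaranteed to
observe.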