[Mpi3-rma] Memory barriers in MPI_WIN_LOCK_ALL mode
Pavan Balaji
balaji at mcs.anl.gov
Tue Oct 30 08:48:47 CDT 2012
On 10/30/2012 07:28 AM, Torsten Hoefler wrote:
>> If the MPI_GET on P1 does a local load operation internally, it is not
>> memory consistent without a memory barrier. Does this mean that I need
>> to always do a memory barrier on all local GET operations to get the
>> right value?
> Well, if the put came from shared memory, then the flush would most
> likely do a memory barrier (depending on the consistency model of the
> architecture, i.e., in TSO (x86) it would need an mfence). If the put
> came from remote memory then there should not be a problem because the
> flush must block until the data is visible to the CPU (assuming the
> unified memory model, if you were in separate, you'd need an additional
> win_sync).
Actually, the flush only needs to block until the data is visible to
another MPI RMA operation, not to the CPU (as in, not to a plain
load/store). I don't think the target process can guarantee that the
PUT is visible to a load/store without an additional memory barrier.
Since the MPI standard doesn't specify that the user needs to call a
WIN_SYNC in this case, I'm asking whether the MPI implementation needs
to do a memory barrier internally.
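To make the case I'm worried about concrete, here is a rough sketch
(my own simplified example, not code from any implementation; it
assumes both ranks are on the same node so the window really is
shared memory):

  #include <mpi.h>

  /* Simplified sketch: P0 PUTs into P1's window and flushes; P1 is
   * notified through a message and then reads the same location with a
   * local MPI_Get (which, on a shared-memory window, may be a plain
   * load internally). */
  int main(int argc, char **argv)
  {
      int rank, *base;
      MPI_Win win;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      MPI_Win_allocate_shared(sizeof(int), sizeof(int), MPI_INFO_NULL,
                              MPI_COMM_WORLD, &base, &win);
      *base = 0;
      MPI_Win_lock_all(0, win);
      MPI_Barrier(MPI_COMM_WORLD);

      if (rank == 0) {
          int one = 1;
          MPI_Put(&one, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
          MPI_Win_flush(1, win);                           /* PUT completed at target */
          MPI_Send(NULL, 0, MPI_INT, 1, 0, MPI_COMM_WORLD);  /* notify P1 */
      } else if (rank == 1) {
          int val;
          MPI_Recv(NULL, 0, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
          /* The question: if this GET is a local load inside the library,
           * is it guaranteed to return 1 without the implementation doing
           * a memory barrier, given that the user is not required to call
           * MPI_Win_sync here? */
          MPI_Get(&val, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
          MPI_Win_flush_local(1, win);
      }

      MPI_Win_unlock_all(win);
      MPI_Win_free(&win);
      MPI_Finalize();
      return 0;
  }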
>> Note that this is likely only a theoretical exercise, since most (all?)
>> compilers will do a memory barrier anyway if they see a function call
>> (MPI_GET in this case). But is MPI assuming that that's going to be the
>> case for efficient execution?
> Really? Which compiler puts a memory barrier before function calls? This
> sounds rather inefficient to me (in the days of fastcall).
Sorry, I meant that the compiler will not reorder operations across
function calls, not that it inserts a memory barrier.
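That is, a call to a function the compiler cannot see into acts as a
compiler barrier for anything reachable through globals or escaped
pointers, but it doesn't by itself generate a fence instruction. A
trivial (hypothetical) illustration:

  extern void opaque_call(void);  /* e.g., an MPI call the compiler can't inline */
  extern int *win_mem;            /* window memory, also written by other processes */

  int read_twice(void)
  {
      int a = *win_mem;
      opaque_call();     /* the compiler must assume win_mem may have changed,
                          * so it re-loads below instead of reusing 'a' ...    */
      int b = *win_mem;  /* ... but no mfence or other hardware barrier is
                          * emitted, so CPU-level ordering is a separate issue */
      return a + b;
  }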
-- Pavan
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji