[Mpi3-rma] MPI-3 UNIFIED model clarification

Tue Jul 30 17:40:36 CDT 2013

Pavan Balaji <balaji at mcs.anl.gov> writes:
> This is what I said is the disagreement in the WG.  I can pull up the 
> old email chain if needed, but I think others can too.  One side was 
> arguing that there's no such guarantee and you need to do a WIN_SYNC to 
> see the value.  The other side was arguing that the WIN_SYNC should not 
> be needed; FLUSH + SEND on the origin should be enough.

Hmm, linux/Documentation/memory-barriers.txt says:

  CACHE COHERENCY VS DMA
  ----------------------

  Not all systems maintain cache coherency with respect to devices doing
  DMA.  In such cases, a device attempting DMA may obtain stale data
  from RAM because dirty cache lines may be resident in the caches of
  various CPUs, and may not have been written back to RAM yet.  To deal
  with this, the appropriate part of the kernel must flush the
  overlapping bits of cache on each CPU (and maybe invalidate them as
  well).

  In addition, the data DMA'd to RAM by a device may be overwritten by
  dirty cache lines being written back to RAM from a CPU's cache after
  the device has installed its own data, or cache lines present in the
  CPU's cache may simply obscure the fact that RAM has been updated,
  until at such time as the cacheline is discarded from the CPU's cache
  and reloaded.  To deal with this, the appropriate part of the kernel
  must invalidate the overlapping bits of the cache on each CPU.

I've taken this to mean that you can't guarantee that a DMA write will
"eventually" become visible to the CPU, because a stale cache line could
hang around arbitrarily long.  Are implementations doing anything to
ensure that the target's cache is invalidated after RDMA operations?

I think (though not with confidence) that CPU cache invalidation will
eventually propagate to other CPUs, with a practical upper bound on the
delay.  Compare Alpha with its split caches: one of the buses could be
busy and thus not yet updated, despite proper memory ordering on the
writing end.  The implication is that each cache has a (fair?) queue
that cannot be arbitrarily long, though I've never seen a statement
giving an upper bound on how long a cache bank could be busy.