[Mpi3-rma] RMA proposal 1 update
Underwood, Keith D
keith.d.underwood at intel.com
Tue May 18 13:52:59 CDT 2010
> >>> OK, as long as everything goes through the NIC. Are we considering
> >>> direct load/stores differently from remote puts/gets in terms of
> >> completion?
> >>
> >> That's a good point. Even if you don't do direct load/stores, MPI
> >> implementations will want to optimize MPI_Puts/gets to local or shared
> >> memory windows on cache-coherent architectures by using load/stores
> >> internally.
> >
> > Which should still be ok, if you do it right. By the time that a
> > target can "know" that a source thinks all of the Puts are complete,
> > you should have made it globally visible (PCIExpress is an ordered
> > channel, unless you use the unordered option, which you wouldn't if you
> > wanted to do things like, oh, tell the app that a message had been
> > received and really mean it ;-) Worst case you have to do something
> > like send a message down each pipe as part of the barrier (or
> > comparable event) that informs the target that the source has done a
> > flush...
>
> Let me add one more level of complication then :-). Two network adapters
> + shared memory. Let's say the Put went over NIC1 and a mutex went over
> NIC2 (get-accumulate). Then a local process checks the mutex and wants
> to access the data written by the put.
>
> Theoretically, there's no ordering here. Note that the two NICs can be
> on two different PCIe slots and cause corruption.
Now you are just trying to be difficult... First, your scenario is not legal. You have to call a local MPI_Win_lock()/MPI_Win_unlock() before that data is visible in the private window to local loads and stores. Even accessing the item that was Put over NIC1 is undefined until the source has done a completion operation.
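For concreteness, a minimal sketch of the legal version of that scenario, using only the existing MPI-2 passive-target calls (win, winbuf, src, and the ranks are illustrative names, not anything from the proposal):

#include <mpi.h>

/* Origin side: the Put is only guaranteed complete at the target once the
   access epoch is closed. */
void origin_put(MPI_Win win, int target, double *src, int n)
{
    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, target, 0, win);
    MPI_Put(src, n, MPI_DOUBLE, target, 0, n, MPI_DOUBLE, win);
    MPI_Win_unlock(target, win);    /* remote completion of the Put */
}

/* Local process: loads from the private window copy are undefined until a
   local synchronization, e.g. a local lock/unlock. */
void local_read(MPI_Win win, int my_rank, double *winbuf)
{
    double val;

    MPI_Win_lock(MPI_LOCK_SHARED, my_rank, 0, win);
    val = winbuf[0];                /* now well defined */
    (void)val;
    MPI_Win_unlock(my_rank, win);
}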
Even then, I think you are discussing an ordering problem that exists in the base standard: completing an MPI_Win_unlock() implies remote completion. Real remote completion. Until MPI_Win_unlock() completes, there is no guarantee of ordering between anything. MPI_flush() does not add to this issue.
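To sketch that point with only calls from the current standard (data_win, comm, the tag, and the ranks are illustrative; the notification deliberately goes over a completely different path, a plain send, to stand in for the second NIC or shared memory):

#include <mpi.h>

void put_then_notify(MPI_Win data_win, MPI_Comm comm, int target,
                     double *src, int n)
{
    int token = 1;

    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, target, 0, data_win);
    MPI_Put(src, n, MPI_DOUBLE, target, 0, n, MPI_DOUBLE, data_win);

    /* Nothing is ordered or remotely visible until the unlock returns... */
    MPI_Win_unlock(target, data_win);

    /* ...so only now is it meaningful to tell the target, over any other
       path, that the data is there. */
    MPI_Send(&token, 1, MPI_INT, target, 0, comm);
}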
Keith