[Mpi3-rma] RMA proposal 1 update

Underwood, Keith D keith.d.underwood at intel.com
Tue May 18 13:52:59 CDT 2010


> >>> OK, as long as everything goes through the NIC.  Are we considering
> >>> direct load/stores differently from remote puts/gets in terms of
> >>> completion?
> >>
> >> That's a good point. Even if you don't do direct load/stores, MPI
> >> implementations will want to optimize MPI_Puts/gets to local or
> >> shared memory windows on cache-coherent architectures by using
> >> load/stores internally.
> >
> > Which should still be OK, if you do it right.  By the time that a
> > target can "know" that a source thinks all of the Puts are complete,
> > you should have made it globally visible (PCI Express is an ordered
> > channel, unless you use the unordered option, which you wouldn't if
> > you wanted to do things like, oh, tell the app that a message had
> > been received and really mean it ;-).  Worst case you have to do
> > something like send a message down each pipe as part of the barrier
> > (or comparable event) that informs the target that the source has
> > done a flush...
> 
> Let me add one more level of complication then :-). Two network adapters
> + shared memory. Let's say the Put went over NIC1 and a mutex went over
> NIC2 (get-accumulate). Then a local process checks the mutex and wants
> to access the data written by the put.
> 
> Theoretically, there's no ordering here. Note that the two NICs can be
> on two different PCIe slots and cause corruption.

Now you are just trying to be difficult... First, your scenario is not legal.  You have to call a local MPI_Win_lock()/MPI_Win_unlock() before that data is visible in the private copy of the window for loads and stores.  Even accessing the item that was Put over NIC1 is undefined until the source has done a completion operation.
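
For concreteness, a sketch of what the legal target-side access looks like, using the lock/unlock calls from the current standard (win, win_buf, and my_rank stand in for whatever the application actually set up):

  #include <mpi.h>

  /* Target-side sketch: another rank has Put into win_buf over NIC1.
   * Under the separate public/private copy model, a local lock/unlock
   * epoch on our own rank is what makes that data safe to read with a
   * plain load. */
  double read_put_data(MPI_Win win, double *win_buf, int my_rank)
  {
      double x;
      MPI_Win_lock(MPI_LOCK_SHARED, my_rank, 0, win);  /* sync private copy */
      x = win_buf[0];                                  /* now a legal load  */
      MPI_Win_unlock(my_rank, win);
      return x;
  }

Until something like that happens at the reader, the load in your scenario is racing with the Put no matter how the two NICs order things.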

Even then, I think you are discussing an ordering problem that exists in the base standard:  completing an MPI_Win_unlock() implies remote completion.  Real remote completion.  Until MPI_Win_unlock() completes, there is no guarantee of ordering between anything.  MPI_Win_flush() does not add to this issue.
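
The origin side of your two-NIC example, sketched the same way (the window layout and the way the "mutex" is released are invented for illustration, and I am writing the proposed flush call as MPI_Win_flush):

  #include <mpi.h>

  /* Origin-side sketch: Put the data (say, over NIC1), then release a
   * "mutex" word in the same window (say, over NIC2).  Nothing is
   * guaranteed to be remotely complete until the flush/unlock returns. */
  void put_then_release(MPI_Win win, double *buf, int count,
                        MPI_Aint data_disp, MPI_Aint mutex_disp, int target)
  {
      int one = 1;

      MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win);

      MPI_Put(buf, count, MPI_DOUBLE, target, data_disp,
              count, MPI_DOUBLE, win);

      /* Proposed call: once it returns, the Put is remotely complete,
       * so releasing the mutex afterwards cannot expose stale data. */
      MPI_Win_flush(target, win);

      /* "Release the mutex", shown here as an accumulate-replace. */
      MPI_Accumulate(&one, 1, MPI_INT, target, mutex_disp, 1, MPI_INT,
                     MPI_REPLACE, win);

      MPI_Win_unlock(target, win);  /* everything remotely complete here */
  }

Without the flush (or the unlock), there is simply no point at which anyone on the target side may assume the Put has landed, whichever NIC each operation used.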

Keith



