[Mpi3-rma] MPI-3 UNIFIED model clarification

Mon Jul 29 23:10:29 CDT 2013

On 07/29/2013 10:58 PM, Underwood, Keith D wrote:
>> P1:
>> Win_lock_all
>> MPI_Recv(P0, flag)
>> read a
>
> Can you point to the line in there that tells the architecture that
> the read of A is not touched by MPI_Recv?  Let's say the data
> movement for MPI_Recv is done by a NIC or done by another core.  How
> can the microarchitecture tell the difference?

'flag' and 'a' are two different buffers, so presumably reordering 
MPI_RECV and 'read a' should be permitted.

Are you saying that the architecture cannot track that these two are 
nonoverlapping buffers?

Or are you saying that the "poll completion queue" equivalent for the 
network receive should already be doing a memory barrier for a correct 
network stack anyway?

If it's the latter, then let's consider the following new example:

P0:
Barrier
Win_lock_all(win1)
Put(a, P1)
Flush
flag = 1;
Put(flag, P1)

P1:
flag = 0;
Barrier
Win_lock_all
while (!flag);
read a

Remember MPI-3's nasty semantics of single-byte flags turning to 
non-zero and staying there :-).
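
To make the ordering concern concrete, here is a minimal shared-memory analogy of the example above, written with C11 atomics and POSIX threads rather than MPI. It is only a sketch under stated assumptions: the names writer, reader and run_handoff are hypothetical, the writer thread stands in for P0's Put/Flush/Put sequence, and nothing here claims to describe what the MPI-3 UNIFIED model guarantees. It shows the pairing a plain "while (!flag); read a" lacks on the reader side: an acquire on the flag load that keeps the later read of a from being satisfied early.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for the window memory exposed by P1. */
static int a;            /* payload, plain (non-atomic) storage */
static atomic_int flag;  /* completion flag, 0 until the payload is ready */

/* Plays the role of P0: write the payload, then publish the flag. */
static void *writer(void *arg) {
    (void)arg;
    a = 42;  /* analogous to Put(a, P1) followed by Flush */
    /* Release store: the write to a is visible before flag becomes 1. */
    atomic_store_explicit(&flag, 1, memory_order_release);
    return NULL;
}

/* Plays the role of P1: spin until the flag is set, then read the payload. */
static int reader(void) {
    /* Acquire load: the read of a below cannot be moved above this loop. */
    while (atomic_load_explicit(&flag, memory_order_acquire) == 0)
        ;
    return a;
}

/* Runs one handoff and returns the value the reader observed (-1 on error). */
int run_handoff(void) {
    pthread_t t;
    a = 0;
    atomic_store(&flag, 0);
    if (pthread_create(&t, NULL, writer, NULL) != 0)
        return -1;
    int seen = reader();
    pthread_join(t, NULL);
    return seen;
}
```

With the release/acquire pair in place, run_handoff() returns the value the writer stored. If both sides used relaxed ordering and the flag were a plain variable, the language would no longer guarantee that, which is the shared-memory counterpart of the question being asked about P1 here.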

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji