[Mpi3-rma] MPI-3 UNIFIED model clarification

Mon Jul 29 15:37:24 CDT 2013

On 07/29/2013 02:27 PM, Sur, Sayantan wrote:
>> Yes, doing a remote memory barrier using an active message is an option.
>> But I think the point is that any of these approaches adds unnecessary
>> overhead.
>
> I'm not clear on which option lets us avoid the overhead of the barrier.
>
> UBER_SUPER_UNIFIED: MPI_Win_flush is truly one sided and utilizes network remote completion.
>
> KIND_OF_UNIFIED: You need to have the memory barrier somewhere, so you will move it to MPI_Win_sync. A well-optimized RMA app will avoid unnecessary flushes and flush only when it intends updates to be visible to the remote side. I think in the end, the cost of the barrier will end up being equal, since a well-optimized app will synchronize (local or remote) as infrequently as possible.

I'm not sure I follow your point here.  For a non-x86 platform to provide
the currently specified UNIFIED model, the implementation has to do an
active-message-based remote memory barrier on every flush and unlock
(even if the target process is not planning to read the data).  With
KIND_OF_UNIFIED, I'm saying that the user will need to call WIN_SYNC at
the target before reading, which removes this overhead from flush.  MPI
will do the memory barrier in WIN_SYNC, only when the user wants it.
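
The overhead argument above can be illustrated with a toy model. This is
only a sketch under stated assumptions: it is not MPI code, every name in
it (toy_window, toy_origin_put_flush, toy_target_win_sync, and so on) is
hypothetical, and "target_view" tracks only what the target's loads are
guaranteed to observe, not what real hardware might happen to show.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names are MPI or hardware APIs. */
typedef enum { MODEL_UNIFIED_AS_SPECIFIED, MODEL_KIND_OF_UNIFIED } model_t;

typedef struct {
    model_t model;
    int memory;       /* value sitting in the target's window memory     */
    int target_view;  /* value the target's loads are guaranteed to see  */
    int barriers;     /* memory barriers executed on the target so far   */
} toy_window;

toy_window toy_window_init(model_t model) {
    toy_window w = { model, 0, 0, 0 };
    return w;
}

/* A memory barrier on the target: its view catches up with memory. */
void toy_target_barrier(toy_window *w) {
    w->target_view = w->memory;
    w->barriers++;
}

/* Origin side: put a value and flush. The data always reaches memory.
 * Assumption: under UNIFIED as currently specified (non-x86), every
 * flush must also force a barrier on the target. */
void toy_origin_put_flush(toy_window *w, int value) {
    w->memory = value;
    if (w->model == MODEL_UNIFIED_AS_SPECIFIED)
        toy_target_barrier(w);
}

/* Target side: WIN_SYNC performs the barrier on request. */
void toy_target_win_sync(toy_window *w) { toy_target_barrier(w); }

/* Target side: a plain load from the window. */
int toy_target_load(const toy_window *w) { return w->target_view; }

/* Count target barriers for `flushes` put+flush pairs, optionally
 * followed by one read on the target. */
int toy_count_barriers(model_t model, int flushes, bool target_reads) {
    assert(flushes >= 1);
    toy_window w = toy_window_init(model);
    for (int i = 1; i <= flushes; i++)
        toy_origin_put_flush(&w, i);
    if (target_reads) {
        if (model == MODEL_KIND_OF_UNIFIED)
            toy_target_win_sync(&w);   /* user asks for the barrier */
        assert(toy_target_load(&w) == flushes);
    }
    return w.barriers;
}
```

In this model, 100 flushes followed by a single target read cost 100
target barriers under MODEL_UNIFIED_AS_SPECIFIED and one under
MODEL_KIND_OF_UNIFIED; if the target never reads, the second model pays
for none.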

> I have a suspicion that KIND_OF_UNIFIED is going to end up being no
> different than SEPARATE. Although I haven't thought this angle
> through. It needs further clarification as to how KIND_OF_UNIFIED
> differs from UBER_SUPER_UNIFIED.

That was my first thought as well, but note that KIND_OF_UNIFIED is much
stronger than SEPARATE (and almost identical to UNIFIED) when we are
talking about nonoverlapping memory.

> "In the RMA unified model, public and private copies are identical and updates via put
> or accumulate calls are eventually observed by load operations without additional RMA
> calls."
>
> I take "eventually" to mean locks/flushes. But it seems that not everyone shares that interpretation?

FLUSH will guarantee that the data is present in the remote public 
memory.  And UNIFIED states that public memory == private memory.  So, 
the user can read it directly without any additional MPI calls.

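That's the broken part: on non-x86 platforms, that direct read is only
safe if FLUSH also forced a memory barrier at the target.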
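
A minimal sketch of the pattern in question, assuming two or more
processes, a one-int window per process, and a zero-byte send/receive as
the out-of-band notification (these choices are illustrative and not
taken from the thread). The MPI_Win_sync call on the target is the step
under debate: KIND_OF_UNIFIED would require it, while UNIFIED as
currently worded lets the target load without it.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    int *base;      /* local window memory: one int per process */
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0)
            fprintf(stderr, "run with at least 2 processes\n");
        MPI_Finalize();
        return 1;
    }

    MPI_Win_allocate(sizeof(int), sizeof(int), MPI_INFO_NULL,
                     MPI_COMM_WORLD, &base, &win);
    MPI_Win_lock_all(0, win);       /* passive-target epoch on all ranks */

    *base = 0;                      /* local store into the window */
    MPI_Win_sync(win);              /* order the store before remote puts */
    MPI_Barrier(MPI_COMM_WORLD);

    if (rank == 0) {
        int value = 42;
        MPI_Put(&value, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
        MPI_Win_flush(1, win);      /* put is now complete at the target */
        /* Zero-byte message: tells rank 1 the data has been flushed. */
        MPI_Send(NULL, 0, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(NULL, 0, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        /* The step under debate: the target-side memory barrier. */
        MPI_Win_sync(win);
        printf("target read %d\n", *base);
    }

    MPI_Win_unlock_all(win);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

Without the MPI_Win_sync on rank 1, the implementation would have to
ensure on its own, inside MPI_Win_flush, that the target's load observes
the new value.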

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji