[Mpi3-rma] MPI-3 UNIFIED model clarification

Mon Jul 29 14:27:15 CDT 2013

> Yes, doing a remote memory barrier using an active message is an option.
>   But I think the point is that any of these approaches adds more unnecessary
> overhead.
>

It's not clear to me which of these options would let us avoid the overhead of the barrier.

UBER_SUPER_UNIFIED: MPI_Win_flush is truly one sided and utilizes network remote completion.

KIND_OF_UNIFIED: You need to have the memory barrier somewhere, so you will move it to MPI_Win_sync. A well-optimized RMA app will avoid unnecessary flushes and flush only when it intends updates to be visible to the remote side. I think in the end the cost of the barrier will be the same either way, since a well-optimized app will synchronize (locally or remotely) as infrequently as possible.
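
To make the "barrier has to live somewhere" point concrete, here is a loose shared-memory analogy of the Put/Flush/Send example quoted further down, written with C11 atomics and pthreads rather than MPI. It is only a sketch under stated assumptions: two threads stand in for P0 and P1, a plain int stands in for the window location, and a relaxed atomic flag stands in for the Send/Recv pair. The names origin, target, notified and handoff_demo are made up for illustration and are not MPI API. The acquire fence on the target side plays the role of the read barrier under discussion (the one that would sit in MPI_Win_sync, or in MPI_Recv, or be triggered remotely by the flush).

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-ins, not MPI API. */
static int a;                /* the window location that P0 "puts" into */
static atomic_int notified;  /* stands in for the Send/Recv message     */

/* P0: Put(a, P1); Flush; MPI_Send(P1) */
static void *origin(void *arg)
{
    (void)arg;
    a = 42;                                     /* "Put"                    */
    atomic_thread_fence(memory_order_release);  /* ordering done by "Flush" */
    atomic_store_explicit(&notified, 1,
                          memory_order_relaxed); /* "Send"                  */
    return NULL;
}

/* P1: MPI_Recv(P0); <barrier?>; read a */
static void *target(void *arg)
{
    int *out = arg;

    /* "Recv": spin until the notification arrives. */
    while (atomic_load_explicit(&notified, memory_order_relaxed) == 0)
        ;

    /* The read barrier in question. On x86-like hardware this costs
     * essentially nothing; on PPC or ARM it is a real instruction that
     * some call on P1 has to issue before the load below. */
    atomic_thread_fence(memory_order_acquire);

    *out = a;                                   /* "read a" */
    return NULL;
}

/* Runs one handoff and returns the value P1 observed for a. */
int handoff_demo(void)
{
    pthread_t t0, t1;
    int seen = -1;

    a = 0;
    atomic_store(&notified, 0);

    pthread_create(&t1, NULL, target, &seen);
    pthread_create(&t0, NULL, origin, NULL);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return seen;
}
```

With both fences in place, handoff_demo() returns 42 under the C11 memory model. Dropping the acquire fence makes the read of a a data race in C11 terms, which is roughly the shared-memory counterpart of asking whether "read a" is valid without a WIN_SYNC on P1.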

> Separating into three memory models provides better flexibility to
> implementations (and to applications, once we allow them to ask for the
> "required" value through an info argument).
>

I suspect that KIND_OF_UNIFIED is going to end up being no different from SEPARATE, although I haven't thought this angle through. How KIND_OF_UNIFIED differs from UBER_SUPER_UNIFIED needs further clarification.

What this discussion brings to light is a lack of common understanding of the following statement on Pg 436, lines 37-39:

"In the RMA unified model, public and private copies are identical and updates via put
or accumulate calls are eventually observed by load operations without additional RMA
calls."

I take "eventually" to mean "once the locks/flushes complete". But it seems that not everyone shares that interpretation.

Thanks,
Sayantan.

>   -- Pavan
> 
> On 07/29/2013 12:00 PM, Sur, Sayantan wrote:
> > Hi Pavan,
> >
> >> Specifically, the concern was that some members of the WG believed
> >> that in the UNIFIED model, data is usable by the remote process after
> >> a PUT without an additional WIN_SYNC, while some members believed
> >> that it is not.  Here's the example in question:
> >>
> >> P0:
> >> Win_lock_all
> >> Put(a, P1)
> >> Flush
> >> MPI_Send(P1)
> >>
> >> P1:
> >> Win_lock_all
> >> MPI_Recv(P0)
> >> read a
> >>
> >> The question was whether the above program was valid without a
> >> WIN_SYNC on P1 between the Recv(P0) and "read a".  If we want this to
> >> be valid in the UNIFIED model, only x86-like architectures can
> >> provide UNIFIED efficiently.  Other architectures, such as PPC or
> >> ARM, that require an additional read barrier on P1 will not be able
> >> to provide UNIFIED even if they are cache-coherent, unless they add a
> >> memory barrier in every other MPI call (e.g., MPI_Recv in this case).
> >>
> >
> > Adding a memory barrier in MPI_Recv is one of the implementation
> > options, and probably not the best one. For relaxed memory architectures,
> > one may want to shift the burden onto Flush to do a memory barrier after
> > the data has been written (through an active message for example).
> >
> > The description of Flush in the spec is: "MPI_WIN_FLUSH completes all
> > outstanding RMA operations initiated by the calling process to the target rank
> > on the specified window. The operations are completed both at the origin
> > and at the target."
> >
> > The way I read it, no further action should be required to view contents of
> > the memory attached to the window after MPI_Win_flush. Therefore, an
> > implementation of MPI_Win_flush needs to do whatever is required by the
> > underlying platform and the model the MPI is supposed to provide.
> >
> >> One possible solution we discussed was to clarify that this is not
> >> allowed in UNIFIED, but provide a third memory model called
> >> UBER_SUPER_UNIFIED, that will allow this.  (or say that it is allowed
> >> in UNIFIED and provide a third model called KIND_OF_UNIFIED, which is
> >> in between UNIFIED and SEPARATE).
> >>
> >> Other solutions are welcome.
> >>
> >> Irrespective of when we make the change of possibly adding an
> >> additional memory model (MPI-3.1, MPI-4, whatever), we should clarify
> >> the standard on what is allowed in MPI-3 and what is not, as an
> >> errata item.  Without that, it's confusing for implementors.
> >>
> >>    -- Pavan
> >>
> >> --
> >> Pavan Balaji
> >> http://www.mcs.anl.gov/~balaji
> >> _______________________________________________
> >> mpi3-rma mailing list
> >> mpi3-rma at lists.mpi-forum.org
> >> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-rma
> >
> >
> 
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji