[Mpi3-rma] MPI-3 UNIFIED model clarification

Sur, Sayantan sayantan.sur at intel.com
Mon Jul 29 15:05:17 CDT 2013


> I vehemently object to the definition of a new memory model that only
> works for x86 (actually a subset thereof since Intel SCC is x86 but not even
> cache-coherent), which is older than I am.  If we're going to make CPU-
> specific memory models for RMA, let's be fair and thorough and define ones
> that work for PPC, DEC Alpha, and any other CPU memory models that exist
> now or may in the future.
> 
> WIN_SYNC is the right way to do this: it requires no changes to the
> standard and affects a very small number of users.  How many people are
> writing MPI-3 RMA code who aren't on this list?
> 
> Note also that I don't even think UBER_SUPER_UNIFIED is practical.
> How does x86's memory model solve the problem of a multi-rail NIC where
> the Send-Recv happens on rail 0 and the Put+Flush happens on the other?
> Are MPI implementers going to be required to serialize all communication
> through the NIC in order to support this model?
>

I agree with the sentiment, and also with the implementation issue of UNIFIED over multi-rail. I'm not certain that a model exists in between UNIFIED and SEPARATE: either you are able to observe changes in memory without further MPI calls, or you are not. Pavan, do you have a specific model in mind?

I do not agree that mandating MPI_Win_sync requires no changes to the standard. UNIFIED is clearly defined in the current spec as the model in which the public and private copies of the window are identical, and MPI_Win_sync is just as clearly defined as synchronizing the private and public copies of the window. Therefore, we cannot simply require UNIFIED programs to call MPI_Win_sync, since by those definitions the call would be meaningless in that model.
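
To make the disputed pattern concrete, here is a minimal compilable sketch (mine, not from the earlier mails) of the P0/P1 example quoted below. The window setup with MPI_Win_allocate and the zero-byte send/recv pair used as the notification are my own assumptions; the placement of MPI_Win_sync on P1 is exactly the call whose necessity is under debate:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, val = 42, *base;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Each rank exposes one int in the window (hypothetical setup;
       the original example does not say how the window was created). */
    MPI_Win_allocate(sizeof(int), sizeof(int), MPI_INFO_NULL,
                     MPI_COMM_WORLD, &base, &win);
    *base = 0;

    MPI_Win_lock_all(0, win);
    MPI_Barrier(MPI_COMM_WORLD);  /* all ranks initialized and locked */

    if (rank == 0) {
        /* P0: Put + Flush, then notify P1 with a zero-byte send. */
        MPI_Put(&val, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
        MPI_Win_flush(1, win);
        MPI_Send(NULL, 0, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(NULL, 0, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        /* The disputed call: in UNIFIED, where the public and private
           copies are identical by definition, this should be a no-op;
           the question is whether the read below is valid without it. */
        MPI_Win_sync(win);
        printf("P1 read a = %d\n", *base);  /* expect 42 */
    }

    MPI_Win_unlock_all(win);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}

Run with at least two ranks. Under UNIFIED as currently worded, the question is whether P1 may legitimately read 42 at the marked line even when the MPI_Win_sync is omitted.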

Sayantan.

> Jeff
> 
> On Mon, Jul 29, 2013 at 1:41 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
> > Sayantan,
> >
> > Yes, doing a remote memory barrier using an active message is an option.
> > But I think the point is that any of these approaches adds more
> > unnecessary overhead.
> >
> > Separating into three memory models provides better flexibility to
> > implementations (and to applications, once we allow them to ask for
> > the "required" value through an info argument).
> >
> >  -- Pavan
> >
> >
> > On 07/29/2013 12:00 PM, Sur, Sayantan wrote:
> >>
> >> Hi Pavan,
> >>
> >>> Specifically, the concern was that some members of the WG believed
> >>> that in the UNIFIED model, data is usable by the remote process
> >>> after a PUT without an additional WIN_SYNC, while some members
> >>> believed that it is not.  Here's the example in question:
> >>>
> >>> P0:
> >>> Win_lock_all
> >>> Put(a, P1)
> >>> Flush
> >>> MPI_Send(P1)
> >>>
> >>> P1:
> >>> Win_lock_all
> >>> MPI_Recv(P0)
> >>> read a
> >>>
> >>> The question was whether the above program was valid without a
> >>> WIN_SYNC on P1 between the Recv(P0) and "read a".  If we want this
> >>> to be valid in the UNIFIED model, only x86-like architectures can
> >>> provide UNIFIED efficiently.  Other architectures, such as PPC or
> >>> ARM, that require an additional read barrier on P1 will not be able
> >>> to provide UNIFIED even if they are cache-coherent, unless they add
> >>> a memory barrier in every other MPI call (e.g., MPI_Recv in this
> >>> case).
> >>>
> >>
> >> Adding a memory barrier in MPI_Recv is one of the implementation
> >> options, and probably not the best one. For relaxed memory
> >> architectures, one may want to shift the burden onto Flush to do a
> >> memory barrier after the data has been written (through an active
> >> message, for example).
> >>
> >> The description of Flush in the spec is: "MPI_WIN_FLUSH completes all
> >> outstanding RMA operations initiated by the calling process to the
> >> target rank on the specified window. The operations are completed
> >> both at the origin and at the target."
> >>
> >> The way I read it, no further action should be required to view
> >> contents of the memory attached to the window after MPI_Win_flush.
> >> Therefore, an implementation of MPI_Win_flush needs to do whatever is
> >> required by the underlying platform and the model the MPI is
> >> supposed to provide.
> >>
> >>> One possible solution we discussed was to clarify that this is not
> >>> allowed in UNIFIED, but provide a third memory model called
> >>> UBER_SUPER_UNIFIED, which would allow it (or say that it is
> >>> allowed in UNIFIED and provide a third model called KIND_OF_UNIFIED,
> >>> which is in between UNIFIED and SEPARATE).
> >>>
> >>> Other solutions are welcome.
> >>>
> >>> Irrespective of when we might add an additional memory model
> >>> (MPI-3.1, MPI-4, whatever), we should clarify in the standard, as an
> >>> errata item, what is allowed in MPI-3 and what is not.  Without
> >>> that, it's confusing for implementors.
> >>>
> >>>    -- Pavan
> >>>
> >>> --
> >>> Pavan Balaji
> >>> http://www.mcs.anl.gov/~balaji
> >
> > --
> > Pavan Balaji
> > http://www.mcs.anl.gov/~balaji
> 
> 
> 
> --
> Jeff Hammond
> jeff.science at gmail.com



