[Mpi3-rma] MPI-3 UNIFIED model clarification

Jeff Hammond jeff.science at gmail.com
Mon Jul 29 13:58:10 CDT 2013

I vehemently object to the definition of a new memory model that only
works for x86, an architecture that is older than I am (and really only
a subset of x86, since Intel SCC is x86 but not even cache-coherent).
If we're going to make CPU-specific memory models for RMA, let's be fair
and thorough and define ones that work for PPC, DEC Alpha, and any other
CPU memory models that exist now or may exist in the future.

WIN_SYNC is the right way to do this: it requires no changes to the
standard and affects a very small number of users.  How many people
who aren't on this list are writing MPI-3 RMA code?
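
To make the barrier question concrete, here is a minimal shared-memory
analogy in C11 atomics. It is not MPI code and not anyone's proposed
implementation. The names window_a, msg_arrived, and run_handoff are
hypothetical. The writer stands in for P0 (Put + Flush, then Send) and
the reader for P1 (Recv, then "read a"). The explicit acquire fence in
the reader plays the role that a WIN_SYNC on P1 would play. On x86 that
fence typically costs no instruction, while on PPC or ARM it typically
compiles to a real barrier, which is the asymmetry under discussion.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-ins: window_a is "a" in P1's window,
 * msg_arrived is the Send/Recv notification. */
static int window_a;
static atomic_int msg_arrived;

/* Writer ("P0"): write the data, then publish the notification. */
static void *writer(void *arg)
{
    (void)arg;
    window_a = 42;                                   /* like Put(a, P1) + Flush */
    atomic_store_explicit(&msg_arrived, 1,
                          memory_order_release);     /* like MPI_Send(P1) */
    return NULL;
}

/* Reader ("P1"): wait for the notification, fence, then read the data. */
static void *reader(void *arg)
{
    int *out = arg;
    /* like MPI_Recv(P0): a relaxed load alone does not order later reads */
    while (atomic_load_explicit(&msg_arrived, memory_order_relaxed) == 0) {
        /* spin */
    }
    /* Explicit read-side barrier, analogous to a WIN_SYNC on P1. */
    atomic_thread_fence(memory_order_acquire);
    *out = window_a;                                 /* "read a" */
    return NULL;
}

/* Runs one writer/reader handoff and returns the value the reader saw. */
int run_handoff(void)
{
    pthread_t w, r;
    int seen = -1;
    int rc;

    window_a = 0;
    atomic_store(&msg_arrived, 0);

    rc = pthread_create(&r, NULL, reader, &seen);
    assert(rc == 0);
    rc = pthread_create(&w, NULL, writer, NULL);
    assert(rc == 0);
    (void)rc;

    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```

Calling run_handoff() from a main() built with -pthread returns the
value written by the writer. Without the acquire fence, the C11 model
no longer guarantees that result, even though x86 hardware would
usually still produce it.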

Note also that I don't even think UBER_SUPER_UNIFIED is practical.
How does x86's memory model solve the problem of a multi-rail NIC
where the Send-Recv happens on rail 0 and the Put+Flush happens on the
other?  Are MPI implementers going to be required to serialize all
communication through the NIC in order to support this model?


On Mon, Jul 29, 2013 at 1:41 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
> Sayantan,
> Yes, doing a remote memory barrier using an active message is an option.
> But I think the point is that any of these approaches adds more unnecessary
> overhead.
> Separating into three memory models provides better flexibility to
> implementations (and to applications, once we allow them to ask for the
> "required" value through an info argument).
>  -- Pavan
> On 07/29/2013 12:00 PM, Sur, Sayantan wrote:
>> Hi Pavan,
>>> Specifically, the concern was that some members of the WG believed that in
>>> the UNIFIED model, data is usable by the remote process after a PUT without
>>> an additional WIN_SYNC, while some members believed that it is not.  Here's
>>> the example in question:
>>> P0:
>>>   Win_lock_all
>>>   Put(a, P1)
>>>   Flush
>>>   MPI_Send(P1)
>>> P1:
>>>   Win_lock_all
>>>   MPI_Recv(P0)
>>>   read a
>>> The question was whether the above program was valid without a
>>> WIN_SYNC on P1 between the Recv(P0) and "read a".  If we want this to be
>>> valid in the UNIFIED model, only x86-like architectures can provide it
>>> efficiently.  Other architectures, such as PPC or ARM, that require an
>>> additional read barrier on P1 will not be able to provide UNIFIED even if
>>> they are cache-coherent, unless they add a memory barrier in every other
>>> MPI call (e.g., MPI_Recv in this case).
>> Adding a memory barrier in MPI_Recv is one of the implementation options,
>> and probably not the best one. For relaxed memory architectures, one may
>> want to shift the burden onto Flush to do a memory barrier after the data
>> has been written (through an active message for example).
>> The description of Flush in the spec is: "MPI_WIN_FLUSH completes all
>> outstanding RMA operations initiated by the calling process to the target
>> rank on the specified window. The operations are completed both at the
>> origin and at the target."
>> The way I read it, no further action should be required to view contents
>> of the memory attached to the window after MPI_Win_flush. Therefore, an
>> implementation of MPI_Win_flush needs to do whatever is required by the
>> underlying platform and the model the MPI is supposed to provide.
>>> One possible solution we discussed was to clarify that this is not
>>> allowed in UNIFIED, but provide a third memory model called
>>> UBER_SUPER_UNIFIED, that will allow this.  (or say that it is allowed in
>>> UNIFIED and provide a third model called KIND_OF_UNIFIED, which is in
>>> between UNIFIED and
>>> Other solutions are welcome.
>>> Irrespective of when we make the change of possibly adding an additional
>>> memory model (MPI-3.1, MPI-4, whatever), we should clarify the standard
>>> on what is allowed in MPI-3 and what is not, as an errata item.  Without
>>> that, it's confusing for implementors.
>>>    -- Pavan
>>> --
>>> Pavan Balaji
>>> http://www.mcs.anl.gov/~balaji
>>> _______________________________________________
>>> mpi3-rma mailing list
>>> mpi3-rma at lists.mpi-forum.org
>>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-rma
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji

Jeff Hammond
jeff.science at gmail.com
