[Mpi3-rma] Memory barriers in MPI_WIN_LOCK_ALL mode

Tue Oct 30 07:44:18 CDT 2012

How is this different from doing a store and a load on a shared-memory system?
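
To make the comparison concrete, here is a minimal shared-memory sketch of the quoted example, assuming C11 atomics and POSIX threads. Neither appears in the thread, and the names writer, reader, hand_waved, and run_handoff are hypothetical. The writer plays P0, the reader plays P1, and the release/acquire pair on the flag stands in for the memory barrier being asked about. It illustrates the shared-memory analogue only and says nothing about how an MPI implementation behaves.

```c
#include <pthread.h>
#include <stdatomic.h>

static int X;                  /* plain data; plays the role of X on P1 */
static atomic_int hand_waved;  /* flag; plays the role of "wave hand to P1" */

/* Plays the role of P0: write X, then signal. */
static void *writer(void *arg)
{
    (void)arg;
    X = 1;  /* the "PUT" */
    /* Release: the store to X is ordered before the flag store. */
    atomic_store_explicit(&hand_waved, 1, memory_order_release);
    return NULL;
}

/* Plays the role of P1: wait for the signal, then do a local load. */
static void *reader(void *arg)
{
    int *out = arg;
    /* Acquire: pairs with the release above and orders the load of X
       after the flag is observed. */
    while (atomic_load_explicit(&hand_waved, memory_order_acquire) == 0)
        ;  /* spin until the hand is waved */
    *out = X;  /* the local "GET" */
    return NULL;
}

/* Runs one hand-off; returns the value the reader saw, or -1 on error. */
int run_handoff(void)
{
    pthread_t w, r;
    int seen = -1;

    X = 0;
    atomic_store(&hand_waved, 0);

    if (pthread_create(&r, NULL, reader, &seen) != 0)
        return -1;
    if (pthread_create(&w, NULL, writer, NULL) != 0) {
        /* Let the reader leave its spin loop before reporting failure. */
        atomic_store(&hand_waved, 1);
        pthread_join(r, NULL);
        return -1;
    }
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```

With the release/acquire pair in place, run_handoff() returns 1. If both flag accesses used memory_order_relaxed instead, the read of X would be a data race and no value would be guaranteed.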

Bill

William Gropp
Director, Parallel Computing Institute
Deputy Director for Research
Institute for Advanced Computing Applications and Technologies
Paul and Cynthia Saylor Professor of Computer Science
University of Illinois Urbana-Champaign

On Oct 29, 2012, at 11:18 PM, Pavan Balaji wrote:

> 
> Keith pointed out offline that I forgot a FLUSH after the PUT on P0. Please do consider that added for your comments.
> 
> -- Pavan
> 
> On 10/29/2012 11:00 PM, Pavan Balaji wrote:
>> 
>> It's undefined without additional synchronization.  I'm doing additional
>> synchronization in the example below.
>> 
>>  -- Pavan
>> 
>> On 10/29/2012 10:59 PM, Underwood, Keith D wrote:
>>> Doesn't that specific usage have undefined results?  (put/get
>>> targeting the same location during one epoch?)
>>> 
>>>> -----Original Message-----
>>>> From: mpi3-rma-bounces at lists.mpi-forum.org [mailto:mpi3-rma-bounces at lists.mpi-forum.org] On Behalf Of Pavan Balaji
>>>> Sent: Monday, October 29, 2012 11:39 PM
>>>> To: mpi3-rma at lists.mpi-forum.org
>>>> Subject: [Mpi3-rma] Memory barriers in MPI_WIN_LOCK_ALL mode
>>>> 
>>>> 
>>>> Consider the following program:
>>>> 
>>>> P0:
>>>>    MPI_WIN_LOCK_ALL();
>>>>    MPI_PUT(1, X, P1); /* Write 1 to variable X on P1 */
>>>>    /* wave hand to P1 */
>>>> 
>>>> P1:
>>>>    MPI_WIN_LOCK_ALL();
>>>>    /* wait for P0 to wave hand */
>>>>    MPI_GET(X, P1);  /* local operation */
>>>> 
>>>> If the MPI_GET on P1 does a local load operation internally, it is not
>>>> memory consistent without a memory barrier.  Does this mean that I need
>>>> to always do a memory barrier on all local GET operations to get the
>>>> right value?
>>>> 
>>>> Note that this inefficiency does not go away by replacing the waving of
>>>> the hand with an MPI_BARRIER for example, as it does not know which
>>>> window, if any, the synchronization is for.
>>>> 
>>>> Note that this is likely only a theoretical exercise, since most (all?)
>>>> compilers will do a memory barrier anyway if they see a function call
>>>> (MPI_GET in this case).  But is MPI assuming that that's going to be the
>>>> case for efficient execution?
>>>> 
>>>>   -- Pavan
>>>> 
>>>> --
>>>> Pavan Balaji
>>>> http://www.mcs.anl.gov/~balaji
>>>> _______________________________________________
>>>> mpi3-rma mailing list
>>>> mpi3-rma at lists.mpi-forum.org
>>>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-rma
>>> 
>> 
> 
> -- 
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji