[mpiwg-rma] [EXTERNAL] Re: Synchronization on shared memory windows

Jeff Hammond jeff.science at gmail.com
Wed Feb 5 11:54:38 CST 2014


I think advice to users regarding WIN_SYNC wouldn't hurt, given that
some of us are talking about it as if it were a portable abstraction for
a memory barrier, when at present it is only correct to use it within an
RMA synchronization epoch.  Or we need to relax this restriction.
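
For concreteness, here is a rough, untested C sketch of the pattern under
discussion, assuming a window created with MPI_WIN_ALLOCATE_SHARED and the
unified memory model.  The variable names and the loop bound are mine, not
from Rolf's example; MPI_F_SYNC_REG is Fortran-only, so in C only the
MPI_WIN_SYNC calls remain, and I added a second barrier so that successive
iterations do not overlap:

--------------------
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int *x, rank;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* rank 0 provides the shared segment; the other ranks attach with size 0 */
    MPI_Win_allocate_shared((rank == 0) ? sizeof(int) : 0, sizeof(int),
                            MPI_INFO_NULL, MPI_COMM_WORLD, &x, &win);
    if (rank != 0) {
        MPI_Aint qsize; int qdisp;
        MPI_Win_shared_query(win, 0, &qsize, &qdisp, &x); /* pointer to rank 0's X */
    }

    MPI_Win_lock_all(MPI_MODE_NOCHECK, win); /* open the passive-target epoch once */

    for (int i = 0; i < 10; i++) {
        if (rank == 0) {                 /* "Process A" */
            *x = i;
            MPI_Win_sync(win);           /* memory barrier: make the store visible */
        }
        MPI_Barrier(MPI_COMM_WORLD);
        if (rank == 1) {                 /* "Process B" */
            MPI_Win_sync(win);           /* memory barrier: read a fresh value */
            printf("X = %d\n", *x);
        }
        MPI_Barrier(MPI_COMM_WORLD);     /* added: keep iterations from overlapping
                                            (not in the thread's example) */
    }

    MPI_Win_unlock_all(win);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
--------------------

Run it with at least two processes.  The LOCK_ALL/UNLOCK_ALL pair is there
only so that the MPI_WIN_SYNC calls sit inside a passive-target epoch,
which is exactly the restriction in question.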

Jeff

On Wed, Feb 5, 2014 at 11:53 AM, Balaji, Pavan <balaji at anl.gov> wrote:
>
> It’s correct now.
>
> In the MPI standard, we have stayed away from tutorial-style material.  But I’m personally not against it.
>
>   — Pavan
>
> On Feb 5, 2014, at 11:43 AM, Rolf Rabenseifner <rabenseifner at hlrs.de> wrote:
>
>> Pavan, Dave, Jeff, and Brian,
>>
>> Is it now correct?
>> Should it go into Sect. 11.7 in MPI-next?
>>
>>> --------------------
>>> X is part of a shared memory window and should refer to the same
>>> memory location in both processes, produced with
>>> MPI_WIN_ALLOCATE_SHARED.
>>>
>>> Process A               Process B
>>>
>>> MPI_WIN_LOCK_ALL(       MPI_WIN_LOCK_ALL(
>>> MPI_MODE_NOCHECK,win)   MPI_MODE_NOCHECK,win)
>>>
>>> DO ...                  DO ...
>>>  x=...
>>>  MPI_F_SYNC_REG(X)
>>>  MPI_WIN_SYNC(win)
>>>  MPI_Barrier             MPI_Barrier
>>>                          MPI_WIN_SYNC(win)
>>>                          MPI_F_SYNC_REG(X)
>>>                          print X
>>> END DO                  END DO
>>>
>>> MPI_WIN_UNLOCK_ALL(win) MPI_WIN_UNLOCK_ALL(win)
>>> --------------------
>>
>> Best regards
>> Rolf
>>
>> ----- Original Message -----
>>> From: "Rolf Rabenseifner" <rabenseifner at hlrs.de>
>>> To: "MPI WG Remote Memory Access working group" <mpiwg-rma at lists.mpi-forum.org>
>>> Cc: "Stefan Andersson" <stefan at cray.com>, "Bill Long" <longb at cray.com>
>>> Sent: Wednesday, February 5, 2014 5:59:03 PM
>>> Subject: Re: [mpiwg-rma] [EXTERNAL] Re: Synchronization on shared memory windows
>>>
>>> That was my fault. Here is the program that should be examined
>>> for whether it is correct according to MPI-3.0:
>>>
>>> --------------------
>>> X is part of a shared memory window and should refer to the same
>>> memory location in both processes, produced with
>>> MPI_WIN_ALLOCATE_SHARED.
>>>
>>> Process A               Process B
>>>
>>> MPI_WIN_LOCK_ALL(       MPI_WIN_LOCK_ALL(
>>> MPI_MODE_NOCHECK,win)   MPI_MODE_NOCHECK,win)
>>>
>>> DO ...                  DO ...
>>>  x=...
>>>  MPI_F_SYNC_REG(X)
>>>  MPI_WIN_SYNC(win)
>>>  MPI_Barrier             MPI_Barrier
>>>                          MPI_WIN_SYNC(win)
>>>                          MPI_F_SYNC_REG(X)
>>>                          print X
>>> END DO                  END DO
>>>
>>> MPI_WIN_UNLOCK_ALL(win) MPI_WIN_UNLOCK_ALL(win)
>>> --------------------
>>>
>>> Is it now correct according to MPI-3.0?
>>> And perhaps also according to other rules for
>>> real shared memory programming?
>>>
>>> Would it be helpful to add it at the end of Sect. 11.7?
>>> It would definitely clarify the rules for
>>> how to use shared memory windows.
>>>
>>> Best regards
>>> Rolf
>>>
>>> ----- Original Message -----
>>>> From: "Pavan Balaji" <balaji at anl.gov>
>>>> To: "MPI WG Remote Memory Access working group"
>>>> <mpiwg-rma at lists.mpi-forum.org>
>>>> Cc: "Stefan Andersson" <stefan at cray.com>, "Bill Long"
>>>> <longb at cray.com>
>>>> Sent: Wednesday, February 5, 2014 5:46:25 PM
>>>> Subject: Re: [mpiwg-rma] [EXTERNAL] Re: Synchronization on shared memory windows
>>>>
>>>>
>>>> Whoops, I read MPI_F_SYNC_REG as MPI_WIN_SYNC.  There need to be
>>>> WIN_SYNCs on both processes.
>>>>
>>>>  — Pavan
>>>>
>>>> On Feb 5, 2014, at 10:40 AM, Dave Goodell (dgoodell)
>>>> <dgoodell at cisco.com> wrote:
>>>>
>>>>> Pavan, is it?
>>>>>
>>>>> Rolf, where is the supposed MPI_WIN_SYNC call?  I assume you meant
>>>>> to put it between the MPI_F_SYNC_REG and the MPI_Barrier in both
>>>>> processes?
>>>>>
>>>>> -Dave
>>>>>
>>>>> On Feb 5, 2014, at 10:31 AM, "Balaji, Pavan" <balaji at anl.gov>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> Yes, this is a correct program.
>>>>>>
>>>>>> — Pavan
>>>>>>
>>>>>> On Feb 5, 2014, at 10:30 AM, Rolf Rabenseifner
>>>>>> <rabenseifner at hlrs.de> wrote:
>>>>>>
>>>>>>> Jeff and all,
>>>>>>>
>>>>>>> it looks like it works as MPI-3 is designed:
>>>>>>>
>>>>>>> I need to add an MPI_WIN_LOCK_ALL(MPI_MODE_NOCHECK, win) once at
>>>>>>> the beginning and an MPI_WIN_UNLOCK_ALL(win) once at the end,
>>>>>>> and then all works fine with MPI_WIN_SYNC in each iteration.
>>>>>>>
>>>>>>> Is this usage consistent with the definition in the MPI-3
>>>>>>> standard?
>>>>>>>
>>>>>>> Here the total scenario that I use:
>>>>>>>
>>>>>>> --------------------
>>>>>>> X is part of a shared memory window and should refer to the same
>>>>>>> memory location in both processes
>>>>>>>
>>>>>>> Process A               Process B
>>>>>>>
>>>>>>> MPI_WIN_LOCK_ALL(       MPI_WIN_LOCK_ALL(
>>>>>>> MPI_MODE_NOCHECK,win)   MPI_MODE_NOCHECK,win)
>>>>>>>
>>>>>>> DO ...                  DO ...
>>>>>>> x=...
>>>>>>> MPI_F_SYNC_REG(X)
>>>>>>> MPI_Barrier             MPI_Barrier
>>>>>>>                       MPI_F_SYNC_REG(X)
>>>>>>>                       print X
>>>>>>> END DO                  END DO
>>>>>>>
>>>>>>> MPI_WIN_UNLOCK_ALL(win) MPI_WIN_UNLOCK_ALL(win)
>>>>>>>
>>>>>>> --------------------
>>>>>>>
>>>>>>> Best regards
>>>>>>> Rolf
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>>> From: "Jeff Hammond" <jeff.science at gmail.com>
>>>>>>>> To: "MPI WG Remote Memory Access working group"
>>>>>>>> <mpiwg-rma at lists.mpi-forum.org>
>>>>>>>> Cc: "Stefan Andersson" <stefan at cray.com>, "Bill Long"
>>>>>>>> <longb at cray.com>
>>>>>>>> Sent: Tuesday, February 4, 2014 7:42:58 PM
>>>>>>>> Subject: Re: [mpiwg-rma] [EXTERNAL] Re: Synchronization on
>>>>>>>> shared memory windows
>>>>>>>>
>>>>>>>> "For the purposes of synchronizing the private and public
>>>>>>>> window,
>>>>>>>> MPI_WIN_SYNC has the effect of ending and reopening an access
>>>>>>>> and
>>>>>>>> exposure epoch on the window (note that it does not actually
>>>>>>>> end
>>>>>>>> an
>>>>>>>> epoch or complete any pending MPI RMA operations)."
>>>>>>>>
>>>>>>>> I think this is interpreted to mean that this call is only
>>>>>>>> valid
>>>>>>>> inside of an existing epoch and thus if you want to call it,
>>>>>>>> you
>>>>>>>> need
>>>>>>>> to use it inside of a passive-target epoch.  Thus, it is not
>>>>>>>> merely a
>>>>>>>> portable abstraction for a memory barrier.
>>>>>>>>
>>>>>>>> I think we should fix MPICH and/or MPI-Next to allow the more
>>>>>>>> general use such that your code is standard-compliant and executes
>>>>>>>> correctly.
>>>>>>>>
>>>>>>>> I await violent disagreement from others :-)
>>>>>>>>
>>>>>>>> Jeff
>>>>>>>>
>>>>>>>> On Tue, Feb 4, 2014 at 12:34 PM, Rolf Rabenseifner
>>>>>>>> <rabenseifner at hlrs.de> wrote:
>>>>>>>>> Brian, Pavan, and Jeff,
>>>>>>>>>
>>>>>>>>> you convinced me. I did it, see attached file, and my MPICH-based
>>>>>>>>> Cray lib reports
>>>>>>>>>
>>>>>>>>> Rank 0 [Tue Feb  4 19:31:28 2014] [c9-1c2s7n0] Fatal error in
>>>>>>>>> MPI_Win_sync: Wrong synchronization of RMA calls , error stack:
>>>>>>>>> MPI_Win_sync(113)...: MPI_Win_sync(win=0xa0000001) failed
>>>>>>>>> MPIDI_Win_sync(2495): Wrong synchronization of RMA calls
>>>>>>>>>
>>>>>>>>> (only once in each process).
>>>>>>>>>
>>>>>>>>> I expect that this is an implementation bug that should be
>>>>>>>>> fixed by MPICH and Cray?
>>>>>>>>>
>>>>>>>>> Best regards
>>>>>>>>> Rolf
>>>>>>>>>
>>>>>>>>> ----- Original Message -----
>>>>>>>>>> From: "Brian W Barrett" <bwbarre at sandia.gov>
>>>>>>>>>> To: "MPI WG Remote Memory Access working group"
>>>>>>>>>> <mpiwg-rma at lists.mpi-forum.org>
>>>>>>>>>> Cc: "Stefan Andersson" <stefan at cray.com>, "Bill Long"
>>>>>>>>>> <longb at cray.com>
>>>>>>>>>> Sent: Tuesday, February 4, 2014 7:09:02 PM
>>>>>>>>>> Subject: Re: [mpiwg-rma] [EXTERNAL] Re: Synchronization on
>>>>>>>>>> shared
>>>>>>>>>> memory windows
>>>>>>>>>>
>>>>>>>>>> On 2/4/14 11:01 AM, "Rolf Rabenseifner"
>>>>>>>>>> <rabenseifner at hlrs.de>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> The MPI_WIN_SYNC (not the Fortran MPI_F_SYNC_REG)
>>>>>>>>>>> has no meaning in the unified memory model if all accesses
>>>>>>>>>>> are done without RMA routines.
>>>>>>>>>>> It only has a meaning if separate public and private copies
>>>>>>>>>>> exist (MPI-3.0 p450:46-p451:2).
>>>>>>>>>>> MPI-3.0 p456:3 - p457:7 defines the rules for the unified
>>>>>>>>>>> memory model, but there is no need to use MPI_WIN_SYNC.
>>>>>>>>>>
>>>>>>>>>> Right, there's no need from an MPI point of view, but that
>>>>>>>>>> doesn't mean that the language/compiler/processor doesn't have
>>>>>>>>>> a need for extra synchronization.
>>>>>>>>>>
>>>>>>>>>>> The combination of X=13 and MPI_F_SYNC_REG(X)
>>>>>>>>>>> before MPI_Barrier should guarantee that all bytes of X are
>>>>>>>>>>> stored in memory. The same should be valid in C,
>>>>>>>>>>> because the C compiler has no chance to see whether
>>>>>>>>>>> MPI_Barrier will access the bytes of X or not.
>>>>>>>>>>> And if it is guaranteed to be in the unified memory,
>>>>>>>>>>> then the other process (B) should be able to correctly
>>>>>>>>>>> read the data after the return from its barrier.
>>>>>>>>>>>
>>>>>>>>>>> What is wrong with my thinking?
>>>>>>>>>>> Which detail am I missing?
>>>>>>>>>>
>>>>>>>>>> According to my reading of the spec, MPI_F_SYNC_REG only
>>>>>>>>>> prevents the language/compiler from moving the store, but does
>>>>>>>>>> not say anything about processor ordering.  So the WIN_SYNC in
>>>>>>>>>> my last e-mail will add the processor memory barrier, which
>>>>>>>>>> will give you all the semantics you need.
>>>>>>>>>>
>>>>>>>>>> Shared memory programming is a disaster in most languages
>>>>>>>>>> today, so we decided to pass that disaster on to the user.  We
>>>>>>>>>> really can't help without adding lots of overhead (i.e., using
>>>>>>>>>> put/get/RMA synchronization).  So if a user already knows how
>>>>>>>>>> to do shared memory programming, this will feel natural.  If
>>>>>>>>>> they don't, it's going to hurt badly :/.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Brian
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Brian W. Barrett
>>>>>>>>>> Scalable System Software Group
>>>>>>>>>> Sandia National Laboratories
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
>>>>>>>>> High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
>>>>>>>>> University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
>>>>>>>>> Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
>>>>>>>>> Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jeff Hammond
>>>>>>>> jeff.science at gmail.com
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
>>>>>>> High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
>>>>>>> University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
>>>>>>> Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
>>>>>>> Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>> --
>>> Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
>>> High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
>>> University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
>>> Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
>>> Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)
>>
>> --
>> Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
>> High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
>> University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
>> Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
>> Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)
>
> _______________________________________________
> mpiwg-rma mailing list
> mpiwg-rma at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-rma



-- 
Jeff Hammond
jeff.science at gmail.com


