[mpiwg-rma] [EXTERNAL] Re: Synchronization on shared memory windows

Balaji, Pavan balaji at anl.gov
Wed Feb 5 11:53:21 CST 2014


It’s correct now.

In the MPI standard, we have stayed away from tutorial-style material.  But I’m personally not against it.

  — Pavan

On Feb 5, 2014, at 11:43 AM, Rolf Rabenseifner <rabenseifner at hlrs.de> wrote:

> Pavan, Dave, Jeff, and Brian,
> 
> is it now correct?
> Should it go into Sect. 11.7 of MPI-next?
> 
>> --------------------
>> X is part of a shared memory window, produced with
>> MPI_WIN_ALLOCATE_SHARED, and refers to the same memory
>> location in both processes.
>> 
>> Process A               Process B
>> 
>> MPI_WIN_LOCK_ALL(       MPI_WIN_LOCK_ALL(
>> MPI_MODE_NOCHECK,win)   MPI_MODE_NOCHECK,win)
>> 
>> DO ...                  DO ...
>>  X=...
>>  MPI_F_SYNC_REG(X)
>>  MPI_WIN_SYNC(win)
>>  MPI_Barrier             MPI_Barrier
>>                          MPI_WIN_SYNC(win)
>>                          MPI_F_SYNC_REG(X)
>>                          print X
>> END DO                  END DO
>> 
>> MPI_WIN_UNLOCK_ALL(win) MPI_WIN_UNLOCK_ALL(win)
>> --------------------
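
For readers who want to try the pattern outside the two-column pseudocode, here is a minimal C sketch of the same scheme. Everything in it is an illustrative assumption rather than part of Rolf's Fortran original: the int payload, the variable names, the two-process run on a single node, and a second barrier per iteration so that A's next write cannot overlap B's current read.

--------------------
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, disp;
    int *x;                               /* the shared variable "X" */
    MPI_Aint size;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Rank 0 contributes the storage; the other rank queries a
       pointer to it.  Run with 2 processes on one node.          */
    size = (rank == 0) ? sizeof(int) : 0;
    MPI_Win_allocate_shared(size, sizeof(int), MPI_INFO_NULL,
                            MPI_COMM_WORLD, &x, &win);
    if (rank != 0)
        MPI_Win_shared_query(win, 0, &size, &disp, &x);

    /* One passive-target epoch around the whole exchange. */
    MPI_Win_lock_all(MPI_MODE_NOCHECK, win);

    for (int i = 0; i < 5; i++) {
        if (rank == 0) {                  /* process A           */
            *x = i;                       /* X = ...             */
            MPI_Win_sync(win);            /* order the store     */
        }
        /* MPI_F_SYNC_REG has no C counterpart; it only guards
           against Fortran register optimization of X.           */
        MPI_Barrier(MPI_COMM_WORLD);
        if (rank == 1) {                  /* process B           */
            MPI_Win_sync(win);            /* observe A's store   */
            printf("X = %d\n", *x);       /* print X             */
        }
        MPI_Barrier(MPI_COMM_WORLD);      /* added: separates
                                             the iterations      */
    }

    MPI_Win_unlock_all(win);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
--------------------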
> 
> Best regards
> Rolf
> 
> ----- Original Message -----
>> From: "Rolf Rabenseifner" <rabenseifner at hlrs.de>
>> To: "MPI WG Remote Memory Access working group" <mpiwg-rma at lists.mpi-forum.org>
>> Cc: "Stefan Andersson" <stefan at cray.com>, "Bill Long" <longb at cray.com>
>> Sent: Wednesday, February 5, 2014 5:59:03 PM
>> Subject: Re: [mpiwg-rma] [EXTERNAL] Re: Synchronization on shared memory windows
>> 
>> That was my fault. Here is the program that should be examined
>> to determine whether it is correct according to MPI-3.0:
>> 
>> --------------------
>> X is part of a shared memory window, produced with
>> MPI_WIN_ALLOCATE_SHARED, and refers to the same memory
>> location in both processes.
>> 
>> Process A               Process B
>> 
>> MPI_WIN_LOCK_ALL(       MPI_WIN_LOCK_ALL(
>> MPI_MODE_NOCHECK,win)   MPI_MODE_NOCHECK,win)
>> 
>> DO ...                  DO ...
>>  X=...
>>  MPI_F_SYNC_REG(X)
>>  MPI_WIN_SYNC(win)
>>  MPI_Barrier             MPI_Barrier
>>                          MPI_WIN_SYNC(win)
>>                          MPI_F_SYNC_REG(X)
>>                          print X
>> END DO                  END DO
>> 
>> MPI_WIN_UNLOCK_ALL(win) MPI_WIN_UNLOCK_ALL(win)
>> --------------------
>> 
>> Is it now correct according to MPI-3.0?
>> And perhaps also according to other rules for
>> real shared memory programming?
>> 
>> Would it be helpful to add it at the end of Sect. 11.7?
>> It would definitely clarify the rules on
>> how to use shared memory windows.
>> 
>> Best regards
>> Rolf
>> 
>> ----- Original Message -----
>>> From: "Pavan Balaji" <balaji at anl.gov>
>>> To: "MPI WG Remote Memory Access working group"
>>> <mpiwg-rma at lists.mpi-forum.org>
>>> Cc: "Stefan Andersson" <stefan at cray.com>, "Bill Long"
>>> <longb at cray.com>
>>> Sent: Wednesday, February 5, 2014 5:46:25 PM
>>> Subject: Re: [mpiwg-rma] [EXTERNAL] Re: Synchronization on shared memory windows
>>> 
>>> 
>>> Whoops, I read MPI_F_SYNC_REG as MPI_WIN_SYNC.  There need to be
>>> WIN_SYNCs on both processes.
>>> 
>>>  — Pavan
>>> 
>>> On Feb 5, 2014, at 10:40 AM, Dave Goodell (dgoodell)
>>> <dgoodell at cisco.com> wrote:
>>> 
>>>> Pavan, is it?
>>>> 
>>>> Rolf, where is the supposed MPI_WIN_SYNC call?  I assume you meant
>>>> to put it between the MPI_F_SYNC_REG and MPI_Barrier in both
>>>> processes?
>>>> 
>>>> -Dave
>>>> 
>>>> On Feb 5, 2014, at 10:31 AM, "Balaji, Pavan" <balaji at anl.gov>
>>>> wrote:
>>>> 
>>>>> 
>>>>> Yes, this is a correct program.
>>>>> 
>>>>> — Pavan
>>>>> 
>>>>> On Feb 5, 2014, at 10:30 AM, Rolf Rabenseifner
>>>>> <rabenseifner at hlrs.de> wrote:
>>>>> 
>>>>>> Jeff and all,
>>>>>> 
>>>>>> it looks like it works as MPI-3 is designed:
>>>>>> 
>>>>>> I need to add a MPI_WIN_LOCK_ALL(MPI_MODE_NOCHECK, win) once at
>>>>>> the beginning and a MPI_WIN_UNLOCK_ALL(win) once at the end, and
>>>>>> then everything works fine with MPI_WIN_SYNC in each iteration.
>>>>>> 
>>>>>> Is this usage consistent with the definition in the MPI-3
>>>>>> standard?
>>>>>> 
>>>>>> Here is the complete scenario that I use:
>>>>>> 
>>>>>> --------------------
>>>>>> X is part of a shared memory window and refers to the same
>>>>>> memory location in both processes.
>>>>>> 
>>>>>> Process A               Process B
>>>>>> 
>>>>>> MPI_WIN_LOCK_ALL(       MPI_WIN_LOCK_ALL(
>>>>>> MPI_MODE_NOCHECK,win)   MPI_MODE_NOCHECK,win)
>>>>>> 
>>>>>> DO ...                  DO ...
>>>>>>  X=...
>>>>>>  MPI_F_SYNC_REG(X)
>>>>>>  MPI_Barrier             MPI_Barrier
>>>>>>                          MPI_F_SYNC_REG(X)
>>>>>>                          print X
>>>>>> END DO                  END DO
>>>>>> 
>>>>>> MPI_WIN_UNLOCK_ALL(win) MPI_WIN_UNLOCK_ALL(win)
>>>>>> 
>>>>>> --------------------
>>>>>> 
>>>>>> Best regards
>>>>>> Rolf
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> ----- Original Message -----
>>>>>>> From: "Jeff Hammond" <jeff.science at gmail.com>
>>>>>>> To: "MPI WG Remote Memory Access working group"
>>>>>>> <mpiwg-rma at lists.mpi-forum.org>
>>>>>>> Cc: "Stefan Andersson" <stefan at cray.com>, "Bill Long"
>>>>>>> <longb at cray.com>
>>>>>>> Sent: Tuesday, February 4, 2014 7:42:58 PM
>>>>>>> Subject: Re: [mpiwg-rma] [EXTERNAL] Re: Synchronization on shared memory windows
>>>>>>> 
>>>>>>> "For the purposes of synchronizing the private and public window,
>>>>>>> MPI_WIN_SYNC has the effect of ending and reopening an access and
>>>>>>> exposure epoch on the window (note that it does not actually end
>>>>>>> an epoch or complete any pending MPI RMA operations)."
>>>>>>> 
>>>>>>> I think this is interpreted to mean that this call is only valid
>>>>>>> inside of an existing epoch, and thus if you want to call it, you
>>>>>>> need to use it inside of a passive-target epoch.  Thus, it is not
>>>>>>> merely a portable abstraction for a memory barrier.
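
To make that reading concrete, here is a short C sketch of the epoch requirement (the function name and the assumption that win is a shared memory window are illustrative):

--------------------
#include <mpi.h>

/* Sketch: under the interpretation above, MPI_Win_sync is only valid
   while an epoch is open on win.  Calling it before MPI_Win_lock_all
   (or after MPI_Win_unlock_all) is what an implementation may reject. */
void sync_inside_epoch(MPI_Win win)
{
    MPI_Win_lock_all(MPI_MODE_NOCHECK, win); /* open passive-target epoch */
    MPI_Win_sync(win);  /* valid here: synchronizes public/private copies */
    MPI_Win_unlock_all(win);                 /* close the epoch */
}
--------------------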
>>>>>>> 
>>>>>>> I think we should fix MPICH and/or MPI-Next to allow the more
>>>>>>> general use, such that your code is standard-compliant and
>>>>>>> executes correctly.
>>>>>>> 
>>>>>>> I await violent disagreement from others :-)
>>>>>>> 
>>>>>>> Jeff
>>>>>>> 
>>>>>>> On Tue, Feb 4, 2014 at 12:34 PM, Rolf Rabenseifner
>>>>>>> <rabenseifner at hlrs.de> wrote:
>>>>>>>> Brian, Pavan, and Jeff,
>>>>>>>> 
>>>>>>>> you convinced me. I did it (see the attached file), and my
>>>>>>>> mpich-based Cray lib reports:
>>>>>>>> 
>>>>>>>> Rank 0 [Tue Feb  4 19:31:28 2014] [c9-1c2s7n0] Fatal error in
>>>>>>>> MPI_Win_sync: Wrong synchronization of RMA calls , error stack:
>>>>>>>> MPI_Win_sync(113)...: MPI_Win_sync(win=0xa0000001) failed
>>>>>>>> MPIDI_Win_sync(2495): Wrong synchronization of RMA calls
>>>>>>>> 
>>>>>>>> (only once in each process).
>>>>>>>> 
>>>>>>>> I expect that this is now an implementation bug that should be
>>>>>>>> fixed by mpich and Cray?
>>>>>>>> 
>>>>>>>> Best regards
>>>>>>>> Rolf
>>>>>>>> 
>>>>>>>> ----- Original Message -----
>>>>>>>>> From: "Brian W Barrett" <bwbarre at sandia.gov>
>>>>>>>>> To: "MPI WG Remote Memory Access working group"
>>>>>>>>> <mpiwg-rma at lists.mpi-forum.org>
>>>>>>>>> Cc: "Stefan Andersson" <stefan at cray.com>, "Bill Long"
>>>>>>>>> <longb at cray.com>
>>>>>>>>> Sent: Tuesday, February 4, 2014 7:09:02 PM
>>>>>>>>> Subject: Re: [mpiwg-rma] [EXTERNAL] Re: Synchronization on shared memory windows
>>>>>>>>> 
>>>>>>>>> On 2/4/14 11:01 AM, "Rolf Rabenseifner"
>>>>>>>>> <rabenseifner at hlrs.de>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> The MPI_WIN_SYNC (not the Fortran MPI_F_SYNC_REG)
>>>>>>>>>> has no meaning in the unified memory model if all accesses
>>>>>>>>>> are done without RMA routines.
>>>>>>>>>> It only has meaning if separate public and private copies
>>>>>>>>>> exist (MPI-3.0 p450:46-p451:2).
>>>>>>>>>> MPI-3.0 p456:3-p457:7 defines the rules for the unified
>>>>>>>>>> memory model, but there is no need to use MPI_WIN_SYNC.
>>>>>>>>> 
>>>>>>>>> Right, there's no need from an MPI point of view, but that
>>>>>>>>> doesn't mean that the language/compiler/processor doesn't have
>>>>>>>>> a need for extra synchronization.
>>>>>>>>> 
>>>>>>>>>> The combination of X=13 and MPI_F_SYNC_REG(X)
>>>>>>>>>> before MPI_Barrier should guarantee that all bytes of X are
>>>>>>>>>> stored in memory. The same should be valid in C,
>>>>>>>>>> because the C compiler has no chance to see whether
>>>>>>>>>> MPI_Barrier will access the bytes of X or not.
>>>>>>>>>> And if it is guaranteed to be in the unified memory,
>>>>>>>>>> then the other process (B) should be able to correctly
>>>>>>>>>> read the data after the return from its barrier.
>>>>>>>>>> 
>>>>>>>>>> What is wrong with my thinking?
>>>>>>>>>> Which detail do I miss?
>>>>>>>>> 
>>>>>>>>> According to my reading of the spec, MPI_F_SYNC_REG only
>>>>>>>>> prevents the language/compiler from moving the store, but does
>>>>>>>>> not say anything about processor ordering.  So the WIN_SYNC in
>>>>>>>>> my last e-mail will add the processor memory barrier, which
>>>>>>>>> will give you all the semantics you need.
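
A hedged C11 illustration of the two barrier kinds distinguished here; the fences shown are standard C facilities used for illustration, not what an MPI implementation is required to emit:

--------------------
#include <stdatomic.h>

int x;  /* stands in for the shared window variable X */

void writer(void)
{
    x = 13;

    /* Compiler-only fence: roughly the guarantee MPI_F_SYNC_REG gives
       for X in Fortran.  The compiler may not move or cache the store
       across this point, but the processor may still reorder it.     */
    atomic_signal_fence(memory_order_seq_cst);

    /* Full memory fence: the additional processor ordering that
       MPI_WIN_SYNC supplies, so another core observes the store
       before anything the writer does afterwards.                    */
    atomic_thread_fence(memory_order_seq_cst);
}
--------------------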
>>>>>>>>> 
>>>>>>>>> Shared memory programming is a disaster in most languages
>>>>>>>>> today, so we decided to pass that disaster on to the user.  We
>>>>>>>>> really can't help without adding lots of overhead (i.e., using
>>>>>>>>> put/get/rma synchronization).  So if a user already knows how
>>>>>>>>> to do shared memory programming, this will feel natural.  If
>>>>>>>>> they don't, it's going to hurt badly :/.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Brian
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Brian W. Barrett
>>>>>>>>> Scalable System Software Group
>>>>>>>>> Sandia National Laboratories
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Jeff Hammond
>>>>>>> jeff.science at gmail.com
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
> -- 
> Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
> High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
> University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
> Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
> Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)
> _______________________________________________
> mpiwg-rma mailing list
> mpiwg-rma at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-rma



