[mpiwg-rma] [EXTERNAL] Re: Synchronization on shared memory windows

Rolf Rabenseifner rabenseifner at hlrs.de
Wed Feb 5 11:43:23 CST 2014


Pavan, Dave, Jeff, and Brian,

Is it now correct?
Should it go into Sect. 11.7 in MPI-next?

> --------------------
> X is part of a shared memory window produced with
> MPI_WIN_ALLOCATE_SHARED and should denote the same
> memory location in both processes.
> 
> Process A               Process B
> 
> MPI_WIN_LOCK_ALL(       MPI_WIN_LOCK_ALL(
> MPI_MODE_NOCHECK,win)   MPI_MODE_NOCHECK,win)
> 
> DO ...                  DO ...
>   x=...
>   MPI_F_SYNC_REG(X)
>   MPI_WIN_SYNC(win)
>   MPI_Barrier             MPI_Barrier
>                           MPI_WIN_SYNC(win)
>                           MPI_F_SYNC_REG(X)
>                           print X
> END DO                  END DO
> 
> MPI_WIN_UNLOCK_ALL(win) MPI_WIN_UNLOCK_ALL(win)
> --------------------
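
For illustration only (not proposed standard text), a compilable C
sketch of the same pattern could look as follows. MPI_F_SYNC_REG
exists only in the Fortran bindings; in C the MPI calls are opaque to
the compiler, so the two MPI_WIN_SYNC calls provide the needed memory
barriers. The second barrier in each iteration is an addition that
keeps A's next store from racing with B's read. Run with exactly 2
processes on one node.

--------------------
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Win  win;
    int     *x;            /* points into the shared memory window */
    int      rank, i, disp;
    MPI_Aint size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Process A (rank 0) allocates the memory; B (rank 1) maps it. */
    MPI_Win_allocate_shared((rank == 0) ? sizeof(int) : 0, sizeof(int),
                            MPI_INFO_NULL, MPI_COMM_WORLD, &x, &win);
    if (rank == 1)
        MPI_Win_shared_query(win, 0, &size, &disp, &x);

    MPI_Win_lock_all(MPI_MODE_NOCHECK, win);
    for (i = 0; i < 10; i++) {
        if (rank == 0) {
            *x = i;                    /* x = ...                    */
            MPI_Win_sync(win);         /* memory barrier after store */
        }
        MPI_Barrier(MPI_COMM_WORLD);   /* B waits for A's store      */
        if (rank == 1) {
            MPI_Win_sync(win);         /* memory barrier before load */
            printf("X = %d\n", *x);    /* print X                    */
        }
        MPI_Barrier(MPI_COMM_WORLD);   /* A must not overwrite X
                                          before B has read it       */
    }
    MPI_Win_unlock_all(win);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
--------------------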

Best regards
Rolf

----- Original Message -----
> From: "Rolf Rabenseifner" <rabenseifner at hlrs.de>
> To: "MPI WG Remote Memory Access working group" <mpiwg-rma at lists.mpi-forum.org>
> Cc: "Stefan Andersson" <stefan at cray.com>, "Bill Long" <longb at cray.com>
> Sent: Wednesday, February 5, 2014 5:59:03 PM
> Subject: Re: [mpiwg-rma] [EXTERNAL] Re: Synchronization on shared memory windows
> 
> That was my fault. Here is the program that should be examined
> for correctness according to MPI-3.0:
> 
> --------------------
> X is part of a shared memory window produced with
> MPI_WIN_ALLOCATE_SHARED and should denote the same
> memory location in both processes.
> 
> Process A               Process B
> 
> MPI_WIN_LOCK_ALL(       MPI_WIN_LOCK_ALL(
> MPI_MODE_NOCHECK,win)   MPI_MODE_NOCHECK,win)
> 
> DO ...                  DO ...
>   x=...
>   MPI_F_SYNC_REG(X)
>   MPI_WIN_SYNC(win)
>   MPI_Barrier             MPI_Barrier
>                           MPI_WIN_SYNC(win)
>                           MPI_F_SYNC_REG(X)
>                           print X
> END DO                  END DO
> 
> MPI_WIN_UNLOCK_ALL(win) MPI_WIN_UNLOCK_ALL(win)
> --------------------
> 
> Is it now correct according to MPI-3.0?
> And perhaps also according to other rules for
> real shared memory programming?
> 
> Would it be helpful to add it at the end of Sect. 11.7?
> It would definitely clarify the rules for
> using shared memory windows.
> 
> Best regards
> Rolf
> 
> ----- Original Message -----
> > From: "Pavan Balaji" <balaji at anl.gov>
> > To: "MPI WG Remote Memory Access working group"
> > <mpiwg-rma at lists.mpi-forum.org>
> > Cc: "Stefan Andersson" <stefan at cray.com>, "Bill Long"
> > <longb at cray.com>
> > Sent: Wednesday, February 5, 2014 5:46:25 PM
> > Subject: Re: [mpiwg-rma] [EXTERNAL] Re: Synchronization on shared memory windows
> > 
> > 
> > Whoops, I read MPI_F_SYNC_REG as MPI_WIN_SYNC.  There need to be
> > WIN_SYNCs on both processes.
> > 
> >   — Pavan
> > 
> > On Feb 5, 2014, at 10:40 AM, Dave Goodell (dgoodell)
> > <dgoodell at cisco.com> wrote:
> > 
> > > Pavan, is it?
> > > 
> > > Rolf, where is the supposed MPI_WIN_SYNC call?  I assume you meant
> > > to put it between the MPI_F_SYNC_REG and MPI_Barrier in both
> > > processes?
> > > 
> > > -Dave
> > > 
> > > On Feb 5, 2014, at 10:31 AM, "Balaji, Pavan" <balaji at anl.gov>
> > > wrote:
> > > 
> > >> 
> > >> Yes, this is a correct program.
> > >> 
> > >> — Pavan
> > >> 
> > >> On Feb 5, 2014, at 10:30 AM, Rolf Rabenseifner
> > >> <rabenseifner at hlrs.de> wrote:
> > >> 
> > >>> Jeff and all,
> > >>> 
> > >>> it looks like it works as MPI-3 is designed:
> > >>> 
> > >>> I need to add a single MPI_WIN_LOCK_ALL(MPI_MODE_NOCHECK, win)
> > >>> at the beginning and a single MPI_WIN_UNLOCK_ALL(win) at the
> > >>> end, and then everything works fine with MPI_WIN_SYNC in each
> > >>> iteration.
> > >>> 
> > >>> Is this usage consistent with the definition in the MPI-3
> > >>> standard?
> > >>> 
> > >>> Here is the complete scenario that I use:
> > >>> 
> > >>> --------------------
> > >>> X is part of a shared memory window and should denote the same
> > >>> memory location in both processes
> > >>> 
> > >>> Process A               Process B
> > >>> 
> > >>> MPI_WIN_LOCK_ALL(       MPI_WIN_LOCK_ALL(
> > >>> MPI_MODE_NOCHECK,win)   MPI_MODE_NOCHECK,win)
> > >>> 
> > >>> DO ...                  DO ...
> > >>>   x=...
> > >>>   MPI_F_SYNC_REG(X)
> > >>>   MPI_Barrier             MPI_Barrier
> > >>>                           MPI_F_SYNC_REG(X)
> > >>>                           print X
> > >>> END DO                  END DO
> > >>> 
> > >>> MPI_WIN_UNLOCK_ALL(win) MPI_WIN_UNLOCK_ALL(win)
> > >>> 
> > >>> --------------------
> > >>> 
> > >>> Best regards
> > >>> Rolf
> > >>> 
> > >>> 
> > >>> 
> > >>> ----- Original Message -----
> > >>>> From: "Jeff Hammond" <jeff.science at gmail.com>
> > >>>> To: "MPI WG Remote Memory Access working group"
> > >>>> <mpiwg-rma at lists.mpi-forum.org>
> > >>>> Cc: "Stefan Andersson" <stefan at cray.com>, "Bill Long"
> > >>>> <longb at cray.com>
> > >>>> Sent: Tuesday, February 4, 2014 7:42:58 PM
> > >>>> Subject: Re: [mpiwg-rma] [EXTERNAL] Re: Synchronization on shared memory windows
> > >>>> 
> > >>>> "For the purposes of synchronizing the private and public
> > >>>> window,
> > >>>> MPI_WIN_SYNC has the effect of ending and reopening an access
> > >>>> and
> > >>>> exposure epoch on the window (note that it does not actually
> > >>>> end
> > >>>> an
> > >>>> epoch or complete any pending MPI RMA operations)."
> > >>>> 
> > >>>> I think this is interpreted to mean that this call is only valid
> > >>>> inside of an existing epoch, and thus if you want to call it, you
> > >>>> need to use it inside of a passive-target epoch.  Thus, it is not
> > >>>> merely a portable abstraction for a memory barrier.
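> > >>>> 
> > >>>> Read that way, a minimal sketch (an illustration, not standard
> > >>>> text) of the only legal placement would be:
> > >>>> 
> > >>>> --------------------
> > >>>> #include <mpi.h>
> > >>>> 
> > >>>> /* MPI_Win_sync outside any epoch is erroneous under this
> > >>>>    reading; inside a passive-target epoch it is legal and
> > >>>>    acts as a memory barrier for the shared window. */
> > >>>> void flush_shared(MPI_Win win)
> > >>>> {
> > >>>>     /* MPI_Win_sync(win);  erroneous here: no epoch open */
> > >>>> 
> > >>>>     MPI_Win_lock_all(MPI_MODE_NOCHECK, win);
> > >>>>     MPI_Win_sync(win);      /* legal: inside the epoch */
> > >>>>     MPI_Win_unlock_all(win);
> > >>>> }
> > >>>> --------------------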
> > >>>> 
> > >>>> I think we should fix MPICH and/or MPI-Next to allow the more
> > >>>> general use such that your code is standard-compliant and
> > >>>> executes correctly.
> > >>>> 
> > >>>> I await violent disagreement from others :-)
> > >>>> 
> > >>>> Jeff
> > >>>> 
> > >>>> On Tue, Feb 4, 2014 at 12:34 PM, Rolf Rabenseifner
> > >>>> <rabenseifner at hlrs.de> wrote:
> > >>>>> Brian, Pavan, and Jeff,
> > >>>>> 
> > >>>>> you convinced me. I did it (see attached file), and my
> > >>>>> MPICH-based Cray library reports:
> > >>>>> 
> > >>>>> Rank 0 [Tue Feb  4 19:31:28 2014] [c9-1c2s7n0] Fatal error in
> > >>>>> MPI_Win_sync: Wrong synchronization of RMA calls , error stack:
> > >>>>> MPI_Win_sync(113)...: MPI_Win_sync(win=0xa0000001) failed
> > >>>>> MPIDI_Win_sync(2495): Wrong synchronization of RMA calls
> > >>>>> (only once in each process).
> > >>>>> 
> > >>>>> I expect that this is an implementation bug that should be
> > >>>>> fixed by MPICH and Cray?
> > >>>>> 
> > >>>>> Best regards
> > >>>>> Rolf
> > >>>>> 
> > >>>>> ----- Original Message -----
> > >>>>>> From: "Brian W Barrett" <bwbarre at sandia.gov>
> > >>>>>> To: "MPI WG Remote Memory Access working group"
> > >>>>>> <mpiwg-rma at lists.mpi-forum.org>
> > >>>>>> Cc: "Stefan Andersson" <stefan at cray.com>, "Bill Long"
> > >>>>>> <longb at cray.com>
> > >>>>>> Sent: Tuesday, February 4, 2014 7:09:02 PM
> > >>>>>> Subject: Re: [mpiwg-rma] [EXTERNAL] Re: Synchronization on shared memory windows
> > >>>>>> 
> > >>>>>> On 2/4/14 11:01 AM, "Rolf Rabenseifner"
> > >>>>>> <rabenseifner at hlrs.de>
> > >>>>>> wrote:
> > >>>>>> 
> > >>>>>>> The MPI_WIN_SYNC (not the Fortran MPI_F_SYNC_REG)
> > >>>>>>> has no meaning in the unified memory model if all accesses
> > >>>>>>> are done without RMA routines.
> > >>>>>>> It only has a meaning if separate public and private copies
> > >>>>>>> exist (MPI-3.0 p450:46-p451:2).
> > >>>>>>> MPI-3.0 p456:3 - p457:7 defines the rules for the unified
> > >>>>>>> memory model, but there is no need to use MPI_WIN_SYNC.
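> > >>>>>>> 
> > >>>>>>> (A small sketch for illustration: which model applies can
> > >>>>>>> be checked via the MPI_WIN_MODEL attribute; windows from
> > >>>>>>> MPI_WIN_ALLOCATE_SHARED use the unified model.)
> > >>>>>>> 
> > >>>>>>> --------------------
> > >>>>>>> #include <mpi.h>
> > >>>>>>> 
> > >>>>>>> /* nonzero if win uses the unified memory model */
> > >>>>>>> int window_is_unified(MPI_Win win)
> > >>>>>>> {
> > >>>>>>>     int *model, flag;
> > >>>>>>>     MPI_Win_get_attr(win, MPI_WIN_MODEL, &model, &flag);
> > >>>>>>>     return flag && *model == MPI_WIN_UNIFIED;
> > >>>>>>> }
> > >>>>>>> --------------------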
> > >>>>>> 
> > >>>>>> Right, there's no need from an MPI point of view, but that
> > >>>>>> doesn't mean that the language/compiler/processor doesn't have
> > >>>>>> a need for extra synchronization.
> > >>>>>> 
> > >>>>>>> The combination of X=13 and MPI_F_SYNC_REG(X)
> > >>>>>>> before MPI_Barrier should guarantee that all bytes of X are
> > >>>>>>> stored in memory. The same should be valid in C,
> > >>>>>>> because the C compiler has no chance to see whether
> > >>>>>>> MPI_Barrier will access the bytes of X or not.
> > >>>>>>> And if it is guaranteed to be in the unified memory,
> > >>>>>>> then the other process (B) should be able to correctly
> > >>>>>>> read the data after the return from its barrier.
> > >>>>>>> 
> > >>>>>>> What is wrong with my thinking?
> > >>>>>>> Which detail do I miss?
> > >>>>>> 
> > >>>>>> According to my reading of the spec, MPI_F_SYNC_REG only
> > >>>>>> prevents the language/compiler from moving the store, but does
> > >>>>>> not say anything about processor ordering.  So the WIN_SYNC in
> > >>>>>> my last e-mail will add the processor memory barrier, which
> > >>>>>> will give you all the semantics you need.
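> > >>>>>> 
> > >>>>>> As a rough C11 analogy of the two layers (an illustration
> > >>>>>> only, not MPI semantics):
> > >>>>>> 
> > >>>>>> --------------------
> > >>>>>> #include <stdatomic.h>
> > >>>>>> 
> > >>>>>> int x;   /* stand-in for the shared-window variable X */
> > >>>>>> 
> > >>>>>> void store_and_publish(int value)
> > >>>>>> {
> > >>>>>>     x = value;
> > >>>>>> 
> > >>>>>>     /* compiler-only fence: keeps the compiler from moving
> > >>>>>>        or caching the store -- roughly what MPI_F_SYNC_REG
> > >>>>>>        gives Fortran; the CPU may still reorder it */
> > >>>>>>     atomic_signal_fence(memory_order_seq_cst);
> > >>>>>> 
> > >>>>>>     /* full memory barrier: additionally orders the store
> > >>>>>>        as seen by other processors -- the extra guarantee
> > >>>>>>        the WIN_SYNC supplies */
> > >>>>>>     atomic_thread_fence(memory_order_seq_cst);
> > >>>>>> }
> > >>>>>> --------------------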
> > >>>>>> 
> > >>>>>> Shared memory programming is a disaster in most languages
> > >>>>>> today, so we decided to pass that disaster on to the user.  We
> > >>>>>> really can't help without adding lots of overhead (i.e., using
> > >>>>>> put/get/RMA synchronization).  So if a user already knows how
> > >>>>>> to do shared memory programming, this will feel natural.  If
> > >>>>>> they don't, it's going to hurt badly :/.
> > >>>>>> 
> > >>>>>> 
> > >>>>>> Brian
> > >>>>>> 
> > >>>>>> --
> > >>>>>> Brian W. Barrett
> > >>>>>> Scalable System Software Group
> > >>>>>> Sandia National Laboratories
> > >>>>>> 
> > >>>>>> 
> > >>>>>> 
> > >>>>>> 
> > >>>> 
> > >>>> 
> > >>>> 
> > >>>> --
> > >>>> Jeff Hammond
> > >>>> jeff.science at gmail.com
> > >>> 

-- 
Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)


