[mpiwg-rma] [EXTERNAL] Re: Synchronization on shared memory windows

Rolf Rabenseifner rabenseifner at hlrs.de
Wed Feb 5 10:59:03 CST 2014


That was my fault. Here is the program that should be examined
for correctness according to MPI-3.0:

--------------------
X is part of a shared memory window created with
MPI_WIN_ALLOCATE_SHARED and refers to the same memory
location in both processes.

Process A               Process B

MPI_WIN_LOCK_ALL(       MPI_WIN_LOCK_ALL(
MPI_MODE_NOCHECK,win)   MPI_MODE_NOCHECK,win) 

DO ...                  DO ...
  x=...
  MPI_F_SYNC_REG(X)
  MPI_WIN_SYNC(win)    
  MPI_Barrier             MPI_Barrier
                          MPI_WIN_SYNC(win)
                          MPI_F_SYNC_REG(X)
                          print X
END DO                  END DO

MPI_WIN_UNLOCK_ALL(win) MPI_WIN_UNLOCK_ALL(win)
--------------------

Is it now correct according to MPI-3.0?
And perhaps also according to other rules for
real shared memory programming?

Would it be helpful to add it at the end of Sect. 11.7?
It would definitely clarify the rules
for using shared memory windows.
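
For comparison, the same store / sync / barrier / sync / load pattern
can be sketched for plain threads in C11. This is only an analogy under
stated assumptions, not a statement about what MPI-3.0 guarantees:
following the description quoted below, atomic_signal_fence stands in
for MPI_F_SYNC_REG (compiler ordering only), atomic_thread_fence for
MPI_WIN_SYNC (processor memory barrier), and a relaxed atomic counter
for MPI_Barrier. The names run_exchange, process_a, process_b, ready
and consumed are hypothetical, and the acknowledgement that keeps A from
overwriting X before B has read it is an addition of this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical names throughout; two threads stand in for processes A and B. */
static int X;                /* plain shared data, like X in the window      */
static atomic_int ready;     /* iteration published by A (role: MPI_Barrier) */
static atomic_int consumed;  /* iteration acknowledged by B (added here)     */
static int rounds;
static int *results;

static void *process_a(void *arg) {
    (void)arg;
    for (int i = 1; i <= rounds; i++) {
        X = 10 * i;                                 /* x = ...                 */
        atomic_signal_fence(memory_order_seq_cst);  /* role: MPI_F_SYNC_REG    */
        atomic_thread_fence(memory_order_release);  /* role: MPI_WIN_SYNC      */
        atomic_store_explicit(&ready, i, memory_order_relaxed);
        /* Assumption of this sketch: wait for B before the next store to X. */
        while (atomic_load_explicit(&consumed, memory_order_relaxed) != i) { }
        atomic_thread_fence(memory_order_acquire);
    }
    return NULL;
}

static void *process_b(void *arg) {
    (void)arg;
    for (int i = 1; i <= rounds; i++) {
        while (atomic_load_explicit(&ready, memory_order_relaxed) != i) { }
        atomic_thread_fence(memory_order_acquire);  /* role: MPI_WIN_SYNC      */
        atomic_signal_fence(memory_order_seq_cst);  /* role: MPI_F_SYNC_REG    */
        results[i - 1] = X;                         /* print X                 */
        atomic_thread_fence(memory_order_release);
        atomic_store_explicit(&consumed, i, memory_order_relaxed);
    }
    return NULL;
}

/* Runs n iterations; out[i] receives the value B read in iteration i+1.
   Returns 0 on success, -1 if the thread for A could not be created. */
int run_exchange(int *out, int n) {
    assert(out != NULL && n > 0);
    pthread_t a;
    X = 0;
    rounds = n;
    results = out;
    atomic_store(&ready, 0);
    atomic_store(&consumed, 0);
    if (pthread_create(&a, NULL, process_a, NULL) != 0) {
        return -1;
    }
    process_b(NULL);          /* B runs on the calling thread */
    pthread_join(a, NULL);
    return 0;
}
```

Calling run_exchange(out, 3) fills out with 10, 20, 30. The sketch needs
a C11 compiler and POSIX threads (for example, compile with -pthread).
Without the release/acquire fences the relaxed counter alone would not
order the plain accesses to X, which is the role the thread below
assigns to MPI_WIN_SYNC.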

Best regards
Rolf

----- Original Message -----
> From: "Pavan Balaji" <balaji at anl.gov>
> To: "MPI WG Remote Memory Access working group" <mpiwg-rma at lists.mpi-forum.org>
> Cc: "Stefan Andersson" <stefan at cray.com>, "Bill Long" <longb at cray.com>
> Sent: Wednesday, February 5, 2014 5:46:25 PM
> Subject: Re: [mpiwg-rma] [EXTERNAL] Re: Synchronization on shared memory windows
> 
> 
> Whoops, I read MPI_F_SYNC_REG as MPI_WIN_SYNC.  There need to be
> WIN_SYNCs on both processes.
> 
>   — Pavan
> 
> On Feb 5, 2014, at 10:40 AM, Dave Goodell (dgoodell)
> <dgoodell at cisco.com> wrote:
> 
> > Pavan, is it?
> > 
> > Rolf, where is the supposed MPI_WIN_SYNC call?  I assume you meant
> > to put it between the MPI_F_SYNC_REG and MPI_Barrier in both
> > processes?
> > 
> > -Dave
> > 
> > On Feb 5, 2014, at 10:31 AM, "Balaji, Pavan" <balaji at anl.gov>
> > wrote:
> > 
> >> 
> >> Yes, this is a correct program.
> >> 
> >> — Pavan
> >> 
> >> On Feb 5, 2014, at 10:30 AM, Rolf Rabenseifner
> >> <rabenseifner at hlrs.de> wrote:
> >> 
> >>> Jeff and all,
> >>> 
> >>> it looks like it works as MPI-3 is designed:
> >>> 
> >>> I need to add MPI_WIN_LOCK_ALL(MPI_MODE_NOCHECK, win) once at the
> >>> beginning and MPI_WIN_UNLOCK_ALL(win) once at the end,
> >>> and then all works fine with MPI_WIN_SYNC in each iteration.
> >>> 
> >>> Is this usage consistent with the definition in the MPI-3
> >>> standard?
> >>> 
> >>> Here is the complete scenario that I use:
> >>> 
> >>> --------------------
> >>> X is part of a shared memory window and should mean the same
> >>> memory location in both processes
> >>> 
> >>> Process A               Process B
> >>> 
> >>> MPI_WIN_LOCK_ALL(       MPI_WIN_LOCK_ALL(
> >>> MPI_MODE_NOCHECK,win)   MPI_MODE_NOCHECK,win)
> >>> 
> >>> DO ...                  DO ...
> >>> x=...
> >>> MPI_F_SYNC_REG(X)
> >>> MPI_Barrier             MPI_Barrier
> >>>                        MPI_F_SYNC_REG(X)
> >>>                        print X
> >>> END DO                  END DO
> >>> 
> >>> MPI_WIN_UNLOCK_ALL(win) MPI_WIN_UNLOCK_ALL(win)
> >>> 
> >>> --------------------
> >>> 
> >>> Best regards
> >>> Rolf
> >>> 
> >>> 
> >>> 
> >>> ----- Original Message -----
> >>>> From: "Jeff Hammond" <jeff.science at gmail.com>
> >>>> To: "MPI WG Remote Memory Access working group"
> >>>> <mpiwg-rma at lists.mpi-forum.org>
> >>>> Cc: "Stefan Andersson" <stefan at cray.com>, "Bill Long"
> >>>> <longb at cray.com>
> >>>> Sent: Tuesday, February 4, 2014 7:42:58 PM
> >>>> Subject: Re: [mpiwg-rma] [EXTERNAL] Re: Synchronization on
> >>>> shared memory windows
> >>>> 
> >>>> "For the purposes of synchronizing the private and public window,
> >>>> MPI_WIN_SYNC has the effect of ending and reopening an access and
> >>>> exposure epoch on the window (note that it does not actually end an
> >>>> epoch or complete any pending MPI RMA operations)."
> >>>> 
> >>>> I think this is interpreted to mean that this call is only valid
> >>>> inside of an existing epoch and thus if you want to call it, you need
> >>>> to use it inside of a passive-target epoch.  Thus, it is not merely a
> >>>> portable abstraction for a memory barrier.
> >>>> 
> >>>> I think we should fix MPICH and/or MPI-Next to allow the more general
> >>>> use such that your code is standard-compliant and executes correctly.
> >>>> 
> >>>> I await violent disagreement from others :-)
> >>>> 
> >>>> Jeff
> >>>> 
> >>>> On Tue, Feb 4, 2014 at 12:34 PM, Rolf Rabenseifner
> >>>> <rabenseifner at hlrs.de> wrote:
> >>>>> Brian, Pavan, and Jeff,
> >>>>> 
> >>>>> you convinced me. I did it, see attached file, and my MPICH-based
> >>>>> Cray library reports:
> >>>>> 
> >>>>> Rank 0 [Tue Feb  4 19:31:28 2014] [c9-1c2s7n0] Fatal error in
> >>>>> MPI_Win_sync: Wrong synchronization of RMA calls , error stack:
> >>>>> MPI_Win_sync(113)...: MPI_Win_sync(win=0xa0000001) failed
> >>>>> MPIDI_Win_sync(2495): Wrong synchronization of RMA calls
> >>>>> 
> >>>>> (only once in each process).
> >>>>> 
> >>>>> I expect that this is now an implementation bug that should be
> >>>>> fixed by MPICH and Cray?
> >>>>> 
> >>>>> Best regards
> >>>>> Rolf
> >>>>> 
> >>>>> ----- Original Message -----
> >>>>>> From: "Brian W Barrett" <bwbarre at sandia.gov>
> >>>>>> To: "MPI WG Remote Memory Access working group"
> >>>>>> <mpiwg-rma at lists.mpi-forum.org>
> >>>>>> Cc: "Stefan Andersson" <stefan at cray.com>, "Bill Long"
> >>>>>> <longb at cray.com>
> >>>>>> Sent: Tuesday, February 4, 2014 7:09:02 PM
> >>>>>> Subject: Re: [mpiwg-rma] [EXTERNAL] Re: Synchronization on
> >>>>>> shared memory windows
> >>>>>> 
> >>>>>> On 2/4/14 11:01 AM, "Rolf Rabenseifner" <rabenseifner at hlrs.de>
> >>>>>> wrote:
> >>>>>> 
> >>>>>>> The MPI_WIN_SYNC (not the Fortran MPI_F_SYNC_REG)
> >>>>>>> has no meaning in the unified memory model if all accesses
> >>>>>>> are done without RMA routines.
> >>>>>>> It has a meaning only if separate public and private copies
> >>>>>>> exist (MPI-3.0 p450:46-p451:2).
> >>>>>>> MPI-3.0 p456:3 - p457:7 defines the rules for the unified memory
> >>>>>>> model, but there is no need to use MPI_WIN_SYNC.
> >>>>>> 
> >>>>>> Right, there's no need from an MPI point of view, but that doesn't
> >>>>>> mean that the language/compiler/processor doesn't have a need for
> >>>>>> extra synchronization.
> >>>>>> 
> >>>>>>> The combination of X=13 and MPI_F_SYNC_REG(X)
> >>>>>>> before MPI_Barrier should guarantee that all bytes of X are
> >>>>>>> stored in memory. The same should be valid in C,
> >>>>>>> because the C compiler has no chance to see whether
> >>>>>>> MPI_Barrier will access the bytes of X or not.
> >>>>>>> And if it is guaranteed to be in the unified memory,
> >>>>>>> then the other process (B) should be able to correctly
> >>>>>>> read the data after the return from its barrier.
> >>>>>>> 
> >>>>>>> What is wrong with my thinking?
> >>>>>>> Which detail do I miss?
> >>>>>> 
> >>>>>> According to my reading of the spec, MPI_F_SYNC_REG only prevents the
> >>>>>> language/compiler from moving the store, but does not say anything
> >>>>>> about processor ordering.  So the WIN_SYNC in my last e-mail will add
> >>>>>> the processor memory barrier, which will give you all the semantics
> >>>>>> you need.
> >>>>>> 
> >>>>>> Shared memory programming is a disaster in most languages today, so we
> >>>>>> decided to pass that disaster on to the user.  We really can't help,
> >>>>>> without adding lots of overhead (ie, using put/get/rma
> >>>>>> synchronization).  So if a user already knows how to do shared memory
> >>>>>> programming, this will feel natural.  If they don't, it's going to
> >>>>>> hurt badly :/.
> >>>>>> 
> >>>>>> 
> >>>>>> Brian
> >>>>>> 
> >>>>>> --
> >>>>>> Brian W. Barrett
> >>>>>> Scalable System Software Group
> >>>>>> Sandia National Laboratories
> >>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>>> _______________________________________________
> >>>>>> mpiwg-rma mailing list
> >>>>>> mpiwg-rma at lists.mpi-forum.org
> >>>>>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-rma
> >>>>>> 
> >>>>> 
> >>>>> --
> >>>>> Dr. Rolf Rabenseifner . . . . . . . . . .. email
> >>>>> rabenseifner at hlrs.de
> >>>>> High Performance Computing Center (HLRS) . phone
> >>>>> ++49(0)711/685-65530
> >>>>> University of Stuttgart . . . . . . . . .. fax ++49(0)711 /
> >>>>> 685-65832
> >>>>> Head of Dpmt Parallel Computing . . .
> >>>>> www.hlrs.de/people/rabenseifner
> >>>>> Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room
> >>>>> 1.307)
> >>>>> 
> >>>> 
> >>>> 
> >>>> 
> >>>> --
> >>>> Jeff Hammond
> >>>> jeff.science at gmail.com
> >>>> 
> >>> 
> >> 

-- 
Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)


