[mpiwg-rma] #429 - Re: Problems with RMA synchronization in combination with load/store shared memory accesses

Rolf Rabenseifner rabenseifner at hlrs.de
Fri May 23 14:21:35 CDT 2014


Dear Jim, Hubert, and the whole RMA WG,

These are two different topics:

- What the assertions mean, and whether you want to change
  the existing ones. There is a need for correction.
  Please be aware that mpich's post-start-complete-wait
  with zero assertions currently does not work as a
  synchronization of shared memory load and store
  accesses, probably because all synchronization is
  optimized into no-operations if there is no RMA call.
  (A minimal sketch of this pattern follows below,
  after this list.)

  Please feel free to add whatever is needed and to set
  yourselves as owners, as long as you get it all done
  in MPI-3.1. For me, this is errata.
  To be on the safe side, the final text should be clear
  by the end of this meeting.

  Are Hubert's and Jim's proposals identical?
  I expect that I'll be unreachable in the next two weeks
  due to vacation. But I'm not the RMA specialist.

- Is there any difference between a remote and a local store
  access to a shared memory window, i.e., can one differentiate
  between the "local" portion and the "remote" portion, with
  the "local" portion being the one defined in
  MPI_WIN_ALLOCATE_SHARED?
  (And the same for load accesses.)
  My proposed note is meant to say that such a distinction
  makes no sense: the whole shared memory window must be
  treated as one block. That is all my note says, and as far
  as I can see, it is correct.
  Therefore please do not remove it.
  If you want different rules, feel free to add them.
  I'm not the MPI shared memory specialist.
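
For illustration, here is a minimal sketch of the pattern I mean in the
first point (the function and variable names are only placeholders, not
taken from my attached examples): a post-start-complete-wait epoch on a
shared memory window, with zero assertions, in which the data is moved
only by load and store. As I read the current text, the synchronization
calls must still order these accesses, but an implementation that
optimizes them into no-operations will not:

/* Minimal sketch (placeholder names, assumes exactly 2 ranks in comm):
   rank 0 stores into rank 1's part of the shared window; rank 1 loads
   the value after its exposure epoch.  No MPI_Put/MPI_Get at all. */
#include <mpi.h>

void pscw_shared_sketch(MPI_Comm comm)
{
    int rank, *my_buf, *peer_buf, disp;
    MPI_Aint size;
    MPI_Win win;
    MPI_Group world_grp, peer_grp;

    MPI_Comm_rank(comm, &rank);
    MPI_Comm_group(comm, &world_grp);

    /* each rank contributes one int to the shared window */
    MPI_Win_allocate_shared(sizeof(int), sizeof(int), MPI_INFO_NULL,
                            comm, &my_buf, &win);

    int peer = 1 - rank;
    MPI_Group_incl(world_grp, 1, &peer, &peer_grp);

    if (rank == 0) {
        MPI_Win_start(peer_grp, 0 /* zero assertions */, win);
        MPI_Win_shared_query(win, 1, &size, &disp, &peer_buf);
        *peer_buf = 42;              /* store instead of MPI_Put */
        MPI_Win_complete(win);
    } else {
        MPI_Win_post(peer_grp, 0 /* zero assertions */, win);
        MPI_Win_wait(win);
        int value = *my_buf;         /* load should see 42 after wait */
        (void)value;
    }

    MPI_Group_free(&peer_grp);
    MPI_Group_free(&world_grp);
    MPI_Win_free(&win);
}

This is the case where, as described above, the load on rank 1 may not
see the stored value if the synchronization has been optimized away.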

Best regards
Rolf 



----- Original Message -----
> From: "Hubert Ritzdorf" <Hubert.Ritzdorf at EMEA.NEC.COM>
> To: "MPI WG Remote Memory Access working group" <mpiwg-rma at lists.mpi-forum.org>
> Sent: Thursday, May 22, 2014 6:46:28 PM
> Subject: Re: [mpiwg-rma] #429 - Re: Problems with RMA synchronization in combination with load/store shared memory
> accesses
> 
> 
> 
> Hi,
> 
> I think that this fits my proposal from 11 September 2012 in
> this list, when shared memory was introduced:
> 
> ------ Main part of this mail
> 
> it's quite unclear what Page 410, Lines 17-19,
> 
>   A consistent view can be created in the unified
>   memory model (see Section 11.4) by utilizing the window
>   synchronization functions (see Section 11.5),
> 
> really means. Section 11.5 doesn't mention any (load/store) access to
> shared memory.
> Thus, must
> 
> (*) RMA communication calls and RMA operations
>       be interpreted as RMA communication calls (MPI_GET, MPI_PUT, ...)
>       and ANY load/store access to the shared window,
> (*) put call          as put call and any store to shared memory,
> (*) get call          as get call and any load from shared memory,
> (*) accumulate call   as accumulate call and any load or store access
>                       to the shared window?
> 
> Example: Assertion MPI_MODE_NOPRECEDE
> 
> Does
> 
>   the fence does not complete any sequence of locally issued RMA calls
> 
> mean for windows created by MPI_Win_allocate_shared()
> 
>   the fence does not complete any sequence of locally issued RMA calls
>   or any load/store access to the window memory ?
> 
> --- end of this mail
> 
> Torsten answered:
> 
> This is what I was referring to. I'm in favor of this proposal.
> 
> Hubert
> 
> 
> 
> From: mpiwg-rma [mpiwg-rma-bounces at lists.mpi-forum.org] on behalf of
> Jim Dinan [james.dinan at gmail.com]
> Sent: Thursday, May 22, 2014 2:43 PM
> To: MPI WG Remote Memory Access working group
> Subject: Re: [mpiwg-rma] #429 - Re: Problems with RMA synchronization
> in combination with load/store shared memory accesses
> 
> 
> 
> 
> Hi Rolf,
> 
> 
> You suggested adding the following text to the description of active
> target assertions:
> 
> Note that in shared memory windows (allocated with
> MPI_WIN_ALLOCATE_SHARED), there is no difference between remote
> store accesses and local store accesses to the window, and the same
> for remote and local loads.
> 
> I'm not that comfortable with this text.  We definitely do have to
> handle local and remote stores differently within the MPI
> implementation in order to ensure that they become visible according
> to the active target synchronization model.  I think we may need to
> update the NOPRECEDE, NOSUCCEED, and NOPUT assertions to apply to
> both RMA updates and updates to shared memory windows.  For example,
> we could update the text as follows:
> 
> MPI_MODE_NOPRECEDE - the fence does not complete any sequence of
> locally issued RMA calls or shared memory window updates. If this
> assertion is given by any process in the window group, then it must
> be given by all processes in the group.
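> 
> As a minimal illustration (shr_win and shr_buf are placeholder names,
> not taken from Rolf's attached codes), under this revised text the
> closing fence below could no longer be given MPI_MODE_NOPRECEDE,
> because a local store to the shared window precedes it:
> 
> /* shr_buf is the local pointer returned by MPI_Win_allocate_shared
>    for the shared window shr_win (both names are placeholders) */
> 
> MPI_Win_fence(MPI_MODE_NOPRECEDE, shr_win);  /* assume nothing precedes */
> 
> shr_buf[0] = 42;      /* local store = a shared memory window update */
> 
> /* Under the revised wording, MPI_MODE_NOPRECEDE would be incorrect on
>    this fence, because it completes a preceding shared window update. */
> MPI_Win_fence(0, shr_win);
> 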
>  ~Jim.
> 
> 
> 
> On Mon, May 19, 2014 at 1:59 PM, Rolf Rabenseifner <
> rabenseifner at hlrs.de > wrote:
> 
> 
> Jim,
> 
> I made #429.
> It is reported by me, but not owned, i.e., the RMA WG should take it.
> If the proposed text is okay for you (or with minor updates)
> then you can vote on it.
> 
> I'll put it on the errata reading+voting list.
> 
> It is the minimal text needed so that users and implementers
> see that shared memory windows are special with respect to assertions.
> Hence also the change-log entry, because implementors have
> overlooked this problem and may overlook it again in the future.
> 
> Best regards
> Rolf
> 
> ----- Original Message -----
> > From: "Rolf Rabenseifner" < rabenseifner at hlrs.de >
> > To: "MPI WG Remote Memory Access working group" <
> > mpiwg-rma at lists.mpi-forum.org >
> > Sent: Monday, May 19, 2014 7:16:44 PM
> > Subject: Re: [mpiwg-rma] Problems with RMA synchronization in
> > combination with load/store shared memory accesses
> > 
> > Jim and RMA WG,
> > 
> > There are now two questions:
> > 
> > Jim asked:
> > > Question to WG: Do we need to update the fence assertions to
> > > better
> > > define interaction with local load/store accesses and remote
> > > stores?
> > > 
> > 
> > Rolf asked:
> > > Additionally, I would recommend that we add after MPI-3.0 p451:33
> > > 
> > >   Note that in shared memory windows (allocated with
> > >   MPI_WIN_ALLOCATE_SHARED), there is no difference
> > >   between remote store accesses and local store accesses
> > >   to the window.
> > > 
> > > This would help to understand that "the local window
> > > was not updated by stores" does not mean "by local stores",
> > > see p452:1 and p452:9.
> > 
> > For me, it is important to understand the meaning of the
> > current assertions if they are used in a shared memory window.
> > Therefore my proposal above as erratum to MPI-3.0.
> > 
> > In MPI-3.1 and 4.0, you may want to add additional assertions.
> > 
> > Your analysis below also shows that mpich implements
> > Post-Start-Complete-Wait synchronization incorrectly
> > if there are no calls to RMA routines.
> > 
> > Best regards
> > Rolf
> > 
> > ----- Original Message -----
> > > From: "Jim Dinan" < james.dinan at gmail.com >
> > > To: "MPI WG Remote Memory Access working group"
> > > < mpiwg-rma at lists.mpi-forum.org >
> > > Sent: Thursday, May 15, 2014 4:06:08 PM
> > > Subject: Re: [mpiwg-rma] Problems with RMA synchronization in
> > > combination with load/store shared memory accesses
> > > 
> > > 
> > > 
> > > Rolf,
> > > 
> > > 
> > > Here is an attempt to simplify your example for discussion.
> > > Given a shared memory window, shr_mem_win, with buffer shr_mem_buf:
> > > 
> > > MPI_Win_fence(MPI_MODE_NOSTORE | MPI_MODE_NOPUT |
> > >               MPI_MODE_NOPRECEDE | MPI_MODE_NOSUCCEED, shr_mem_win);
> > > 
> > > shr_mem_buf[...] = ...;
> > > 
> > > MPI_Win_fence(MPI_MODE_NOPUT | MPI_MODE_NOPRECEDE |
> > >               MPI_MODE_NOSUCCEED, shr_mem_win);
> > > 
> > > 
> > > Right now, Fence assertions don't say anything special about
> > > shared memory windows:
> > > 
> > > [inline image 1: the fence assertion definitions from the standard]
> > > 
> > > NOPRECEDE/SUCCEED are defined in terms of MPI RMA function calls,
> > > and do not cover load/store.  Thus, Rolf's usage appears to be
> > > correct per the current text.  In the MPICH fence implementation,
> > > src/mpid/ch3/src/ch3u_rma_sync.c:935, we have:
> > > 
> > > if (!(assert & MPI_MODE_NOSUCCEED)) win_ptr->fence_issued = 1;
> > > 
> > > Because of this check, we don't actually start an active target
> > > epoch on the first fence in the example above.  On the second
> > > fence, we therefore don't perform the necessary synchronization,
> > > leading to incorrect output in Rolf's example.
> > > 
> > > 
> > > Question to WG: Do we need to update the fence assertions to
> > > better
> > > define interaction with local load/store accesses and remote
> > > stores?
> > > 
> > > 
> > > If not, then Rolf's code is correct and we need to modify the
> > > check above in MPICH to something like:
> > > 
> > > if (!(assert & MPI_MODE_NOSUCCEED) ||
> > >     win_ptr->create_flavor == MPI_WIN_FLAVOR_SHARED)
> > >     win_ptr->fence_issued = 1;
> > > 
> > > 
> > >  ~Jim.
> > > 
> > > 
> > > 
> > > 
> > > On Tue, Apr 8, 2014 at 12:02 PM, Rolf Rabenseifner <
> > > rabenseifner at hlrs.de > wrote:
> > > 
> > > 
> > > Jim,
> > > 
> > > I'm now sure that mpich has a bug with assertions on shared
> > > memory windows.
> > > 
> > > In the example, rcv_buf_left and rcv_buf_right are the windows.
> > > The only accesses to these rcv_buf_... are stores from remote
> > > processes and purely local loads.
> > > Both kinds of accesses are done in different epochs, surrounded by
> > > MPI_Win_fence.
> > > 
> > > According to your interpretation (which is really okay),
> > > all fences can use all possible assertions (!!!),
> > > except that after the remote stores, MPI_MODE_NOSTORE cannot be used.
> > > 
> > > I updated the example, and mpich executes it incorrectly.
> > > 
> > > Please check it yourself on your installation:
> > > halo_1sided_store_win_alloc_shared_w-a-2-cray.c
> > > 
> > > Without the assertions, all works:
> > > halo_1sided_store_win_alloc_shared_w-a-2NO-cray.c
> > > 
> > > Could you verify that mpich has a bug?
> > > 
> > > Additionally, I would recommend that we add after MPI-3.0 p451:33
> > > 
> > >   Note that in shared memory windows (allocated with
> > >   MPI_WIN_ALLOCATE_SHARED), there is no difference
> > >   between remote store accesses and local store accesses
> > >   to the window.
> > > 
> > > This would help to understand that "the local window
> > > was not updated by stores" does not mean "by local stores",
> > > see p452:1 and p452:9.
> > > 
> > > Is it a good idea?
> > > 
> > > Best regards
> > > Rolf
> > > 
> > > 
> > > 
> > > ----- Original Message -----
> > > > From: "Jim Dinan" < james.dinan at gmail.com >
> > > > To: "MPI WG Remote Memory Access working group" <
> > > > mpiwg-rma at lists.mpi-forum.org >
> > > > Sent: Friday, March 21, 2014 8:14:22 PM
> > > > Subject: Re: [mpiwg-rma] Problems with RMA synchronization in
> > > > combination with load/store shared memory accesses
> > > > 
> > > > 
> > > > 
> > > 
> > > > Rolf,
> > > > 
> > > > 
> > > > This line is incorrect:
> > > > MPI_Win_fence(MPI_MODE_NOSTORE + MPI_MODE_NOPRECEDE, win_rcv_buf_left);
> > > 
> > > 
> > > > 
> > > > 
> > > > You need to do a bitwise OR of the assertions
> > > > (MPI_MODE_NOSTORE | MPI_MODE_NOPRECEDE).
> > > > 
> > > > In halo_1sided_store_win_alloc_shared.c, you are doing stores
> > > > within
> > > > the epoch, so MPI_MODE_NOSTORE looks like an incorrect
> > > > assertion
> > > > on
> > > > the closing fence.
> > > > 
> > > > Following the Fence epoch, you are reading from the left/right
> > > > recv
> > > > buffers.  That also needs to be done within an RMA epoch, if
> > > > you
> > > > are
> > > > reading non-local data.
> > > > 
> > > >  ~Jim.
> > > > 
> > > > 
> > > > 
> > > > On Fri, Feb 21, 2014 at 6:07 AM, Rolf Rabenseifner <
> > > > rabenseifner at hlrs.de > wrote:
> > > > 
> > > > 
> > > > Dear member of the RMA group and especially the mpich
> > > > developers,
> > > > 
> > > > I have real problems with the new shared memory in MPI-3.0,
> > > > i.e., the load/stores together with the RMA synchronization
> > > > cause wrong execution results.
> > > > 
> > > > The attached
> > > >     1sided_halo_C_mpich_problems_rabenseifner.tar.gz or .zip
> > > > contains
> > > > 
> > > > - 1sided/halo_1sided_put_win_alloc.c
> > > > 
> > > >   The basis that works. It uses MPI_Put and MPI_Win_fence for
> > > >   duplex left/right halo communication.
> > > > 
> > > > - 1sided/halo_1sided_store_win_alloc_shared.c
> > > > 
> > > >    This is the same, but a shared memory window is used and
> > > >    the MPI_Put is substituted by storing the data in the
> > > >    neighbor's window. Same MPI_Win_fence with the same assertions.
> > > >    (A rough sketch of this substitution follows at the end of
> > > >    this item.)
> > > > 
> > > >    This does not work, although I'm sure that my assertions
> > > >    are correct.
> > > > 
> > > >    Known possibilities:
> > > >    - I'm wrong and have not been able to understand the assertions
> > > >      in MPI-3.0 p452:8-19.
> > > >    - I'm wrong because it is invalid to use MPI_Win_fence
> > > >      together with shared memory windows.
> > > >    - mpich has a bug.
> > > >    (The first two possibilities are the reason why I use this
> > > >     Forum email list.)
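> > > > 
> > > >    In essence, the substitution looks like the following rough
> > > >    sketch (placeholder names, not the literal code of the
> > > >    attached file): the halo value is stored directly into the
> > > >    neighbors' segments of the shared window, obtained with
> > > >    MPI_Win_shared_query, instead of being sent with MPI_Put:
> > > > 
> > > >    void halo_store_sketch(MPI_Win win, double *my_seg,
> > > >                           int left, int right, double halo_val)
> > > >    {
> > > >        double *left_seg, *right_seg;
> > > >        MPI_Aint size;
> > > >        int disp;
> > > > 
> > > >        MPI_Win_shared_query(win, left,  &size, &disp, &left_seg);
> > > >        MPI_Win_shared_query(win, right, &size, &disp, &right_seg);
> > > > 
> > > >        /* the attached file passes the assertions under
> > > >           discussion; they are omitted here */
> > > >        MPI_Win_fence(0, win);
> > > >        left_seg[1]  = halo_val;  /* store instead of MPI_Put to left  */
> > > >        right_seg[0] = halo_val;  /* store instead of MPI_Put to right */
> > > >        MPI_Win_fence(0, win);
> > > > 
> > > >        /* afterwards: local loads from my_seg[0] and my_seg[1] */
> > > >    }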
> > > > 
> > > > - 1sided/halo_1sided_store_win_alloc_shared_w-a-cray.c
> > > > 
> > > >    This is a work-around for Cray that works on our Cray
> > > >    and does not use MPI_MODE_NOPRECEDE and MPI_MODE_NOSUCCEED.
> > > >    It also runs on another mpich installation.
> > > > 
> > > > - 1sided/halo_1sided_store_win_alloc_shared_pscw.c
> > > > 
> > > >    Here, MPI_Win_fence is substituted by Post-Start-Complete-Wait,
> > > >    and it does not work with any assertions.
> > > > 
> > > >    Same possibilities as above.
> > > > 
> > > > - 1sided/halo_1sided_store_win_alloc_shared_query.c
> > > > - 1sided/halo_1sided_store_win_alloc_shared_query_w-a-cray.c
> > > > 
> > > >    Same as halo_1sided_store_win_alloc_shared.c
> > > >    but non-contiguous windows are used.
> > > >    Same problems as above.
> > > > 
> > > > - 1sided/halo_1sided_store_win_alloc_shared_othersync.c
> > > > 
> > > >    This version uses the synchronization according to
> > > >    #413 and it is tested and works on two platforms.
> > > > 
> > > > Best regards
> > > > Rolf
> > > > 
> > > > --
> > > > Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
> > > > High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
> > > > University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
> > > > Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
> > > > Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)
> > > > 
> > > > _______________________________________________
> > > > mpiwg-rma mailing list
> > > > mpiwg-rma at lists.mpi-forum.org
> > > > http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-rma
> > > 
> > > --
> > > Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
> > > High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
> > > University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
> > > Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
> > > Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)
> > > 
> > > _______________________________________________
> > > mpiwg-rma mailing list
> > > mpiwg-rma at lists.mpi-forum.org
> > > http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-rma
> > > 
> > 
> > --
> > Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
> > High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
> > University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
> > Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
> > Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)
> > 
> 
> --
> Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
> High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
> University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
> Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
> Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)
> _______________________________________________
> mpiwg-rma mailing list
> mpiwg-rma at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-rma
> 
> _______________________________________________
> mpiwg-rma mailing list
> mpiwg-rma at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-rma

-- 
Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)


