[mpiwg-rma] Problems with RMA synchronization in combination with load/store shared memory accesses

Jim Dinan james.dinan at gmail.com
Sun Jun 1 13:51:38 CDT 2014


I tend to agree with Jeff.  On some architectures, different operations are
required to make my operations visible to others than to make operations
performed by others visible to me.
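
As a rough illustration of that asymmetry in C11 terms (a sketch only; the
function names below are mine, not from any proposal): making my stores
visible to others is a release-style duty of the writer, while seeing stores
performed by others is an acquire-style duty of the reader.

#include <stdatomic.h>

/* Sketch: the two visibility directions map to different fences on some
 * architectures. */
static inline void make_my_stores_visible(void)
{
    /* writer side: order my preceding stores before signaling the data
     * is ready */
    atomic_thread_fence(memory_order_release);
}

static inline void see_stores_by_others(void)
{
    /* reader side: order my subsequent loads after the point where the
     * other process's stores are known to be complete */
    atomic_thread_fence(memory_order_acquire);
}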

Is this meeting the last call for errata, or is it the September meeting?

 ~Jim.


On Sat, May 31, 2014 at 4:44 PM, Jeff Hammond <jeff.science at gmail.com>
wrote:

> Remote load-store cannot be treated like local load-store from a
> sequential consistency perspective.  If a process does local
> load-store, it is likely that no memory barrier will be required to
> see a consistent view of memory.  When another process does
> load-store, this changes dramatically.
>
> Jeff
>
> On Sat, May 31, 2014 at 3:31 PM, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
> > I think before ticket 429 (
> https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/429) is put up for a
> vote as errata, the RMA working group needs to decide whether remote
> loads/stores to shared memory windows are treated as local loads and stores
> or as put/get operations (for the purpose of the assert definitions). The
> text will be different depending on that.
> >
> > If remote loads/stores to shared memory windows are considered as local
> loads/stores they will be covered under MPI_MODE_NOSTORE; if considered as
> put/get operations, they will be covered under MPI_MODE_NOPRECEDE,
> MPI_MODE_NOSUCCEED, and MPI_MODE_NOPUT.
> >
> > Ticket 429 says they should be considered as local loads/stores.
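> >
> > A minimal sketch of how the two readings differ for a store into a
> > neighbor's segment of a shared-memory window (shr_win and nbr_ptr are
> > placeholder names; the comments summarize the assertion rules under each
> > reading):
> >
> > #include <mpi.h>
> >
> > /* Sketch only: which assertions become illegal under each reading. */
> > void store_epoch(MPI_Win shr_win, int *nbr_ptr)
> > {
> >     MPI_Win_fence(0, shr_win);
> >
> >     nbr_ptr[0] = 42;   /* "remote" store through shared memory */
> >
> >     /* Reading of ticket 429 (remote store counts as a store to the
> >      * target's local window): the target's closing fence may not assert
> >      * MPI_MODE_NOSTORE; NOPRECEDE/NOSUCCEED/NOPUT stay legal because no
> >      * MPI RMA call is issued.
> >      * Alternative reading (remote store counts as a put-like RMA
> >      * operation): the opening fence may not assert MPI_MODE_NOSUCCEED or
> >      * MPI_MODE_NOPUT, and this closing fence may not assert
> >      * MPI_MODE_NOPRECEDE. */
> >     MPI_Win_fence(0, shr_win);
> > }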
> >
> > Rajeev
> >
> >
> > On May 27, 2014, at 1:25 PM, Jim Dinan <james.dinan at gmail.com> wrote:
> >
> >> Hi Rolf,
> >>
> >> MPI_MODE_NOSTORE applies to local updates that should be made visible
> to other processes following the end of the access epoch.  I believe that
> visibility of updates made by other processes was intended to be
> incorporated into the NOPRECEDE/NOSUCCEED assertions.  I think that
> Hubert's proposal may be the right approach -- that remote load/store
> accesses to the shared memory window should be treated as "RMA" (e.g.
> analogous to get/put) operations.
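> >>
> >> (For illustration, a sketch of how such a direct pointer into another
> >> process's segment of a shared-memory window is obtained; the function
> >> and variable names are placeholders:)
> >>
> >> #include <mpi.h>
> >>
> >> /* Sketch: stores through the returned pointer are the "remote"
> >>  * load/store accesses discussed here.  Valid only for windows created
> >>  * with MPI_Win_allocate_shared. */
> >> int *neighbor_segment(MPI_Win shr_win, int nbr_rank)
> >> {
> >>     MPI_Aint seg_size;
> >>     int disp_unit;
> >>     int *nbr_base = NULL;
> >>
> >>     MPI_Win_shared_query(shr_win, nbr_rank, &seg_size, &disp_unit,
> >>                          &nbr_base);
> >>     return nbr_base;
> >> }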
> >>
> >>  ~Jim.
> >>
> >>
> >> On Mon, May 19, 2014 at 1:16 PM, Rolf Rabenseifner <
> rabenseifner at hlrs.de> wrote:
> >> Jim and RMA WG,
> >>
> >> There are now two questions:
> >>
> >> Jim asked:
> >> > Question to WG: Do we need to update the fence assertions to better
> >> > define interaction with local load/store accesses and remote stores?
> >> >
> >>
> >> Rolf asked:
> >> > Additionally, I would recommend that we add after MPI-3.0 p451:33
> >> >
> >> >   Note that in shared memory windows (allocated with
> >> >   MPI_WIN_ALLOCATE_SHARED), there is no difference
> >> >   between remote store accesses and local store accesses
> >> >   to the window.
> >> >
> >> > This would help to understand that "the local window
> >> > was not updated by stores" does not mean "by local stores",
> >> > see p452:1 and p452:9.
> >>
> >> For me, it is important to understand the meaning of the
> >> current assertions if they are used in a shared memory window.
> >> Therefore, I propose the text above as an erratum to MPI-3.0.
> >>
> >> In MPI-3.1 and 4.0, you may want to add additional assertions.
> >>
> >> Your analysis below also shows that mpich implements
> >> Post-Start-Complete-Wait synchronization incorrectly
> >> if there are no calls to RMA routines.
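> >>
> >> (To make that concrete, a sketch of the Post-Start-Complete-Wait pattern
> >> in question: stores only, no RMA calls inside the epoch; the group and
> >> variable names are placeholders:)
> >>
> >> #include <mpi.h>
> >>
> >> /* Sketch: the origin stores directly into the target's shared window
> >>  * segment; no MPI_Put/MPI_Get is issued inside the epoch. */
> >> void pscw_store_only(MPI_Win shr_win, MPI_Group origin_grp,
> >>                      MPI_Group target_grp, int *target_ptr, int is_origin)
> >> {
> >>     if (is_origin) {
> >>         MPI_Win_start(target_grp, 0, shr_win);
> >>         target_ptr[0] = 42;            /* store instead of MPI_Put */
> >>         MPI_Win_complete(shr_win);
> >>     } else {
> >>         MPI_Win_post(origin_grp, 0, shr_win);
> >>         MPI_Win_wait(shr_win);
> >>         /* the stored value must now be visible in this process's window */
> >>     }
> >> }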
> >>
> >> Best regards
> >> Rolf
> >>
> >> ----- Original Message -----
> >> > From: "Jim Dinan" <james.dinan at gmail.com>
> >> > To: "MPI WG Remote Memory Access working group" <
> mpiwg-rma at lists.mpi-forum.org>
> >> > Sent: Thursday, May 15, 2014 4:06:08 PM
> >> > Subject: Re: [mpiwg-rma] Problems with RMA synchronization in
> combination with load/store shared memory accesses
> >> >
> >> >
> >> >
> >> > Rolf,
> >> >
> >> >
> >> > Here is an attempt to simplify your example for discussion.  Given a
> >> > shared memory window, shr_mem_win, with buffer, shr_mem_buf:
> >> >
> >> > MPI_Win_fence(MPI_MODE_NOSTORE | MPI_MODE_NOPUT | MPI_MODE_NOPRECEDE
> >> > | MPI_MODE_NOSUCCEED, shr_mem_win);
> >> >
> >> > shr_mem_buf[...] = ...;
> >> >
> >> > MPI_Win_fence(MPI_MODE_NOPUT | MPI_MODE_NOPRECEDE |
> >> > MPI_MODE_NOSUCCEED, shr_mem_win);
> >> >
> >> >
> >> > Right now, Fence assertions don't say anything special about shared
> >> > memory windows:
> >> >
> >> >
> >> > [Inline image: the MPI-3.0 definitions of the MPI_Win_fence assertions
> >> > MPI_MODE_NOSTORE, MPI_MODE_NOPUT, MPI_MODE_NOPRECEDE, and
> >> > MPI_MODE_NOSUCCEED]
> >> >
> >> >
> >> > NOPRECEDE/NOSUCCEED are defined in terms of MPI RMA function calls, and
> >> > do not cover load/store.  Thus, Rolf's usage appears to be correct
> >> > per the current text.  In the MPICH fence implementation,
> >> > src/mpid/ch3/src/ch3u_rma_sync.c:935 we have:
> >> >
> >> > if (!(assert & MPI_MODE_NOSUCCEED)) win_ptr->fence_issued = 1;
> >> >
> >> > Because of this check, we don't actually start an active target epoch
> >> > on the first fence in the example above.  On the second fence, we
> >> > therefore don't perform the necessary synchronization, leading to
> >> > incorrect output in Rolf's example.
> >> >
> >> >
> >> > Question to WG: Do we need to update the fence assertions to better
> >> > define interaction with local load/store accesses and remote stores?
> >> >
> >> >
> >> > If not, then Rolf's code is correct and we need to modify the check
> >> > above in MPICH to something like:
> >> >
> >> >
> >> > if (!(assert & MPI_MODE_NOSUCCEED) ||
> >> >     win_ptr->create_flavor == MPI_WIN_FLAVOR_SHARED)
> >> >   win_ptr->fence_issued = 1;
> >> >
> >> >
> >> >  ~Jim.
> >> >
> >> >
> >> >
> >> >
> >> > On Tue, Apr 8, 2014 at 12:02 PM, Rolf Rabenseifner <
> >> > rabenseifner at hlrs.de > wrote:
> >> >
> >> >
> >> > Jim,
> >> >
> >> > I'm now sure that mpich has a bug with assertions on shared memory
> >> > windows.
> >> >
> >> > In the example, rcv_buf_left and rcv_buf_right are the windows.
> >> > The only accesses to these rcv_buf_... buffers are remote stores
> >> > and fully local loads.
> >> > Both kinds of accesses are done in different epochs, each surrounded
> >> > by MPI_Win_fence.
> >> >
> >> > According to your interpretation (which is really okay),
> >> > all fences can use all possible assertions (!!!),
> >> > except that after the remote stores, MPI_MODE_NOSTORE cannot be used.
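> >> >
> >> > (A condensed sketch of that access pattern -- not the actual test
> >> > file; all names are placeholders:)
> >> >
> >> > #include <mpi.h>
> >> >
> >> > /* Each rank stores into a neighbor's window and later loads only from
> >> >  * its own window memory; no MPI RMA calls are issued at all. */
> >> > void halo_store_epoch(MPI_Win win, int *nbr_ptr, const int *my_win_mem,
> >> >                       int snd_val, int *rcv_val)
> >> > {
> >> >     /* opening fence: with no RMA calls anywhere and no preceding local
> >> >      * stores, all four assertions are legal for this pattern */
> >> >     MPI_Win_fence(MPI_MODE_NOSTORE | MPI_MODE_NOPUT |
> >> >                   MPI_MODE_NOPRECEDE | MPI_MODE_NOSUCCEED, win);
> >> >
> >> >     nbr_ptr[0] = snd_val;      /* remote store into the neighbor's window */
> >> >
> >> >     /* closing fence: my own window was just updated by a neighbor's
> >> >      * store, so under this reading MPI_MODE_NOSTORE is the one
> >> >      * assertion that must be omitted here */
> >> >     MPI_Win_fence(MPI_MODE_NOPUT | MPI_MODE_NOPRECEDE |
> >> >                   MPI_MODE_NOSUCCEED, win);
> >> >
> >> >     *rcv_val = my_win_mem[0];  /* purely local load after the epoch */
> >> > }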
> >> >
> >> > I updated the example and mpich executes it incorrectly.
> >> >
> >> > Please check it yourself on your installation:
> >> > halo_1sided_store_win_alloc_shared_w-a-2-cray.c
> >> >
> >> > Without the assertions, everything works:
> >> > halo_1sided_store_win_alloc_shared_w-a-2NO-cray.c
> >> >
> >> > Could you verify that mpich has a bug?
> >> >
> >> > Additionally, I would recommend that we add after MPI-3.0 p451:33
> >> >
> >> >   Note that in shared memory windows (allocated with
> >> >   MPI_WIN_ALLOCATE_SHARED), there is no difference
> >> >   between remote store accesses and local store accesses
> >> >   to the window.
> >> >
> >> > This would help to understand that "the local window
> >> > was not updated by stores" does not mean "by local stores",
> >> > see p452:1 and p452:9.
> >> >
> >> > Is it a good idea?
> >> >
> >> > Best regards
> >> > Rolf
> >> >
> >> >
> >> >
> >> > ----- Original Message -----
> >> > > From: "Jim Dinan" < james.dinan at gmail.com >
> >> > > To: "MPI WG Remote Memory Access working group" <
> >> > > mpiwg-rma at lists.mpi-forum.org >
> >> > > Sent: Friday, March 21, 2014 8:14:22 PM
> >> > > Subject: Re: [mpiwg-rma] Problems with RMA synchronization in
> >> > > combination with load/store shared memory accesses
> >> > >
> >> > >
> >> > >
> >> >
> >> > > Rolf,
> >> > >
> >> > >
> >> > > This line is incorrect: MPI_Win_fence(MPI_MODE_NOSTORE +
> >> > > MPI_MODE_NOPRECEDE, win_rcv_buf_left);
> >> >
> >> >
> >> > >
> >> > >
> >> > > You need to do a bitwise OR of the assertions (MPI_MODE_NOSTORE |
> >> > > MPI_MODE_NOPRECEDE).
> >> > >
> >> > > In halo_1sided_store_win_alloc_shared.c, you are doing stores
> >> > > within
> >> > > the epoch, so MPI_MODE_NOSTORE looks like an incorrect assertion on
> >> > > the closing fence.
> >> > >
> >> > > Following the Fence epoch, you are reading from the left/right recv
> >> > > buffers.  That also needs to be done within an RMA epoch, if you
> >> > > are
> >> > > reading non-local data.
> >> > >
> >> > >  ~Jim.
> >> > >
> >> > >
> >> > >
> >> > > On Fri, Feb 21, 2014 at 6:07 AM, Rolf Rabenseifner <
> >> > > rabenseifner at hlrs.de > wrote:
> >> > >
> >> > >
> >> > > Dear member of the RMA group and especially the mpich developers,
> >> > >
> >> > > I have real problems with the new shared memory in MPI-3.0,
> >> > > i.e., the load/stores together with the RMA synchronization
> >> > > cause wrong execution results.
> >> > >
> >> > > The attached
> >> > >     1sided_halo_C_mpich_problems_rabenseifner.tar.gz or .zip
> >> > > contains
> >> > >
> >> > > - 1sided/halo_1sided_put_win_alloc.c
> >> > >
> >> > >   The basis that works. It uses MPI_Put and MPI_Win_fence for
> >> > >   duplex left/right halo communication.
> >> > >
> >> > > - 1sided/halo_1sided_store_win_alloc_shared.c
> >> > >
> >> > >    This is the same, but a shared memory window is used and
> >> > >    the MPI_Put is replaced by storing the data in the
> >> > >    neighbor's window. Same MPI_Win_fence calls with the same assertions.
> >> > >
> >> > >    This does not work, although I'm sure that my assertions are
> >> > > correct.
> >> > >
> >> > >    Known possibilities:
> >> > >    - I'm wrong and was not able to understand the assertions
> >> > >      on MPI-3.0 p452:8-19.
> >> > >    - I'm wrong because it is invalid to use MPI_Win_fence
> >> > >      together with shared memory windows.
> >> > >    - mpich has a bug.
> >> > >    (The first two possibilities are the reason why I use this
> >> > >     Forum email list.)
> >> > >
> >> > > - 1sided/halo_1sided_store_win_alloc_shared_w-a-cray.c
> >> > >
> >> > >    This is a work-around for Cray that works on our Cray
> >> > >    and does not use MPI_MODE_NOPRECEDE and MPI_MODE_NOSUCCEED.
> >> > >    It also runs on another mpich installation.
> >> > >
> >> > > - 1sided/halo_1sided_store_win_alloc_shared_pscw.c
> >> > >
> >> > >    Here, MPI_Win_fence is replaced by Post-Start-Complete-Wait
> >> > >    and it does not work with any assertions.
> >> > >
> >> > >    Same possibilities as above.
> >> > >
> >> > > - 1sided/halo_1sided_store_win_alloc_shared_query.c
> >> > > - 1sided/halo_1sided_store_win_alloc_shared_query_w-a-cray.c
> >> > >
> >> > >    Same as halo_1sided_store_win_alloc_shared.c
> >> > >    but non-contiguous windows are used.
> >> > >    Same problems as above.
> >> > >
> >> > > - 1sided/halo_1sided_store_win_alloc_shared_othersync.c
> >> > >
> >> > >    This version uses the synchronization according to
> >> > >    #413 (see the sketch below); it is tested and works on two platforms.
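> >> > >
> >> > >    (A sketch of the kind of synchronization meant here --
> >> > >    MPI_Win_sync combined with a process synchronization inside a
> >> > >    passive-target epoch; the exact calls and names below are an
> >> > >    assumption for illustration, not the ticket text:)
> >> > >
> >> > > #include <mpi.h>
> >> > >
> >> > > /* Sketch: store/load exchange in a shared-memory window without
> >> > >  * MPI_Win_fence.  comm, win, nbr_ptr and my_win_mem are placeholders. */
> >> > > void othersync_step(MPI_Comm comm, MPI_Win win, int *nbr_ptr,
> >> > >                     const int *my_win_mem, int snd_val, int *rcv_val)
> >> > > {
> >> > >     MPI_Win_lock_all(MPI_MODE_NOCHECK, win);
> >> > >
> >> > >     nbr_ptr[0] = snd_val;     /* store into the neighbor's segment */
> >> > >
> >> > >     MPI_Win_sync(win);        /* make my stores visible (memory barrier) */
> >> > >     MPI_Barrier(comm);        /* order the processes */
> >> > >     MPI_Win_sync(win);        /* see the stores made by others */
> >> > >
> >> > >     *rcv_val = my_win_mem[0]; /* local load of data stored by a neighbor */
> >> > >
> >> > >     MPI_Win_unlock_all(win);
> >> > > }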
> >> > >
> >> > > Best regards
> >> > > Rolf
> >> > >
> >> > > --
> >> > > Dr. Rolf Rabenseifner . . . . . . . . . .. email
> >> > > rabenseifner at hlrs.de
> >> > > High Performance Computing Center (HLRS) . phone
> >> > > ++49(0)711/685-65530
> >> > > University of Stuttgart . . . . . . . . .. fax ++49(0)711 /
> >> > > 685-65832
> >> > > Head of Dpmt Parallel Computing . . .
> >> > > www.hlrs.de/people/rabenseifner
> >> > > Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room
> >> > > 1.307)
> >> > >
> >> >
> >> > --
> >> > Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
> >> > High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
> >> > University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
> >> > Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
> >> > Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)
> >> >
> >>
> >> --
> >> Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
> >> High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
> >> University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
> >> Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
> >> Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)
> >
>
>
>
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
>