[mpiwg-rma] Problems with RMA synchronization in combination with load/store shared memory accesses

William Gropp wgropp at illinois.edu
Sun Jun 1 13:58:04 CDT 2014


We can always do errata.  

Bill

William Gropp
Director, Parallel Computing Institute
Thomas M. Siebel Chair in Computer Science
University of Illinois Urbana-Champaign





On Jun 1, 2014, at 8:51 PM, Jim Dinan wrote:

> I tend to agree with Jeff.  On some architectures, different operations are required to make my operations visible to others versus making operations performed by others visible to me.
> 
> Is this meeting the last call for errata, or is it the September meeting?
> 
>  ~Jim.
> 
> 
> On Sat, May 31, 2014 at 4:44 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
> Remote load-store cannot be treated like local load-store from a
> sequential consistency perspective.  If a process does local
> load-store, it is likely that no memory barrier will be required to
> see a consistent view of memory.  When another process does
> load-store, this changes dramatically.
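> 
> (A minimal sketch of that asymmetry -- all names such as shm_win,
> shm_comm, my_seg and seg0 are invented, assuming a shared-memory
> window in the unified memory model and an open MPI_Win_lock_all
> epoch; this is not code from any of the attached examples:)
> 
> /* Purely local: a store followed by a load on the same process
>  * needs no MPI call and no memory barrier.                         */
> my_seg[0] = 1;
> int x = my_seg[0];
> 
> /* Store performed by ANOTHER process: the reader has to synchronize
>  * before its load is guaranteed to observe the value.              */
> if (rank == 1) seg0[0] = 42;  /* seg0: rank 0's segment, obtained via MPI_Win_shared_query */
> MPI_Win_sync(shm_win);        /* memory barrier on the window                              */
> MPI_Barrier(shm_comm);        /* order the store before the load below                     */
> MPI_Win_sync(shm_win);
> if (rank == 0) x = seg0[0];   /* now guaranteed to see 42                                  */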
> 
> Jeff
> 
> On Sat, May 31, 2014 at 3:31 PM, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
> > I think before ticket 429 (https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/429) is put up for a vote as errata, the RMA working group needs to decide whether remote loads/stores to shared memory windows are treated as local loads and stores or as put/get operations (for the purpose of the assert definitions). The text will be different depending on that.
> >
> > If remote loads/stores to shared memory windows are considered as local loads/stores they will be covered under MPI_MODE_NOSTORE; if considered as put/get operations, they will be covered under MPI_MODE_NOPRECEDE, MPI_MODE_NOSUCCEED, and MPI_MODE_NOPUT.
> >
> > Ticket 429 says they should be considered as local loads/stores.
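> >
> > (A hedged sketch of the difference, with invented names -- shm_win,
> > neighbor_seg, my_seg, my_halo_value -- for a fence epoch in which each
> > process stores its halo value directly into the neighbor's segment,
> > and the surrounding epochs contain only local loads, as in Rolf's
> > example:)
> >
> > /* If remote stores count as (local) stores to the target's window,
> >  * as ticket 429 proposes: the opening fence may assert
> >  * NOSTORE | NOPUT | NOPRECEDE | NOSUCCEED, and the closing fence may
> >  * assert NOPUT | NOPRECEDE | NOSUCCEED -- but not NOSTORE, because my
> >  * segment was updated by the neighbors' stores during the epoch.
> >  *
> >  * If remote stores count as put-like RMA operations: the opening
> >  * fence must not assert NOPUT or NOSUCCEED, the closing fence must
> >  * not assert NOPRECEDE, and NOSTORE stays legal on both fences.     */
> > MPI_Win_fence(opening_assert, shm_win);
> > neighbor_seg[0] = my_halo_value;   /* direct store into the neighbor's segment */
> > MPI_Win_fence(closing_assert, shm_win);
> > use(my_seg[0]);                    /* local load of what the neighbor stored   */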
> >
> > Rajeev
> >
> >
> > On May 27, 2014, at 1:25 PM, Jim Dinan <james.dinan at gmail.com> wrote:
> >
> >> Hi Rolf,
> >>
> >> MPI_MODE_NOSTORE applies to local updates that should be made visible to other processes following the end of the access epoch.  I believe that visibility of updates made by other processes was intended to be incorporated into the NOPRECEDE/NOSUCCEED assertions.  I think that Hubert's proposal may be the right approach -- that remote load/store accesses to the shared memory window should be treated as "RMA" (i.e., analogous to get/put) operations.
> >>
> >>  ~Jim.
> >>
> >>
> >> On Mon, May 19, 2014 at 1:16 PM, Rolf Rabenseifner <rabenseifner at hlrs.de> wrote:
> >> Jim and RMA WG,
> >>
> >> There are now two questions:
> >>
> >> Jim asked:
> >> > Question to WG: Do we need to update the fence assertions to better
> >> > define interaction with local load/store accesses and remote stores?
> >> >
> >>
> >> Rolf asked:
> >> > Additionally, I would recommend that we add after MPI-3.0 p451:33
> >> >
> >> >   Note that in shared memory windows (allocated with
> >> >   MPI_WIN_ALLOCATE_SHARED), there is no difference
> >> >   between remote store accesses and local store accesses
> >> >   to the window.
> >> >
> >> This would help readers understand that "the local window
> >> was not updated by stores" does not mean "by local stores";
> >> see p452:1 and p452:9.
> >>
> >> For me, it is important to understand the meaning of the
> >> current assertions if they are used in a shared memory window.
> >> Hence my proposal above as an erratum to MPI-3.0.
> >>
> >> In MPI-3.1 and 4.0, you may want to add additional assertions.
> >>
> >> Your analysis below also shows that mpich implements
> >> Post-Start-Complete-Wait synchronization incorrectly
> >> if there are no calls to RMA routines.
> >>
> >> Best regards
> >> Rolf
> >>
> >> ----- Original Message -----
> >> > From: "Jim Dinan" <james.dinan at gmail.com>
> >> > To: "MPI WG Remote Memory Access working group" <mpiwg-rma at lists.mpi-forum.org>
> >> > Sent: Thursday, May 15, 2014 4:06:08 PM
> >> > Subject: Re: [mpiwg-rma] Problems with RMA synchronization in combination with load/store shared memory accesses
> >> >
> >> >
> >> >
> >> > Rolf,
> >> >
> >> >
> >> > Here is an attempt to simplify your example for discussion.  Given a
> >> > shared memory window, shr_mem_win, with buffer, shr_mem_buf:
> >> >
> >> > MPI_Win_fence(MPI_MODE_NOSTORE | MPI_MODE_NOPUT | MPI_MODE_NOPRECEDE | MPI_MODE_NOSUCCEED, shr_mem_win);
> >> >
> >> > shr_mem_buf[...] = ...;
> >> >
> >> > MPI_Win_fence(MPI_MODE_NOPUT | MPI_MODE_NOPRECEDE | MPI_MODE_NOSUCCEED, shr_mem_win);
> >> >
> >> >
> >> > Right now, Fence assertions don't say anything special about shared
> >> > memory windows:
> >> >
> >> >
> >> > [Inline image: the MPI_Win_fence assertion definitions (MPI_MODE_NOSTORE, NOPUT, NOPRECEDE, NOSUCCEED) from the MPI standard]
> >> >
> >> >
> >> > NOPRECEDE/SUCCEED are defined in terms of MPI RMA function calls, and
> >> > do not cover load/store.  Thus, Rolf's usage appears to be correct
> >> > per the current text.  In the MPICH fence implementation,
> >> > src/mpid/ch3/src/ch3u_rma_sync.c:935 we have:
> >> >
> >> > if (!(assert & MPI_MODE_NOSUCCEED)) win_ptr->fence_issued = 1;
> >> >
> >> > Because of this check, we don't actually start an active target epoch
> >> > on the first fence in the example above.  On the second fence, we
> >> > therefore don't perform the necessary synchronization, leading to
> >> > incorrect output in Rolf's example.
> >> >
> >> >
> >> > Question to WG: Do we need to update the fence assertions to better
> >> > define interaction with local load/store accesses and remote stores?
> >> >
> >> >
> >> > If not, then Rolf's code is correct and we need to modify the check
> >> > above in MPICH to something like:
> >> >
> >> >
> >> > if (!(assert & MPI_MODE_NOSUCCEED) || win_ptr->create_flavor == MPI_WIN_FLAVOR_SHARED)
> >> >     win_ptr->fence_issued = 1;
> >> >
> >> >
> >> >  ~Jim.
> >> >
> >> >
> >> >
> >> >
> >> > On Tue, Apr 8, 2014 at 12:02 PM, Rolf Rabenseifner <rabenseifner at hlrs.de> wrote:
> >> >
> >> >
> >> > Jim,
> >> >
> >> > I'm now sure that mpich has a bug with assertions on shared memory
> >> > windows.
> >> >
> >> > In the example, rcv_buf_left and rcv_buf_right are the windows.
> >> > The only accesses to these rcv_buf_... buffers are stores from the
> >> > remote neighbor and purely local loads.
> >> > The two kinds of accesses are done in different epochs, separated by
> >> > MPI_Win_fence.
> >> >
> >> > According to your interpretation (which is fine with me),
> >> > all fences can use all possible assertions (!!!),
> >> > except that MPI_MODE_NOSTORE cannot be used after the remote stores.
> >> >
> >> > I updated the example, and mpich executes it incorrectly.
> >> >
> >> > Please check it yourself on your installation:
> >> > halo_1sided_store_win_alloc_shared_w-a-2-cray.c
> >> >
> >> > Without the assertions, everything works:
> >> > halo_1sided_store_win_alloc_shared_w-a-2NO-cray.c
> >> >
> >> > Could you verify that mpich has a bug?
> >> >
> >> > Additionally, I would recommend that we add after MPI-3.0 p451:33
> >> >
> >> >   Note that in shared memory windows (allocated with
> >> >   MPI_WIN_ALLOCATE_SHARED), there is no difference
> >> >   between remote store accesses and local store accesses
> >> >   to the window.
> >> >
> >> > This would help readers understand that "the local window
> >> > was not updated by stores" does not mean "by local stores";
> >> > see p452:1 and p452:9.
> >> >
> >> > Is it a good idea?
> >> >
> >> > Best regards
> >> > Rolf
> >> >
> >> >
> >> >
> >> > ----- Original Message -----
> >> > > From: "Jim Dinan" <james.dinan at gmail.com>
> >> > > To: "MPI WG Remote Memory Access working group" <mpiwg-rma at lists.mpi-forum.org>
> >> > > Sent: Friday, March 21, 2014 8:14:22 PM
> >> > > Subject: Re: [mpiwg-rma] Problems with RMA synchronization in
> >> > > combination with load/store shared memory accesses
> >> > >
> >> > >
> >> > >
> >> >
> >> > > Rolf,
> >> > >
> >> > >
> >> > > This line is incorrect: MPI_Win_fence(MPI_MODE_NOSTORE + MPI_MODE_NOPRECEDE, win_rcv_buf_left);
> >> >
> >> >
> >> > >
> >> > >
> >> > > You need to do a bitwise OR of the assertions (MPI_MODE_NOSTORE |
> >> > > MPI_MODE_NOPRECEDE).
> >> > >
> >> > > In halo_1sided_store_win_alloc_shared.c, you are doing stores
> >> > > within
> >> > > the epoch, so MPI_MODE_NOSTORE looks like an incorrect assertion on
> >> > > the closing fence.
> >> > >
> >> > > Following the Fence epoch, you are reading from the left/right recv
> >> > > buffers.  That also needs to be done within an RMA epoch, if you
> >> > > are
> >> > > reading non-local data.
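> >> > >
> >> > > (A hedged sketch of that last point, with invented names
> >> > > neighbor_seg, disp, halo_value and sum -- if the loads touch the
> >> > > neighbor's part of the shared window, keep them inside a fence
> >> > > epoch rather than after the final fence:)
> >> > >
> >> > > MPI_Win_fence(0, shr_mem_win);
> >> > > neighbor_seg[disp] = halo_value;  /* store into the neighbor's segment           */
> >> > > MPI_Win_fence(0, shr_mem_win);
> >> > > sum += neighbor_seg[disp + 1];    /* load of non-local data, still inside an epoch */
> >> > > MPI_Win_fence(0, shr_mem_win);    /* the epoch covering the loads is closed here   */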
> >> > >
> >> > >  ~Jim.
> >> > >
> >> > >
> >> > >
> >> > > On Fri, Feb 21, 2014 at 6:07 AM, Rolf Rabenseifner <rabenseifner at hlrs.de> wrote:
> >> > >
> >> > >
> >> > > Dear member of the RMA group and especially the mpich developers,
> >> > >
> >> > > I have real problems with the new shared memory in MPI-3.0,
> >> > > i.e., the loads/stores together with the RMA synchronization
> >> > > cause wrong execution results.
> >> > >
> >> > > The attached
> >> > >     1sided_halo_C_mpich_problems_rabenseifner.tar.gz or .zip
> >> > > contains
> >> > >
> >> > > - 1sided/halo_1sided_put_win_alloc.c
> >> > >
> >> > >   The basis that works. It uses MPI_Put and MPI_Win_fence for
> >> > >   duplex left/right halo communication.
> >> > >
> >> > > - 1sided/halo_1sided_store_win_alloc_shared.c
> >> > >
> >> > >    This is the same, but a shared memory window is used and
> >> > >    the MPI_Put is replaced by storing the data directly in the
> >> > >    neighbor's window. Same MPI_Win_fence calls with the same assertions.
> >> > >
> >> > >    This does not work, although I'm sure that my assertions are
> >> > > correct.
> >> > >
> >> > >    Known possibilities:
> >> > >    - I'm wrong and was not able to understand the assertions
> >> > >      on MPI-3.0 p452:8-19.
> >> > >    - I'm wrong because it is invalid to use the MPI_Win_fence
> >> > >      together with the shared memory windows.
> >> > >    - mpich has a bug.
> >> > >    (The first two possibilities are the reason why I use this
> >> > >     Forum email list.)
> >> > >
> >> > > - 1sided/halo_1sided_store_win_alloc_shared_w-a-cray.c
> >> > >
> >> > >    This is a work-around for Cray that works on our Cray
> >> > >    and does not use MPI_MODE_NOPRECEDE and MPI_MODE_NOSUCCEED.
> >> > >    It also runs on another mpich installation.
> >> > >
> >> > > - 1sided/halo_1sided_store_win_alloc_shared_pscw.c
> >> > >
> >> > >    Here, MPI_Win_fence is substituted by Post-Start-Complete-Wait
> >> > >    and it does not work for any assertions.
> >> > >
> >> > >    Same possibilities as above.
> >> > >
> >> > > - 1sided/halo_1sided_store_win_alloc_shared_query.c
> >> > > - 1sided/halo_1sided_store_win_alloc_shared_query_w-a-cray.c
> >> > >
> >> > >    Same as halo_1sided_store_win_alloc_shared.c
> >> > >    but non-contiguous windows are used.
> >> > >    Same problems as above.
> >> > >
> >> > > - 1sided/halo_1sided_store_win_alloc_shared_othersync.c
> >> > >
> >> > >    This version uses the synchronization according to
> >> > >    #413; it is tested and works on two platforms (a sketch of
> >> > >    this style of synchronization follows below).
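> >> > >
> >> > >    (For readers without the attachment: a minimal sketch of this
> >> > >    kind of synchronization -- MPI_Win_sync around a process
> >> > >    barrier inside a passive-target epoch -- not a copy of the
> >> > >    attached file; the names shm_win, shm_comm, neighbor_seg,
> >> > >    my_seg and my_halo_value are invented.)
> >> > >
> >> > >    MPI_Win_lock_all(MPI_MODE_NOCHECK, shm_win); /* one long passive-target epoch   */
> >> > >
> >> > >    neighbor_seg[0] = my_halo_value; /* store the halo into the neighbor's segment  */
> >> > >    MPI_Win_sync(shm_win);           /* memory barrier: complete my stores          */
> >> > >    MPI_Barrier(shm_comm);           /* all stores happen before all loads          */
> >> > >    MPI_Win_sync(shm_win);           /* memory barrier: observe the others' stores  */
> >> > >    use(my_seg[0]);                  /* load what the neighbor stored               */
> >> > >
> >> > >    MPI_Win_unlock_all(shm_win);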
> >> > >
> >> > > Best regards
> >> > > Rolf
> >> > >
> >> > > --
> >> > > Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
> >> > > High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
> >> > > University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
> >> > > Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
> >> > > Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)
> >> > >
> >> >
> >> > --
> >> > Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
> >> > High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
> >> > University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
> >> > Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
> >> > Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)
> >> >
> >>
> >> --
> >> Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
> >> High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
> >> University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
> >> Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
> >> Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)
> 
> 
> 
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
> 
> _______________________________________________
> mpiwg-rma mailing list
> mpiwg-rma at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-rma
