[mpiwg-rma] Problems with RMA synchronization in combination with load/store shared memory accesses
Rolf Rabenseifner
rabenseifner at hlrs.de
Mon May 19 12:16:44 CDT 2014
Jim and RMA WG,
There are now two questions:
Jim asked:
> Question to WG: Do we need to update the fence assertions to better
> define interaction with local load/store accesses and remote stores?
>
Rolf asked:
> Additionally, I would recommend that we add after MPI-3.0 p451:33
>
> Note that in shared memory windows (allocated with
> MPI_WIN_ALLOCATE_SHARED), there is no difference
> between remote store accesses and local store accesses
> to the window.
>
> This would help to understand that "the local window
> was not updated by stores" does not mean "by local stores",
> see p452:1 and p452:9.
For me, it is important to understand the meaning of the
current assertions when they are used on a shared memory window.
That is why I propose the text above as an erratum to MPI-3.0.
In MPI-3.1 and 4.0, you may want to add additional assertions.
Your analysis below also shows that mpich implements
Post-Start-Complete-Wait synchronization incorrectly
if there are no calls to RMA routines.
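
To make the Post-Start-Complete-Wait case concrete, here is a minimal
sketch (the ranks, groups, and buffer pointers are only illustrative,
not the code of my example files): the target issues no RMA calls at
all, yet after MPI_Win_wait its local loads must see the origin's
stores into the shared window.

  /* shr_win was allocated with MPI_WIN_ALLOCATE_SHARED; target_buf is
     the origin's pointer to the target's window memory, obtained with
     MPI_Win_shared_query; my_buf is the target's own window memory.  */
  if (rank == origin_rank) {
      MPI_Win_start(target_grp, 0, shr_win);   /* access epoch                 */
      target_buf[0] = 42.0;                    /* store, no MPI_Put is called  */
      MPI_Win_complete(shr_win);
  } else {
      MPI_Win_post(origin_grp, 0, shr_win);    /* exposure epoch               */
      MPI_Win_wait(shr_win);
      double x = my_buf[0];                    /* local load; must see 42.0    */
  }
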
Best regards
Rolf
----- Original Message -----
> From: "Jim Dinan" <james.dinan at gmail.com>
> To: "MPI WG Remote Memory Access working group" <mpiwg-rma at lists.mpi-forum.org>
> Sent: Thursday, May 15, 2014 4:06:08 PM
> Subject: Re: [mpiwg-rma] Problems with RMA synchronization in combination with load/store shared memory accesses
>
>
>
> Rolf,
>
>
> Here is an attempt to simplify your example for discussion. Given a
> shared memory window, shr_mem_win, with buffer, shr_mem_buf:
>
> MPI_Win_fence(MPI_MODE_NOSTORE | MPI_MODE_NOPUT | MPI_MODE_NOPRECEDE
>               | MPI_MODE_NOSUCCEED, shr_mem_win);
>
> shr_mem_buf[...] = ...;
>
> MPI_Win_fence(MPI_MODE_NOPUT | MPI_MODE_NOPRECEDE
>               | MPI_MODE_NOSUCCEED, shr_mem_win);
>
>
> Right now, Fence assertions don't say anything special about shared
> memory windows:
>
>
> [Inline image: the MPI-3.0 definitions of the MPI_Win_fence assertions
> (MPI_MODE_NOSTORE, MPI_MODE_NOPUT, MPI_MODE_NOPRECEDE, MPI_MODE_NOSUCCEED)]
>
>
> NOPRECEDE/NOSUCCEED are defined in terms of MPI RMA function calls, and
> do not cover load/store. Thus, Rolf's usage appears to be correct
> per the current text. In the MPICH fence implementation, at
> src/mpid/ch3/src/ch3u_rma_sync.c:935, we have:
>
> if (!(assert & MPI_MODE_NOSUCCEED)) win_ptr->fence_issued = 1;
>
> Because of this check, we don't actually start an active target epoch
> on the first fence in the example above. On the second fence, we
> therefore don't perform the necessary synchronization, leading to
> incorrect output in Rolf's example.
>
>
> Question to WG: Do we need to update the fence assertions to better
> define interaction with local load/store accesses and remote stores?
>
>
> If not, then Rolf's code is correct and we need to modify the check
> above in MPICH to something like:
>
>
> if (!(assert & MPI_MODE_NOSUCCEED) ||
>     win_ptr->create_flavor == MPI_WIN_FLAVOR_SHARED)
>         win_ptr->fence_issued = 1;
>
>
> ~Jim.
>
>
>
>
> On Tue, Apr 8, 2014 at 12:02 PM, Rolf Rabenseifner <
> rabenseifner at hlrs.de > wrote:
>
>
> Jim,
>
> I am now sure that mpich has a bug with assertions on shared memory
> windows.
>
> In the example, rcv_buf_left and rcv_buf_right are the windows.
> The only accesses to these rcv_buf_... are stores issued by remote
> processes and purely local loads.
> The two kinds of accesses are done in different epochs, each
> surrounded by MPI_Win_fence.
>
> According to your interpretation (which is really okay),
> all fences can use all possible assertions (!!!),
> except that MPI_MODE_NOSTORE cannot be used on the fence
> that follows the remote stores.
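>
> As a minimal sketch of that pattern (the variable names here are only
> illustrative, not the exact code of my attached files): each process
> stores the halo data directly into the neighbor's part of the shared
> window in one epoch and loads from its own rcv_buf in the next epoch.
>
>   /* epoch 1: only stores into the neighbor's window memory */
>   MPI_Win_fence(MPI_MODE_NOSTORE | MPI_MODE_NOPUT | MPI_MODE_NOPRECEDE
>                 | MPI_MODE_NOSUCCEED, win_rcv_buf_left);
>   neighbor_rcv_buf_left[0] = snd_buf[0];
>   /* the closing fence must not assert MPI_MODE_NOSTORE, because the
>      local window was updated by the neighbor's stores */
>   MPI_Win_fence(MPI_MODE_NOPUT | MPI_MODE_NOPRECEDE
>                 | MPI_MODE_NOSUCCEED, win_rcv_buf_left);
>
>   /* epoch 2: only local loads from the own window */
>   double tmp = rcv_buf_left[0];
>   MPI_Win_fence(MPI_MODE_NOSTORE | MPI_MODE_NOPUT | MPI_MODE_NOPRECEDE
>                 | MPI_MODE_NOSUCCEED, win_rcv_buf_left);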
>
> I updated the example accordingly, and mpich executes it incorrectly.
>
> Please check it yourself on your installation:
> halo_1sided_store_win_alloc_shared_w-a-2-cray.c
>
> Without the assertions, everything works:
> halo_1sided_store_win_alloc_shared_w-a-2NO-cray.c
>
> Could you verify that mpich has a bug?
>
> Additionally, I would recommend that we add after MPI-3.0 p451:33
>
> Note that in shared memory windows (allocated with
> MPI_WIN_ALLOCATE_SHARED), there is no difference
> between remote store accesses and local store accesses
> to the window.
>
> This would help to understand that "the local window
> was not updated by stores" does not mean "by local stores",
> see p452:1 and p452:9.
>
> Is it a good idea?
>
> Best regards
> Rolf
>
>
>
> ----- Original Message -----
> > From: "Jim Dinan" < james.dinan at gmail.com >
> > To: "MPI WG Remote Memory Access working group" <
> > mpiwg-rma at lists.mpi-forum.org >
> > Sent: Friday, March 21, 2014 8:14:22 PM
> > Subject: Re: [mpiwg-rma] Problems with RMA synchronization in
> > combination with load/store shared memory accesses
> >
> >
> >
>
> > Rolf,
> >
> >
> > This line is incorrect: MPI_Win_fence(MPI_MODE_NOSTORE +
> > MPI_MODE_NOPRECEDE, win_rcv_buf_left);
>
>
> >
> >
> > You need to do a bitwise OR of the assertions (MPI_MODE_NOSTORE |
> > MPI_MODE_NOPRECEDE).
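> >
> > For illustration, the bitwise-OR form of that call would be (leaving
> > aside whether MPI_MODE_NOSTORE is a valid assertion here, see the
> > next point):
> >
> >   MPI_Win_fence(MPI_MODE_NOSTORE | MPI_MODE_NOPRECEDE, win_rcv_buf_left);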
> >
> > In halo_1sided_store_win_alloc_shared.c, you are doing stores within
> > the epoch, so MPI_MODE_NOSTORE looks like an incorrect assertion on
> > the closing fence.
> >
> > Following the Fence epoch, you are reading from the left/right recv
> > buffers. That also needs to be done within an RMA epoch, if you are
> > reading non-local data.
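> >
> > A minimal sketch of what I mean, with purely illustrative names
> > (assertions omitted for brevity):
> >
> >   MPI_Win_fence(0, win_rcv_buf_left);   /* open an epoch                       */
> >   x = rcv_buf_left[0];                  /* read of data another process wrote  */
> >   MPI_Win_fence(0, win_rcv_buf_left);   /* close the epoch                     */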
> >
> > ~Jim.
> >
> >
> >
> > On Fri, Feb 21, 2014 at 6:07 AM, Rolf Rabenseifner <
> > rabenseifner at hlrs.de > wrote:
> >
> >
> > Dear members of the RMA group, and especially the mpich developers,
> >
> > I have real problems with the new shared memory in MPI-3.0,
> > i.e., the load/stores together with the RMA synchronization
> > cause wrong execution results.
> >
> > The attached
> > 1sided_halo_C_mpich_problems_rabenseifner.tar.gz or .zip
> > contains
> >
> > - 1sided/halo_1sided_put_win_alloc.c
> >
> > The working baseline. It uses MPI_Put and MPI_Win_fence for
> > duplex left/right halo communication.
> >
> > - 1sided/halo_1sided_store_win_alloc_shared.c
> >
> > This is the same, but a shared memory window is used and
> > the MPI_Put is substituted by storing the data directly into the
> > neighbor's window (see the sketch after this list of files).
> > Same MPI_Win_fence calls with the same assertions.
> >
> > This does not work, although I'm sure that my assertions are
> > correct.
> >
> > Known possibilities:
> > - I'm wrong and was not able to understand the assertions
> > on MPI-3.0 p452:8-19.
> > - I'm wrong because it is invalid to use the MPI_Win_fence
> > together with the shared memory windows.
> > - mpich has a bug.
> > (The first two possibilities are the reason why I am using this
> > Forum email list.)
> >
> > - 1sided/halo_1sided_store_win_alloc_shared_w-a-cray.c
> >
> > This is a work-around for Cray that works on our Cray
> > and does not use MPI_MODE_NOPRECEDE and MPI_MODE_NOSUCCEED.
> > It also runs on another mpich installation.
> >
> > - 1sided/halo_1sided_store_win_alloc_shared_pscw.c
> >
> > Here, MPI_Win_fence is substituted by Post-Start-Complete-Wait,
> > and it does not work with any choice of assertions.
> >
> > Same possibilities as above.
> >
> > - 1sided/halo_1sided_store_win_alloc_shared_query.c
> > - 1sided/halo_1sided_store_win_alloc_shared_query_w-a-cray.c
> >
> > Same as halo_1sided_store_win_alloc_shared.c
> > but non-contiguous windows are used.
> > Same problems as above.
> >
> > - 1sided/halo_1sided_store_win_alloc_shared_othersync.c
> >
> > This version uses the synchronization according to
> > #413 and it is tested and works on two platforms.
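> >
> > For readers without the attachments, a minimal sketch of the
> > substitution in halo_1sided_store_win_alloc_shared.c (the variable
> > names are only illustrative):
> >
> >   /* put version (halo_1sided_put_win_alloc.c): */
> >   MPI_Put(&snd_buf, 1, MPI_DOUBLE, right, 0, 1, MPI_DOUBLE,
> >           win_rcv_buf_left);
> >
> >   /* shared memory version: query the neighbor's window memory once ... */
> >   MPI_Win_shared_query(win_rcv_buf_left, right, &buf_size, &disp_unit,
> >                        &right_rcv_buf_left);
> >   /* ... and store into it directly inside the fence epoch */
> >   *right_rcv_buf_left = snd_buf;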
> >
> > Best regards
> > Rolf
> >
> > --
> > Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
> > High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
> > University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
> > Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
> > Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)
> >
> > _______________________________________________
> > mpiwg-rma mailing list
> > mpiwg-rma at lists.mpi-forum.org
> > http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-rma
> >
> >
>
> --
> Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
> High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
> University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
> Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
> Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)
>
> _______________________________________________
> mpiwg-rma mailing list
> mpiwg-rma at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-rma
>
>
--
Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)