[mpiwg-rma] Problems with RMA synchronization in combination with load/store shared memory accesses

Rolf Rabenseifner rabenseifner at hlrs.de
Sat Mar 22 16:12:02 CDT 2014


Jim,

Thank you for your analysis; my answers are below.

----- Original Message -----
> From: "Jim Dinan" <james.dinan at gmail.com>
> To: "MPI WG Remote Memory Access working group" <mpiwg-rma at lists.mpi-forum.org>
> Sent: Friday, March 21, 2014 8:14:22 PM
> Subject: Re: [mpiwg-rma] Problems with RMA synchronization in combination with load/store shared memory accesses
> 
> 
> 
> Rolf,
> 
> 
> This line is incorrect: MPI_Win_fence(MPI_MODE_NOSTORE +
> MPI_MODE_NOPRECEDE, win_rcv_buf_left);
> 
> 
> You need to do a bitwise OR of the assertions (MPI_MODE_NOSTORE |
> MPI_MODE_NOPRECEDE).


Yes, you are right, but according to MPI-3.0 p451:28-32 this cannot be the
reason for the problems. The Fortran versions show the same problems.
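
(For completeness, the corrected call would be

    MPI_Win_fence(MPI_MODE_NOSTORE | MPI_MODE_NOPRECEDE, win_rcv_buf_left);

but as far as I read p451:28-32, adding these constants, each at most once,
gives the same integer as OR-ing them, so the two forms should behave
identically.)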


> In halo_1sided_store_win_alloc_shared.c, you are doing stores within
> the epoch, so MPI_MODE_NOSTORE looks like an incorrect assertion on
> the closing fence.


Yes, but I do remote stores, not local stores.
You may be right nevertheless, because the text on p452:9-10 does not require
that the stores are local stores.

If I remember correctly, removing this hint still did not
produce correct results.
I'll re-check.
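
Concretely, the hint in question is MPI_MODE_NOSTORE on the fence that closes
the store epoch. The variant to re-check is, as a fragment of the loop only
(nbr_rcv_buf, snd_buf and win stand for the objects in the attached file;
I leave the assert argument at 0 here):

    *nbr_rcv_buf = snd_buf;   /* store into the neighbor's shared window */
    MPI_Win_fence(0, win);    /* closing fence, without MPI_MODE_NOSTORE */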

> 
> Following the Fence epoch, you are reading from the left/right recv
> buffers.  That also needs to be done within an RMA epoch, if you are
> reading non-local data.

This reading is only on the local recv_buf_left/right.
It is, however, included in the RMA epoch that starts with the second fence
and ends with the first fence of the next loop iteration.
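
To make the timing concrete, here is a sketch of the loop structure I mean.
The names are illustrative, not taken from the attached file, and I leave the
assert arguments at 0 because the assertions are exactly what is under
discussion:

    #include <mpi.h>

    /* rcv_buf_left/right: this process's receive buffers in the shared window.
       nbr_left_rcv_buf_right / nbr_right_rcv_buf_left: the neighbors' receive
       buffers, obtained via MPI_Win_shared_query.                              */
    void halo_loop(MPI_Win win, double *rcv_buf_left, double *rcv_buf_right,
                   double *nbr_left_rcv_buf_right, double *nbr_right_rcv_buf_left,
                   int iterations)
    {
        double snd_buf_left = 1.0, snd_buf_right = 2.0, sum = 0.0;
        for (int i = 0; i < iterations; i++) {
            MPI_Win_fence(0, win);                    /* also closes the epoch in which     */
                                                      /* rcv_buf_left/right were read       */
            *nbr_left_rcv_buf_right = snd_buf_left;   /* store into left neighbor's window  */
            *nbr_right_rcv_buf_left = snd_buf_right;  /* store into right neighbor's window */
            MPI_Win_fence(0, win);                    /* completes the stores               */
            sum += *rcv_buf_left + *rcv_buf_right;    /* local loads, inside the epoch that
                                                         ends with the first fence of the
                                                         next iteration                     */
        }
    }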

> 
>  ~Jim.
Best regards
Rolf
> 
> 
> 
> On Fri, Feb 21, 2014 at 6:07 AM, Rolf Rabenseifner <
> rabenseifner at hlrs.de > wrote:
> 
> 
> Dear member of the RMA group and especially the mpich developers,
> 
> I have real problems with the new shared memory in MPI-3.0,
> i.e., the load/stores together with the RMA synchronization
> cause wrong execution results.
> 
> The attached
>     1sided_halo_C_mpich_problems_rabenseifner.tar.gz or .zip
> contains
> 
> - 1sided/halo_1sided_put_win_alloc.c
> 
>   The basis that works. It uses MPI_Put and MPI_Win_fence for
>   duplex left/right halo communication.
> 
> - 1sided/halo_1sided_store_win_alloc_shared.c
> 
>    This is the same, but a shared memory window is used and
>    the MPI_Put is substituted by storing the data in the
>    neighbor's window. Same MPI_Win_fence calls with the same assertions.
> 
>    This does not work, although I'm sure that my assertions
>    are correct.
> 
>    Known possibilities:
>    - I'm wrong and was not able to understand the assertions
>      on MPI-3.0 p452:8-19.
>    - I'm wrong because it is invalid to use the MPI_Win_fence
>      together with the shared memory windows.
>    - mpich has a bug.
>    (The first two possibilities are the reason why I am using this
>     forum email list.)
> 
> - 1sided/halo_1sided_store_win_alloc_shared_w-a-cray.c
> 
>    This is a work-around for Cray that works on our Cray
>    and does not use MPI_MODE_NOPRECEDE and MPI_MODE_NOSUCCEED.
>    It also runs on another mpich installation.
> 
> - 1sided/halo_1sided_store_win_alloc_shared_pscw.c
> 
>    Here, MPI_Win_fence is substituted by Post-Start-Complete-Wait
>    and it does not work with any choice of assertions.
> 
>    Same possibilities as above.
> 
> - 1sided/halo_1sided_store_win_alloc_shared_query.c
> - 1sided/halo_1sided_store_win_alloc_shared_query_w-a-cray.c
> 
>    Same as halo_1sided_store_win_alloc_shared.c
>    but non-contiguous windows are used.
>    Same problems as above.
> 
> - 1sided/halo_1sided_store_win_alloc_shared_othersync.c
> 
>    This version uses the synchronization according to
>    #413 and it is tested and works on two platforms.
> 
> Best regards
> Rolf
> 
> --
> Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
> High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
> University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
> Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
> Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)
> 
> _______________________________________________
> mpiwg-rma mailing list
> mpiwg-rma at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-rma

-- 
Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)


