[mpiwg-rma] FENCE local completion requirements

Rajeev Thakur thakur at mcs.anl.gov
Mon Sep 29 20:01:35 CDT 2014


> In one of the projects here, we are using multiple overlapping windows (i.e., we have more than one window handle for the same set of buffers).  In this case, the following program would be incorrect:
> 
> MPI_WIN_FENCE(win1)
> MPI_PUT(<something>, remote_offset, win1)
> MPI_WIN_FENCE(win1)
> 
> MPI_WIN_FENCE(win2)
> MPI_GET(<something>, remote_offset, win2)
> MPI_WIN_FENCE(win2)
> 
> Since there is no remote completion on win1, the GET might not get the same data as what was put on win1.  I’ll need to add extra synchronization here:

No need for extra synchronization. MPI_Get cannot access the target memory until the target has called Win_fence(win2), which means the target has already returned from Win_fence(win1), and by that point the Put had completed at the target.
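
For concreteness, that pattern written out as a complete program (an untested sketch; the ring neighbor, the shared int buffer, and the zero asserts are mine, just for illustration):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, result = -1;
    int buf = 0;                              /* exposed by both windows */
    MPI_Win win1, win2;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Two overlapping windows over the same buffer. */
    MPI_Win_create(&buf, sizeof(int), sizeof(int), MPI_INFO_NULL,
                   MPI_COMM_WORLD, &win1);
    MPI_Win_create(&buf, sizeof(int), sizeof(int), MPI_INFO_NULL,
                   MPI_COMM_WORLD, &win2);

    int right = (rank + 1) % size;

    /* Epoch on win1: write my rank into the right neighbor's buffer. */
    MPI_Win_fence(0, win1);
    MPI_Put(&rank, 1, MPI_INT, right, 0, 1, MPI_INT, win1);
    MPI_Win_fence(0, win1);

    /* Epoch on win2, with no barrier in between.  The Get cannot access
     * the neighbor's window until the neighbor has entered this fence,
     * and by then the neighbor has already returned from its fence on
     * win1, at which point the Put had completed at that process.  So
     * result should come back equal to rank. */
    MPI_Win_fence(0, win2);
    MPI_Get(&result, 1, MPI_INT, right, 0, 1, MPI_INT, win2);
    MPI_Win_fence(0, win2);

    printf("rank %d wrote %d and read back %d\n", rank, rank, result);

    MPI_Win_free(&win1);
    MPI_Win_free(&win2);
    MPI_Finalize();
    return 0;
}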

Rajeev


On Sep 29, 2014, at 5:15 PM, "Balaji, Pavan" <balaji at anl.gov> wrote:

> Hello,
> 
> The MPI-3 RMA chapter says the following (pg. 440, lines 45-48 and pg. 441, lines 34-35):
> 
> “All RMA operations on win originating at a given process and started before the fence call will complete at that process before the fence call returns.  They will be completed at their target before the fence call returns at the target.  RMA operations on win started by a process after the fence call returns will access their target window only after MPI_WIN_FENCE has been called by the target process.”
> 
> This seems to imply that:
> 
> 1. The FENCE call only implies local completion for a given process.
> 
> 2. Operations issued after the local FENCE may not access the target window until the target has also called FENCE, so the implementation has to figure out whether the target has completed its FENCE.
> 
> Does anyone remember why the standard was written this way?  If the MPI implementation uses hardware RDMA, it has to enforce remote completion anyway (in order to tell the target process that the data issued to it has arrived).  Even if the MPI implementation uses strictly active messages, it either needs to enforce remote completion, or the AM handlers for RMA operations issued after the FENCE have to check whether the target has completed the previous FENCE (and potentially queue up many RMA operations if it has not).
> 
> The only case I can think of where these weaker semantics would help is if the FENCE operation locally queues up the issued RMA operations and implements them as an Alltoall-style collective internally.  I guess this is a valid implementation choice, though it's unclear which implementations actually do this.
> 
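> A toy sketch of what I mean (this is just made up for illustration, not any real implementation's code; the names, the int-only window, and the (target, index, value) encoding are mine): puts are only recorded locally, and the "fence" moves all queued data with an Alltoall-style exchange and then applies it at each target.
>
> #include <mpi.h>
> #include <stdio.h>
> #include <stdlib.h>
>
> #define WIN_LEN 4
> #define MAX_OPS 64
>
> static int window[WIN_LEN];        /* the "window" memory                 */
> static int queued[MAX_OPS][3];     /* queued puts: (target, index, value) */
> static int nqueued = 0;
>
> /* Record the put locally instead of issuing it. */
> static void toy_put(int target, int index, int value)
> {
>     queued[nqueued][0] = target;
>     queued[nqueued][1] = index;
>     queued[nqueued][2] = value;
>     nqueued++;
> }
>
> /* Complete all queued puts collectively: exchange (index, value) pairs
>  * with MPI_Alltoallv, then apply whatever we received to our own window. */
> static void toy_fence(MPI_Comm comm)
> {
>     int size;
>     MPI_Comm_size(comm, &size);
>     int *scounts = calloc(size, sizeof(int)), *rcounts = calloc(size, sizeof(int));
>     int *sdispls = calloc(size, sizeof(int)), *rdispls = calloc(size, sizeof(int));
>     int *fill    = calloc(size, sizeof(int));
>
>     for (int i = 0; i < nqueued; i++)
>         scounts[queued[i][0]] += 2;                   /* two ints per pair */
>     MPI_Alltoall(scounts, 1, MPI_INT, rcounts, 1, MPI_INT, comm);
>
>     int stot = 0, rtot = 0;
>     for (int t = 0; t < size; t++) {
>         sdispls[t] = stot; stot += scounts[t];
>         rdispls[t] = rtot; rtot += rcounts[t];
>     }
>     int *sbuf = malloc((stot + 1) * sizeof(int));
>     int *rbuf = malloc((rtot + 1) * sizeof(int));
>     for (int i = 0; i < nqueued; i++) {
>         int t = queued[i][0];
>         sbuf[sdispls[t] + fill[t]++] = queued[i][1];  /* index */
>         sbuf[sdispls[t] + fill[t]++] = queued[i][2];  /* value */
>     }
>     MPI_Alltoallv(sbuf, scounts, sdispls, MPI_INT,
>                   rbuf, rcounts, rdispls, MPI_INT, comm);
>
>     for (int i = 0; i < rtot; i += 2)
>         window[rbuf[i]] = rbuf[i + 1];                /* apply incoming puts */
>     nqueued = 0;
>     free(scounts); free(rcounts); free(sdispls); free(rdispls);
>     free(fill); free(sbuf); free(rbuf);
> }
>
> int main(int argc, char **argv)
> {
>     int rank, size;
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>
>     toy_fence(MPI_COMM_WORLD);                        /* open the epoch  */
>     toy_put((rank + 1) % size, 0, rank);              /* deferred put    */
>     toy_fence(MPI_COMM_WORLD);                        /* close the epoch */
>
>     printf("rank %d: window[0] = %d\n", rank, window[0]);
>     MPI_Finalize();
>     return 0;
> }
>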
> Now, the question is what we are losing with these weaker semantics:
> 
> In one of the projects here, we are using multiple overlapping windows (i.e., we have more than one window handle for the same set of buffers).  In this case, the following program would be incorrect:
> 
> MPI_WIN_FENCE(win1)
> MPI_PUT(<something>, remote_offset, win1)
> MPI_WIN_FENCE(win1)
> 
> MPI_WIN_FENCE(win2)
> MPI_GET(<something>, remote_offset, win2)
> MPI_WIN_FENCE(win2)
> 
> Since there is no remote completion on win1, the GET might not get the same data as what was put on win1.  I’ll need to add extra synchronization here:
> 
> MPI_WIN_FENCE(win1)
> MPI_PUT(<something>, remote_offset, win1)
> MPI_WIN_FENCE(win1)
> 
> MPI_BARRIER()
> 
> MPI_WIN_FENCE(win2)
> MPI_GET(<something>, remote_offset, win2)
> MPI_WIN_FENCE(win2)
> 
> This seems unnecessary for most implementations, since they already perform such a synchronization internally.
> 
> In short: (1) the standard provides weaker semantics in the hope that implementations can take advantage of them, (2) most implementations do not take advantage of them, because they use hardware RDMA or otherwise enforce remote completion anyway, and (3) this creates annoying synchronization requirements for users.
> 
> Is there some way we can improve this situation?  I'm not advocating making the synchronization requirements stronger in the standard.  But if there were a way for the user to request stronger synchronization, or for the implementation to expose to the user what it actually provides, that might be more beneficial.
> 
> One option is to make the stronger semantics the default and provide an assert for the weaker semantics.
> 
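> Just to make the shape of that concrete, the existing fence-assert interface that such an assert would plug into looks like this (MPI_MODE_NOPRECEDE and MPI_MODE_NOSUCCEED are the real MPI-3 asserts; the new weaker-completion assert itself is hypothetical and not shown):
>
> #include <mpi.h>
>
> int main(int argc, char **argv)
> {
>     int buf = 0, rank, size;
>     MPI_Win win;
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>     MPI_Win_create(&buf, sizeof(int), sizeof(int), MPI_INFO_NULL,
>                    MPI_COMM_WORLD, &win);
>
>     /* Opening fence: assert that no RMA epoch precedes it. */
>     MPI_Win_fence(MPI_MODE_NOPRECEDE, win);
>     MPI_Put(&rank, 1, MPI_INT, (rank + 1) % size, 0, 1, MPI_INT, win);
>     /* Closing fence: assert that no RMA calls follow it.  A new
>      * "weaker completion is fine" assert would be passed the same way. */
>     MPI_Win_fence(MPI_MODE_NOSUCCEED, win);
>
>     MPI_Win_free(&win);
>     MPI_Finalize();
>     return 0;
> }
>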
> Thoughts?
> 
>  — Pavan
> 
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
> 



