[mpiwg-rma] Ticket 435 and Re: MPI_Win_allocate_shared and synchronization functions

Rolf Rabenseifner rabenseifner at hlrs.de
Fri Jul 4 15:49:00 CDT 2014


Pavan, 

> Ticket 435 should be cleaned up ...

I would recommend substituting in ticket #435 the text

  Ticket #434 proposes to require some sort of synchronization 
  by adding the following additional rule after the 6 rules on page 454:  
    7. An RMA operation issued at the origin after MPI_WIN_START 
    or MPI_WIN_FENCE to a specific target, accesses the public 
    window copy at the target that is available after the matching 
    MPI_WIN_POST or MPI_WIN_FENCE at the target. 
  This, however, only impacts RMA operations, but not load/store accesses on shared memory windows.

by the following new text:
 
  MPI-3.0 p441:34-35 defines

    "RMA operations on win started by a process after the
    fence call returns will access their target window only
    after MPI_WIN_FENCE has been called by the target process." 

  If a remote load/store on shared memory is not treated as an
  RMA operation, the fence will not synchronize a sender process
  issuing a local store before the fence with a receiver process
  issuing a remote load from the same memory location after the fence.
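
  To illustrate the questionable pattern, here is a minimal sketch
  (not part of the ticket text; it assumes exactly two processes running
  on one shared-memory node, and all variable names are illustrative):

    #include <mpi.h>
    #include <stdio.h>

    /* Sketch of the fence pattern in question: local store before the
     * fence at rank 0, remote load after the fence at rank 1.
     * Assumes exactly 2 processes on one shared-memory node. */
    int main(int argc, char **argv)
    {
        int rank, *base;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* each process contributes one int to the shared window */
        MPI_Win_allocate_shared(sizeof(int), sizeof(int), MPI_INFO_NULL,
                                MPI_COMM_WORLD, &base, &win);

        MPI_Win_fence(0, win);
        if (rank == 0)
            base[0] = 42;                /* local store before the fence */
        MPI_Win_fence(0, win);
        if (rank == 1) {
            int disp_unit, *remote;
            MPI_Aint size;
            /* query the address of rank 0's segment of the shared window */
            MPI_Win_shared_query(win, 0, &size, &disp_unit, &remote);
            /* remote load after the fence: guaranteed to return 42 only
             * if the fence also synchronizes load/store accesses */
            printf("value = %d\n", remote[0]);
        }

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }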

  MPI-3.0 p442:28-33 defines MPI_Win_start:

    "Starts an RMA access epoch for win. RMA calls issued on
    win during this epoch must access only windows at processes
    in group. Each process in group must issue a matching
    call to MPI_WIN_POST. RMA accesses to each target 
    window will be delayed, if necessary, until the target
    process executed the matching call to MPI_WIN_POST."    

  If a remote load/store on shared memory is not treated as an
  RMA operation, then remote loads/stores are not valid
  between MPI_Win_start and MPI_Win_complete, and
  the post-start synchronization will not synchronize a sender process
  issuing a local store before the post with a receiver process issuing
  a remote load from the same memory location after the start operation.
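
  The same question arises with post-start-complete-wait. A minimal
  sketch (again only illustrative, a fragment that assumes the window
  setup of the fence example above and a group peer_group containing
  just the other rank):

    if (rank == 0) {                 /* sender, target of the remote load */
        base[0] = 42;                /* local store before the post       */
        MPI_Win_post(peer_group, 0, win);
        MPI_Win_wait(win);
    } else {                         /* receiver, origin of the remote load */
        int disp_unit, *remote;
        MPI_Aint size;
        MPI_Win_start(peer_group, 0, win);
        MPI_Win_shared_query(win, 0, &size, &disp_unit, &remote);
        /* remote load after the start: synchronized with the store above
         * only if it is delayed until the matching post, i.e. only if
         * load/store accesses are covered by the start/post rules */
        int value = remote[0];
        MPI_Win_complete(win);
        (void)value;
    }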

  MPI-3.0 p453:44 - p454:3, rules 2 and 3:

    "2. If an RMA operation is completed at the origin by a 
    call to MPI_WIN_FENCE then the operation is completed at
    the target by the matching call to MPI_WIN_FENCE by
    the target process.

    3. If an RMA operation is completed at the origin by a 
    call to MPI_WIN_COMPLETE then the operation is completed
    at the target by the matching call to MPI_WIN_WAIT
    by the target process."

  If a remote load/store on shared memory is not treated as an
  RMA operation, then a remote store before the fence or complete call
  at a sender process will not be synchronized with a local
  load after the matching fence or wait call at the receiver process.
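
  Sketched here for the fence case (a fragment under the same
  assumptions as the first example; the complete/wait case is analogous):

    MPI_Win_fence(0, win);
    if (rank == 1) {
        int disp_unit, *remote;
        MPI_Aint size;
        MPI_Win_shared_query(win, 0, &size, &disp_unit, &remote);
        remote[0] = 42;         /* remote store before the fence (origin) */
    }
    MPI_Win_fence(0, win);
    if (rank == 0) {
        /* local load after the matching fence at the target: covered by
         * rule 2 only if the remote store counts as an RMA operation */
        int value = base[0];
        (void)value;
    }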

  Such synchronizing behavior of remote and local load/stores
  on shared memory windows was expected in the paper published
  at EuroMPI 2012 by several members of the RMA WG:
  Torsten Hoefler, James Dinan, Darius Buntinas, Pavan Balaji,
  Brian Barrett, Ron Brightwell, William Gropp, Vivek Kale, Rajeev Thakur:
  "MPI + MPI: a new hybrid approach to parallel programming with
  MPI plus shared memory".

  There are two options to fix this problem: 
 
  A) To define that remote loads and stores on a shared
     memory window are treated as RMA operations.

     This would imply that all one-sided synchronization primitives
     must explicitly synchronize them.

  B) To define that remote and local stores or loads
     are not treated as RMA operations and to explicitly
     define additional process-synchronization behavior of
     some one-sided synchronization routines.

  The proposal below is based on B) and is currently restricted to
  fence and post-start-complete-wait providing such process-to-process
  synchronization on shared memory windows.


Additionally, we need to modify the text to move it into the function
definitions of MPI_Win_start, MPI_Win_wait, and MPI_Win_fence.

But before this, the RMA working group should discuss options A and B.

Best regards
Rolf 

----- Original Message -----
> From: "Rajeev Thakur" <thakur at mcs.anl.gov>
> To: "MPI WG Remote Memory Access working group" <mpiwg-rma at lists.mpi-forum.org>
> Cc: "Torsten Hoefler" <htor at inf.ethz.ch>, "Bill Gropp" <wgropp at uiuc.edu>
> Sent: Wednesday, July 2, 2014 9:30:23 PM
> Subject: Re: [mpiwg-rma] Ticket 435 and Re: MPI_Win_allocate_shared and synchronization functions
> 
> Ticket 435 should be cleaned up to reflect the corrections pointed
> out in 434, so that we can focus specifically on the problem of
> direct load/stores to shared memory.
> 
> Rajeev
> 
> On Jul 2, 2014, at 9:38 AM, Rolf Rabenseifner <rabenseifner at hlrs.de>
> wrote:
> 
> > Bill, Rajeev, and all other RMA WG members,
> > 
> > Hubert and Torsten discussed already in 2012 the meaning of
> > the MPI one-sided synchronization routines for MPI-3.0 shared
> > memory.
> > 
> > This question is still unresolved in the MPI-3.0 + errata.
> > 
> > Does the term "RMA operation" include "a remote load/store
> > from an origin process to the window memory on a target"?
> > 
> > Or not?
> > 
> > The ticket #435 expects "not".
> > 
> > In this case, MPI_Win_fence and post-start-complete-wait
> > cannot be used for synchronizing the sending process of data
> > with the receiving process of data that use only
> > local and remote load/stores on shared memory windows.
> > 
> > Ticket 435 extends the meaning of
> > MPI_Win_fence and post-start-complete-wait so that they
> > provide sender-receiver synchronization between processes
> > that use local and remote load/stores
> > on shared memory windows.
> > 
> > I hope that all RMA working group members agree that
> > - currently the behavior of these sync routines for
> >   shared memory remote load/stores is undefined due to
> >   the unclear definition of "RMA operation"
> > - and that we need an erratum that resolves this problem.
> > 
> > What is your opinion about the solution provided in
> > https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/435 ?
> > 
> > Best regards
> > Rolf
> > 
> > PS: Ticket 435 is the result of a discussion between Pavan, Hubert,
> > and me at ISC 2014.
> > 
> > ----- Original Message -----
> >> From: Hubert Ritzdorf
> >> Sent: Tuesday, September 11, 2012 7:26 PM
> >> To: mpi3-rma at lists.mpi-forum.org
> >> Subject: MPI_Win_allocate_shared and synchronization functions
> >> 
> >> Hi,
> >> 
> >> it's quite unclear what Page 410, Lines 17-19
> >> 
> >> A consistent view can be created in the unified
> >> memory model (see Section 11.4) by utilizing the window
> >> synchronization functions (see Section 11.5)
> >> 
> >> really means. Section 11.5 doesn't mention any (load/store) access
> >> to shared memory.
> >> Thus, must
> >> 
> >> (*) RMA communication calls and RMA operations
> >>     be interpreted as RMA communication calls (MPI_GET, MPI_PUT, ...)
> >>     and ANY load/store access to the shared window,
> >> (*) put call        as put call and any store to shared memory,
> >> (*) get call        as get call and any load from shared memory,
> >> (*) accumulate call as accumulate call and any load or store access
> >>     to the shared window ?
> >> 
> >> Example: Assertion MPI_MODE_NOPRECEDE
> >> 
> >> Does
> >> 
> >>   the fence does not complete any sequence of locally issued RMA calls
> >> 
> >> mean, for windows created by MPI_Win_allocate_shared(),
> >> 
> >>   the fence does not complete any sequence of locally issued RMA calls
> >>   or any load/store access to the window memory ?
> >> 
> >> It's not clear to me. It will probably not be clear for the standard
> >> MPI user either.
> >> RMA operations are defined only as MPI functions for window objects
> >> (as far as I can see).
> >> But possibly I'm totally wrong and the synchronization functions
> >> synchronize only the RMA communication calls (MPI_GET, MPI_PUT, ...).
> >> 
> >> Hubert
> >> 
> >> -----------------------------------------------------------------------------
> >> 
> >> Wednesday, September 12, 2012 11:37 AM
> >> 
> >> Hubert,
> >> 
> >> This is what I was referring to. I'm in favor of this proposal.
> >> 
> >> Torsten
> >> 
> > 
> > --
> > Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
> > High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
> > University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
> > Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
> > Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)
> 
> 

-- 
Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)


