[mpiwg-rma] ticket 456

Rolf Rabenseifner rabenseifner at hlrs.de
Sun Aug 31 14:43:55 CDT 2014


Dear Jeff,

answers/comments are inlined below:

----- Original Message -----
> From: "Jeff Hammond" <jeff.science at gmail.com>
> To: "MPI WG Remote Memory Access working group" <mpiwg-rma at lists.mpi-forum.org>
> Sent: Sunday, August 31, 2014 7:42:51 AM
> Subject: Re: [mpiwg-rma] ticket 456
> 
> On Sat, Aug 30, 2014 at 7:49 AM, Rolf Rabenseifner
> <rabenseifner at hlrs.de> wrote:
> > Jeff,
> >
> > great!
> >
> > Several comments:
> >
> > ________
> > About your
> > https://github.com/jeffhammond/HPCInfo/blob/master/mpi/rma/shared-memory-windows/win_fence.c
> >
> > Lines 48-50 (barrier + 2nd fence) are not needed.
> 
> That's why the Barrier is commented out :-)
> 
> The extra Fence is there because I like to be pedantic and do not
> assume that it is always visible to the programmer that a prior Fence
> call has been made.

If the example is used for testing whether an MPI implementation
fulfills #456, then its synchronization calls should be as minimal
as possible.

> 
> > fence on line 42 may be needed to guarantee that any initialization
> > is finished.
> 
> Line 46 completes the initialization epoch.

What I wanted to say: after line 42, the window has some value
that is visible to all processes and is probably not 42.

 
> > fence on line 52 seems also not to be needed:
> >  - not according to #456
> >  - not if the store is substituted by an MPI_Put.
> > It would only be needed if the load on line 51 were an MPI_Get.
> 
> This is why I hate Win_fence.  It was poorly designed.  It has no
> beginning or end.  It would have been so much better to have
> Fence_begin, Fence_end and Fence_middle (or whatever it could have
> been called - it has the same notion as Win_sync of ending and
> beginning an epoch).

I understand, but it is as it is.
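
To make my point concrete: the minimal fence-based write-read test that
I have in mind looks roughly like the sketch below (a sketch only; the
names and the value 42 are mine, and it assumes all processes run on one
node, otherwise one would first split the communicator with
MPI_Comm_split_type).

/* Sketch only: minimal fence-based write-read pattern on a shared-memory
 * window; assumes all processes of MPI_COMM_WORLD are on one node.      */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, rdisp;
    int *shptr = NULL;   /* my own window segment   */
    int *rptr  = NULL;   /* rank 0's window segment */
    MPI_Aint rsize;
    MPI_Win shwin;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) MPI_Abort(MPI_COMM_WORLD, 1);

    /* one int per process in a shared-memory window */
    MPI_Win_allocate_shared(sizeof(int), sizeof(int), MPI_INFO_NULL,
                            MPI_COMM_WORLD, &shptr, &shwin);
    /* direct load/store pointer to rank 0's segment */
    MPI_Win_shared_query(shwin, 0, &rsize, &rdisp, &rptr);

    *shptr = 0;                             /* initialization                   */
    MPI_Win_fence(0, shwin);                /* makes the initialization visible */

    if (rank == 0)
        *shptr = 42;                        /* store before the fence           */
    MPI_Win_fence(0, shwin);                /* write-read synchronization, #456 */
    if (rank == 1)
        printf("rank 1 sees %d\n", *rptr);  /* load after the fence, expect 42  */

    MPI_Win_free(&shwin);
    MPI_Finalize();
    return 0;
}

If this is the whole test, then no barrier and no third fence are needed.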

 
> > ________
> > About your
> > https://github.com/jeffhammond/HPCInfo/blob/master/mpi/rma/shared-memory-windows/win_pscw.c
> >
> > Line 17: I would recommend MPI_Abort
> > Lines 48, 50, 58, 59 are not needed.
> > Result:
> >
> > 46     if (rank==0) {
> > 49         *shptr = 42;
> > 52         MPI_Win_post(MPI_GROUP_ONE, 0, shwin);
> > 53         MPI_Win_wait(shwin);
> > 55     } else if (rank==1) {
> > 56         int lint;
> > 61         MPI_Win_start(MPI_GROUP_ZERO, 0, shwin);
> > 62         lint = *rptr;
> > 63         MPI_Win_complete(shwin);
> 
> You may be right.  I never use PSCW and it was hard enough to come up
> with that example as it was.

In https://fs.hlrs.de/projects/par/par_prog_ws/practical/MPI.zip
in subdirectory mpi/course/C/1sided/
in the file halo_1sided_store_win_alloc_shared_pscw.c
you can find a PSCW example that did not work with Cray's MPI,
because it internally removed the synchronization needed by
Post and Start, either because there were no RMA calls or because
all synchronization is delayed until the late execution of the RMA calls.


> > This example would illustrate the write-read-rule of #456
> > (i.e. pattern with variable A)
> >    A=val_1
> >    Sync-to-P1      --> Sync-from-P0
> >                        load(A)
> > with
> >    Sync-to-P1      --> Sync-from-P0
> > being
> > 2. MPI_Win_post    --> MPI_Win_start
> >
> > This example would also work when substituting line 62
> > by MPI_Get.
> >
> > All Patterns in #456 are based on corresponding
> > Patterns with MPI_Get and MPI_Put.
> >
> > If this example does not work with an existing MPI library,
> > then it is because the library optimizes the synchronization
> > away, since there are no RMA calls.
> 
> I explicitly verified that MPICH has the memory barriers required to
> make this example work.  I leave it as an exercise to the reader to
> examine other implementations :-)

If you do it as above, i.e., only one PSCW epoch with the store before
the post and the load after the start, then you may be in for a surprise.
I'm traveling without access to "my" Cray.
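
For the archive, the reduced pattern above corresponds roughly to this
standalone sketch (the group handling and the names are mine, and I
could not compile it while traveling):

/* Sketch only: PSCW write-read pattern on a shared-memory window,
 * store before post in P0, load after start in P1, no RMA calls.  */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, rdisp;
    int *shptr = NULL, *rptr = NULL;
    MPI_Aint rsize;
    MPI_Win shwin;
    MPI_Group world_grp, peer_grp;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) MPI_Abort(MPI_COMM_WORLD, 1);

    MPI_Win_allocate_shared(sizeof(int), sizeof(int), MPI_INFO_NULL,
                            MPI_COMM_WORLD, &shptr, &shwin);
    MPI_Win_shared_query(shwin, 0, &rsize, &rdisp, &rptr);
    MPI_Comm_group(MPI_COMM_WORLD, &world_grp);

    if (rank == 0) {
        int one = 1;
        MPI_Group_incl(world_grp, 1, &one, &peer_grp);  /* target group {1}  */
        *shptr = 42;                        /* store BEFORE post             */
        MPI_Win_post(peer_grp, 0, shwin);   /* exposure epoch towards rank 1 */
        MPI_Win_wait(shwin);                /* wait for rank 1's complete    */
        MPI_Group_free(&peer_grp);
    } else if (rank == 1) {
        int zero = 0, lint;
        MPI_Group_incl(world_grp, 1, &zero, &peer_grp); /* origin group {0}  */
        MPI_Win_start(peer_grp, 0, shwin);  /* epoch matching rank 0's post  */
        lint = *rptr;                       /* load AFTER start, must see 42 */
        MPI_Win_complete(shwin);
        MPI_Group_free(&peer_grp);
        printf("rank 1 sees %d\n", lint);
    }

    MPI_Group_free(&world_grp);
    MPI_Win_free(&shwin);
    MPI_Finalize();
    return 0;
}

Because there are no RMA calls between start and complete, this is
exactly the case where an implementation might drop or delay the
Post/Start synchronization.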


> > _________
> > About your
> > https://github.com/jeffhammond/HPCInfo/blob/master/mpi/rma/shared-memory-windows/win_sync.c
> >
> > Perfect.
> > This example illustrates again the write-read-rule of #456
> > (i.e. pattern with variable A)
> > with
> >    Sync-to-P1      --> Sync-from-P0
> > being
> > 4. MPI_Win_sync
> >    Any-process-sync-
> >     -from-P0-to-P1 --> Any-process-sync-
> >                         -from-P0-to-P1 4)
> >                        MPI_Win_sync
> >
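
As a concrete reading of pattern 4 above, roughly the following sketch;
the MPI_Send/MPI_Recv pair stands for the "any-process-sync from P0 to
P1", and the surrounding MPI_Win_lock_all epoch is my assumption
(MPI_Win_sync is only defined inside a passive-target epoch), so it is
not necessarily identical to your win_sync.c:

/* Sketch only: Win_sync write-read pattern; the send/recv is the
 * "any-process-sync" from P0 to P1, the lock_all epoch is assumed. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, rdisp, dummy = 0;
    int *shptr = NULL, *rptr = NULL;
    MPI_Aint rsize;
    MPI_Win shwin;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) MPI_Abort(MPI_COMM_WORLD, 1);

    MPI_Win_allocate_shared(sizeof(int), sizeof(int), MPI_INFO_NULL,
                            MPI_COMM_WORLD, &shptr, &shwin);
    MPI_Win_shared_query(shwin, 0, &rsize, &rdisp, &rptr);

    MPI_Win_lock_all(MPI_MODE_NOCHECK, shwin);      /* passive-target epoch */

    if (rank == 0) {
        *shptr = 42;                                /* A = val_1            */
        MPI_Win_sync(shwin);                        /* MPI_Win_sync         */
        MPI_Send(&dummy, 1, MPI_INT, 1, 0,          /* any-process-sync     */
                 MPI_COMM_WORLD);                   /*   from P0 to P1      */
    } else if (rank == 1) {
        MPI_Recv(&dummy, 1, MPI_INT, 0, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Win_sync(shwin);                        /* MPI_Win_sync         */
        printf("rank 1 sees %d\n", *rptr);          /* load(A), expect 42   */
    }

    MPI_Win_unlock_all(shwin);
    MPI_Win_free(&shwin);
    MPI_Finalize();
    return 0;
}
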
> > _________
> > About your
> > https://github.com/jeffhammond/HPCInfo/blob/master/mpi/rma/shared-memory-windows/win_lock_exclusive.c
> >
> > Perfect.
> > This example illustrates again the write-read-rule of #456
> > (i.e. pattern with variable A),
> > here the lock/unlock pattern with exclusive lock in P1.
> >
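
And the exclusive-lock write-read pattern, again only a rough
reconstruction of mine (not necessarily what win_lock_exclusive.c does);
the MPI_Send/MPI_Recv pair only establishes that the lock in P0 is
released before the lock in P1 is granted, which is what the arrow in
#456 means:

/* Sketch only: lock/unlock write-read pattern with exclusive locks;
 * the send/recv just orders the two lock epochs.                    */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, rdisp, dummy = 0, lint = -1;
    int *shptr = NULL, *rptr = NULL;
    MPI_Aint rsize;
    MPI_Win shwin;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) MPI_Abort(MPI_COMM_WORLD, 1);

    MPI_Win_allocate_shared(sizeof(int), sizeof(int), MPI_INFO_NULL,
                            MPI_COMM_WORLD, &shptr, &shwin);
    MPI_Win_shared_query(shwin, 0, &rsize, &rdisp, &rptr);

    if (rank == 0) {
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, shwin);
        *shptr = 42;                            /* A = val_1 inside the epoch */
        MPI_Win_unlock(0, shwin);               /* lock in P0 released        */
        MPI_Send(&dummy, 1, MPI_INT, 1, 0,      /* orders the two epochs      */
                 MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&dummy, 1, MPI_INT, 0, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, shwin); /* granted after P0's  */
        lint = *rptr;                           /* load(A), must see 42       */
        MPI_Win_unlock(0, shwin);
        printf("rank 1 sees %d\n", lint);
    }

    MPI_Win_free(&shwin);
    MPI_Finalize();
    return 0;
}
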
> > _________
> > About your
> > https://github.com/jeffhammond/HPCInfo/blob/master/mpi/rma/shared-memory-windows/win_lock_shared.c
> >
> > I would expect that it works,
> > but it is not based on a pattern that currently exists in #456.
> >
> > This is a bug in #456.
> >
> > The current wording
> >
> >   Patterns with lock/unlock synchronization:
> >
> >   Within passive target communication, two locks L1 and L2
> >   may be scheduled L1 before L2 or L2 before L1. In the
> >   following patterns, the arrow means that the lock in P0
> >   was scheduled before the lock in P1.
> >
> > is not good enough. It should read:
> >
> >   Within passive target communication, two locks L1 and L2
> >   may be scheduled "L1 released before L2 granted" or
> >   "L2 released before L1 granted" in the case of two locks
> >   with at least one exclusive, or in the case of any lock
> >   with an additional synchronization (e.g., with point-to-point
> >   or collective communication) in between. In the following
> >   patterns, the arrow means that the lock in P0 was released
> >   before the lock in P1 was granted, independent of the method
> >   used to achieve this schedule.
> >
> > In the patterns themselves, I'll remove the words "shared" and
> > "exclusive".
> >
> > A nice example would also be
> >
> >    Process P0          Process P1
> >    A=0                 B=0
> >    MPI_Win_fence       MPI_Win_fence
> >    MPI_Win_lock        MPI_Win_lock
> >      exclusive           exclusive
> >    A=val_1             B=val_2
> >    Bnew=load(B)        Anew=load(A)
> >    MPI_Win_unlock      MPI_Win_unlock
> 
> I have not started on examples that mix sync modes but I can try to
> do that next week.

The fence was only there to make sure that the initialization is
visible on both processes.
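
As compilable code, my example could look roughly as follows; the
pseudocode leaves open where A and B live and which rank each exclusive
lock targets, so for this sketch I place both A and B in rank 0's window
segment and let both processes take an exclusive lock on rank 0 only,
which keeps the either/or guarantee the same:

/* Sketch only: combined fence + exclusive-lock example; A and B are
 * placed in rank 0's segment (A = rptr[0], B = rptr[1]) by assumption. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, rdisp;
    int *shptr = NULL, *rptr = NULL;
    MPI_Aint rsize;
    MPI_Win shwin;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) MPI_Abort(MPI_COMM_WORLD, 1);

    MPI_Win_allocate_shared(2 * sizeof(int), sizeof(int), MPI_INFO_NULL,
                            MPI_COMM_WORLD, &shptr, &shwin);
    MPI_Win_shared_query(shwin, 0, &rsize, &rdisp, &rptr);

    if (rank == 0) { rptr[0] = 0; rptr[1] = 0; }    /* A=0, B=0             */
    MPI_Win_fence(0, shwin);                        /* init visible to both */

    if (rank == 0) {
        int Bnew;
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, shwin);
        rptr[0] = 1;                                /* A = val_1            */
        Bnew = rptr[1];                             /* Bnew = load(B)       */
        MPI_Win_unlock(0, shwin);
        printf("P0: Bnew = %d\n", Bnew);            /* 0 or val_2           */
    } else if (rank == 1) {
        int Anew;
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, shwin);
        rptr[1] = 2;                                /* B = val_2            */
        Anew = rptr[0];                             /* Anew = load(A)       */
        MPI_Win_unlock(0, shwin);
        printf("P1: Anew = %d\n", Anew);            /* val_1 or 0           */
    }

    MPI_Win_free(&shwin);
    MPI_Finalize();
    return 0;
}

Which exclusive lock is granted first is not determined, which gives
exactly the either/or outcome stated below.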

 
> > The two rules write-read and read-write together
> > guarantee that either
> >  - in P0 Bnew=0     and in P1 Anew=val_1
> > or
> >  - in P0 Bnew=val_2 and in P1 Anew=0
> >
> > But this is a combination of several basic rules in #456.
> >
> > _________
> > In General:
> >
> > #456 tries to define these 3*4 + 3 = 15 patterns,
> > which should be enough as far as I can see,
> > plus two examples: 11.13 to show the translation of
> > the win_sync write-read pattern and read-write pattern
> > into real software, and 11.14 the pairing of memory barriers
> > without further synchronization between the processes.
> >
> > Your test examples should be very helpful for testing
> > whether an MPI library fulfills these patterns
> > and for illustrating the compressed wording in #456.
> 
> Yeah, the tests are necessary but not sufficient to confirm
> implementation correctness.  So far I have only run them on an x86
> laptop, which is probably the easiest case for which an
> implementation can be correct.

Yes, but helpful. I started the discussion after I found
that my PSCW test program did not work with MPICH and Cray's MPI.

Later on I learned that Hubert had already detected this
inconsistency in the MPI standard two years ago.

Best regards
Rolf

> 
> Best,
> 
> Jeff
> 
> > Best regards
> > Rolf
> >
> > ----- Original Message -----
> >> From: "Jeff Hammond" <jeff.science at gmail.com>
> >> To: "MPI Forum" <mpiwg-rma at lists.mpi-forum.org>
> >> Sent: Friday, August 29, 2014 8:25:40 PM
> >> Subject: [mpiwg-rma] ticket 456
> >>
> >> Rolf,
> >>
> >> I find your pseudocode confusing, hence I am creating examples in C
> >> that I can compile and run.  I will try to come up with a case for
> >> every one of the examples in your ticket.  See
> >> https://github.com/jeffhammond/HPCInfo/tree/master/mpi/rma/shared-memory-windows.
> >>
> >> Best,
> >>
> >> Jeff
> >>
> >>
> >> --
> >> Jeff Hammond
> >> jeff.science at gmail.com
> >> http://jeffhammond.github.io/
> >>
> >
> > --
> > Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
> > High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
> > University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
> > Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
> > Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)
> 
> 
> 
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
> 

-- 
Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)


