[mpiwg-rma] ticket 456

Rolf Rabenseifner rabenseifner at hlrs.de
Sat Aug 30 09:49:46 CDT 2014


Jeff, 

great!

Several comments:

________
About your
https://github.com/jeffhammond/HPCInfo/blob/master/mpi/rma/shared-memory-windows/win_fence.c

Lines 48-50 (barrier + 2nd fence) are not needed.

The fence on line 42 may be needed to guarantee that any
initialization is finished.

The fence on line 52 also seems to be unnecessary:
 - not according to #456
 - not if the store is substituted by an MPI_Put.
It would only be needed if the load on line 51 were an MPI_Get.
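
For clarity, the reduced program could then look as sketched below
(my own reconstruction, not your actual win_fence.c; variable names
are invented, and I assume at least 2 ranks per node):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm shcomm;
    MPI_Win shwin;
    MPI_Aint sz;
    int rank, du, *shptr, *rptr;

    MPI_Init(&argc, &argv);
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &shcomm);
    MPI_Comm_rank(shcomm, &rank);
    MPI_Win_allocate_shared((MPI_Aint)(rank==0 ? sizeof(int) : 0),
                            sizeof(int), MPI_INFO_NULL, shcomm,
                            &shptr, &shwin);
    MPI_Win_shared_query(shwin, 0, &sz, &du, &rptr); /* rank 0's segment */

    if (rank==0) *rptr = 0;      /* initialization                       */
    MPI_Win_fence(0, shwin);     /* like line 42: initialization done    */
    if (rank==0) *rptr = 42;     /* A = val_1                            */
    MPI_Win_fence(0, shwin);     /* write-read sync: fence --> fence     */
    if (rank==1) printf("rank 1 read %d\n", *rptr);  /* load(A) == 42    */
    /* per #456, no further barrier or fence is needed before freeing    */

    MPI_Win_free(&shwin);
    MPI_Comm_free(&shcomm);
    MPI_Finalize();
    return 0;
}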

________
About your
https://github.com/jeffhammond/HPCInfo/blob/master/mpi/rma/shared-memory-windows/win_pscw.c

Line 17: I would recommend MPI_Abort.
Lines 48, 50, 58, 59 are not needed.
Result:
 
46     if (rank==0) { 
49         *shptr = 42; 
52         MPI_Win_post(MPI_GROUP_ONE, 0, shwin); 
53         MPI_Win_wait(shwin); 
55     } else if (rank==1) { 
56         int lint; 
61         MPI_Win_start(MPI_GROUP_ZERO, 0, shwin); 
62         lint = *rptr; 
63         MPI_Win_complete(shwin); 

This example would illustrate the write-read-rule of #456
(i.e. pattern with variable A)
   A=val_1
   Sync-to-P1      --> Sync-from-P0
                       load(A)
with
   Sync-to-P1      --> Sync-from-P0
being
2. MPI_Win_post    --> MPI_Win_start

This example would also work if line 62 were substituted
by an MPI_Get.

All patterns in #456 are based on corresponding
patterns with MPI_Get and MPI_Put.

If this example does not work with an existing MPI library,
then it is because the library optimizes the synchronization away,
since there are no RMA calls.
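
For reference, a self-contained version of this reduced program could
look as follows (a sketch only; I assume MPI_GROUP_ZERO and
MPI_GROUP_ONE are groups containing just rank 0 and rank 1, and build
them here explicitly as group_zero and group_one):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm shcomm;
    MPI_Win shwin;
    MPI_Group allgrp, group_zero, group_one;
    MPI_Aint sz;
    int rank, nproc, du, r0 = 0, r1 = 1, *shptr, *rptr;

    MPI_Init(&argc, &argv);
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &shcomm);
    MPI_Comm_rank(shcomm, &rank);
    MPI_Comm_size(shcomm, &nproc);
    if (nproc < 2) MPI_Abort(MPI_COMM_WORLD, 1);      /* cf. line 17      */

    MPI_Win_allocate_shared((MPI_Aint)(rank==0 ? sizeof(int) : 0),
                            sizeof(int), MPI_INFO_NULL, shcomm,
                            &shptr, &shwin);
    MPI_Win_shared_query(shwin, 0, &sz, &du, &rptr);  /* rank 0's segment */

    MPI_Comm_group(shcomm, &allgrp);
    MPI_Group_incl(allgrp, 1, &r0, &group_zero);
    MPI_Group_incl(allgrp, 1, &r1, &group_one);

    if (rank==0) {
        *shptr = 42;                          /* A = val_1                */
        MPI_Win_post(group_one, 0, shwin);    /* Sync-to-P1               */
        MPI_Win_wait(shwin);
    } else if (rank==1) {
        int lint;
        MPI_Win_start(group_zero, 0, shwin);  /* Sync-from-P0             */
        lint = *rptr;                         /* load(A); an MPI_Get
                                                 would also work here     */
        MPI_Win_complete(shwin);
        printf("rank 1 read %d\n", lint);
    }

    MPI_Group_free(&group_zero);
    MPI_Group_free(&group_one);
    MPI_Group_free(&allgrp);
    MPI_Win_free(&shwin);
    MPI_Comm_free(&shcomm);
    MPI_Finalize();
    return 0;
}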

_________
About your
https://github.com/jeffhammond/HPCInfo/blob/master/mpi/rma/shared-memory-windows/win_sync.c

Perfect.
This example illustrates again the write-read-rule of #456
(i.e. pattern with variable A)
with
   Sync-to-P1      --> Sync-from-P0
being
4. MPI_Win_sync
   Any-process-sync-from-P0-to-P1 --> Any-process-sync-from-P0-to-P1
                                      MPI_Win_sync
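
For comparison, a minimal compilable version of this pattern could
look like the sketch below (my own version, not a copy of your
win_sync.c; a zero-byte send/recv plays the role of the
any-process-sync from P0 to P1, inside an epoch opened with
MPI_Win_lock_all):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm shcomm;
    MPI_Win shwin;
    MPI_Aint sz;
    int rank, du, *shptr, *rptr;

    MPI_Init(&argc, &argv);
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &shcomm);
    MPI_Comm_rank(shcomm, &rank);
    MPI_Win_allocate_shared((MPI_Aint)(rank==0 ? sizeof(int) : 0),
                            sizeof(int), MPI_INFO_NULL, shcomm,
                            &shptr, &shwin);
    MPI_Win_shared_query(shwin, 0, &sz, &du, &rptr);  /* rank 0's segment */

    MPI_Win_lock_all(0, shwin);
    if (rank==0) {
        *rptr = 42;                                   /* A = val_1        */
        MPI_Win_sync(shwin);                          /* memory barrier   */
        MPI_Send(NULL, 0, MPI_INT, 1, 0, shcomm);     /* sync to P1       */
    } else if (rank==1) {
        MPI_Recv(NULL, 0, MPI_INT, 0, 0, shcomm, MPI_STATUS_IGNORE);
        MPI_Win_sync(shwin);                          /* paired barrier   */
        printf("rank 1 read %d\n", *rptr);            /* load(A) == 42    */
    }
    MPI_Win_unlock_all(shwin);

    MPI_Win_free(&shwin);
    MPI_Comm_free(&shcomm);
    MPI_Finalize();
    return 0;
}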

_________
About your 
https://github.com/jeffhammond/HPCInfo/blob/master/mpi/rma/shared-memory-windows/win_lock_exclusive.c

Perfect.
This example illustrates again the write-read-rule of #456
(i.e. pattern with variable A),
here the lock/unlock pattern with exclusive lock in P1.
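
A reduced sketch of such a test could look like this (my own guess,
not your actual win_lock_exclusive.c; the zero-byte send/recv is only
there to force the schedule "lock in P0 released before lock in P1
granted"):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm shcomm;
    MPI_Win shwin;
    MPI_Aint sz;
    int rank, du, *shptr, *rptr;

    MPI_Init(&argc, &argv);
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &shcomm);
    MPI_Comm_rank(shcomm, &rank);
    MPI_Win_allocate_shared((MPI_Aint)(rank==0 ? sizeof(int) : 0),
                            sizeof(int), MPI_INFO_NULL, shcomm,
                            &shptr, &shwin);
    MPI_Win_shared_query(shwin, 0, &sz, &du, &rptr);  /* rank 0's segment */

    if (rank==0) {
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, shwin);
        *rptr = 42;                                   /* A = val_1        */
        MPI_Win_unlock(0, shwin);
        MPI_Send(NULL, 0, MPI_INT, 1, 0, shcomm);     /* force the order  */
    } else if (rank==1) {
        MPI_Recv(NULL, 0, MPI_INT, 0, 0, shcomm, MPI_STATUS_IGNORE);
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, shwin);
        printf("rank 1 read %d\n", *rptr);            /* load(A) == 42    */
        MPI_Win_unlock(0, shwin);
    }

    MPI_Win_free(&shwin);
    MPI_Comm_free(&shcomm);
    MPI_Finalize();
    return 0;
}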

_________
About your
https://github.com/jeffhammond/HPCInfo/blob/master/mpi/rma/shared-memory-windows/win_lock_shared.c

I would expect that it works,
but it is not based on a pattern that currently exists in #456.

This is a bug in #456.

The current wording

  Patterns with lock/unlock synchronization: 

  Within passive target communication, two locks L1 and L2 
  may be scheduled L1 before L2 or L2 before L1. In the 
  following patterns, the arrow means that the lock in P0 
  was scheduled before the lock in P1. 

is not good enough. It should read:

  Within passive target communication, two locks L1 and L2
  may be scheduled "L1 released before L2 granted" or
  "L2 released before L1 granted" in the case of two locks
  of which at least one is exclusive, or in the case of any
  locks with an additional synchronization (e.g., with
  point-to-point or collective communication) in between.
  In the following patterns, the arrow means that the lock
  in P0 was released before the lock in P1 was granted,
  independent of the method used to achieve this schedule.

In the patterns themselves, I'll remove the words "shared" and "exclusive".

A nice example would also be:

   Process P0          Process P1
   A=0                 B=0
   MPI_Win_fence       MPI_Win_fence
   MPI_Win_lock        MPI_Win_lock 
     exclusive           exclusive
   A=val_1             B=val_2
   Bnew=load(B)        Anew=load(A)
   MPI_Win_unlock      MPI_Win_unlock 

The two rules write-read and read-write together
guarantee that either
 - in P0 Bnew=0     and in P1 Anew=val_1 
or
 - in P0 Bnew=val_2 and in P1 Anew=0

But this is a combination of several basic rules in #456.
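
A possible realization as a compilable test is sketched below (with
some assumptions on my side: both A and B are placed in rank 0's
window segment, both processes take the exclusive lock on rank 0 so
that the two critical sections exclude each other, rank 0 does both
initializations before the fence, and val_1=1, val_2=2):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm shcomm;
    MPI_Win shwin;
    MPI_Aint sz;
    int rank, du, *shptr, *base;

    MPI_Init(&argc, &argv);
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &shcomm);
    MPI_Comm_rank(shcomm, &rank);
    MPI_Win_allocate_shared((MPI_Aint)(rank==0 ? 2*sizeof(int) : 0),
                            sizeof(int), MPI_INFO_NULL, shcomm,
                            &shptr, &shwin);
    MPI_Win_shared_query(shwin, 0, &sz, &du, &base);
    /* A = base[0], B = base[1], both in rank 0's segment */

    if (rank==0) { base[0] = 0; base[1] = 0; }  /* A=0, B=0 (done by P0)  */
    MPI_Win_fence(0, shwin);                    /* initialization done    */

    if (rank==0) {
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, shwin);
        base[0] = 1;                            /* A = val_1              */
        int Bnew = base[1];                     /* Bnew = load(B)         */
        MPI_Win_unlock(0, shwin);
        printf("P0: Bnew = %d\n", Bnew);        /* 0 or 2, see above      */
    } else if (rank==1) {
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, shwin);
        base[1] = 2;                            /* B = val_2              */
        int Anew = base[0];                     /* Anew = load(A)         */
        MPI_Win_unlock(0, shwin);
        printf("P1: Anew = %d\n", Anew);        /* 1 or 0, see above      */
    }

    MPI_Win_free(&shwin);
    MPI_Comm_free(&shcomm);
    MPI_Finalize();
    return 0;
}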

_________
In general:

#456 tries to define these 3*4 + 3 = 15 patterns,
which should be enough as far as I can see,
plus two examples: 11.13 to show the translation of
the win_sync write-read pattern and read-write pattern
into real software, and 11.14 the pairing of memory barriers
without further synchronization between the processes.

Your test examples should be very helpful for testing
whether an MPI library fulfills these patterns
and for illustrating the compressed wording in #456.

Best regards
Rolf    
                        
----- Original Message -----
> From: "Jeff Hammond" <jeff.science at gmail.com>
> To: "MPI Forum" <mpiwg-rma at lists.mpi-forum.org>
> Sent: Friday, August 29, 2014 8:25:40 PM
> Subject: [mpiwg-rma] ticket 456
> 
> Rolf,
> 
> I find your pseudocode confusing, hence I am creating examples in C
> that I can compile and run.  I will try to come up with a case for
> every one of the examples in your ticket.  See
> https://github.com/jeffhammond/HPCInfo/tree/master/mpi/rma/shared-memory-windows.
> 
> Best,
> 
> Jeff
> 
> 
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
> _______________________________________________
> mpiwg-rma mailing list
> mpiwg-rma at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-rma
> 

-- 
Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)


