[mpiwg-rma] ticket 456

Jeff Hammond jeff.science at gmail.com
Sun Aug 31 00:42:51 CDT 2014


On Sat, Aug 30, 2014 at 7:49 AM, Rolf Rabenseifner <rabenseifner at hlrs.de> wrote:
> Jeff,
>
> great!
>
> Several comments:
>
> ________
> About your
> https://github.com/jeffhammond/HPCInfo/blob/master/mpi/rma/shared-memory-windows/win_fence.c
>
> line 48-50 (barrier + 2nd fence) are not needed.

That's why the Barrier is commented out :-)

The extra Fence is there because I like to be pedantic and do not
assume the programmer can always see that a prior Fence call has
already been made.

> fence on line 42 may be needed to guarantee that any initialization
> is finished.

Line 46 completes the initialization epoch.

> fence on line 52 seems to be also not needed:
>  - not according to #456
>  - not if substituting the store by a MPI_Put.
> It would be only needed, if the load on line 51 would be a MPI_Get.

This is why I hate Win_fence.  It was poorly designed.  It has no
beginning or end.  It would have been so much better to have
Fence_begin, Fence_end and Fence_middle (or whatever they might have
been called) - a single Fence has the same property as Win_sync of
ending one epoch and beginning the next.

> ________
> About your
> https://github.com/jeffhammond/HPCInfo/blob/master/mpi/rma/shared-memory-windows/win_pscw.c
>
> Line 17: I would recommend MPI_Abort
> Lines 48, 50, 58, 59 are not needed.
> Result:
>
> 46     if (rank==0) {
> 49         *shptr = 42;
> 52         MPI_Win_post(MPI_GROUP_ONE, 0, shwin);
> 53         MPI_Win_wait(shwin);
> 55     } else if (rank==1) {
> 56         int lint;
> 61         MPI_Win_start(MPI_GROUP_ZERO, 0, shwin);
> 62         lint = *rptr;
> 63         MPI_Win_complete(shwin);

You may be right.  I never use PSCW and it was hard enough to come up
with that example as it was.
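If it helps, here is how I read your reduced version as a complete
program.  The window setup and the two groups are filled in by me (I
take your MPI_GROUP_ZERO / MPI_GROUP_ONE to mean groups containing just
rank 0 and just rank 1), so treat it as a sketch rather than the file
itself:

#include <mpi.h>
#include <stdio.h>

int main(void)
{
    MPI_Init(NULL, NULL);

    int rank;
    MPI_Comm shcomm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &shcomm);
    MPI_Comm_rank(shcomm, &rank);

    int *shptr, *rptr, rdisp;
    MPI_Aint rsize;
    MPI_Win shwin;
    MPI_Win_allocate_shared(sizeof(int), sizeof(int), MPI_INFO_NULL,
                            shcomm, &shptr, &shwin);
    MPI_Win_shared_query(shwin, 0, &rsize, &rdisp, &rptr);

    /* groups {0} and {1}, standing in for MPI_GROUP_ZERO / MPI_GROUP_ONE */
    MPI_Group wgroup, group_zero, group_one;
    int zero = 0, one = 1;
    MPI_Comm_group(shcomm, &wgroup);
    MPI_Group_incl(wgroup, 1, &zero, &group_zero);
    MPI_Group_incl(wgroup, 1, &one, &group_one);

    if (rank == 0) {
        *shptr = 42;                         /* A = val_1, before exposure */
        MPI_Win_post(group_one, 0, shwin);   /* Sync-to-P1 */
        MPI_Win_wait(shwin);
    } else if (rank == 1) {
        int lint;
        MPI_Win_start(group_zero, 0, shwin); /* Sync-from-P0 */
        lint = *rptr;                        /* load(A) from shared memory */
        MPI_Win_complete(shwin);
        printf("rank 1 sees %d\n", lint);
    }

    MPI_Group_free(&group_one);
    MPI_Group_free(&group_zero);
    MPI_Group_free(&wgroup);
    MPI_Win_free(&shwin);
    MPI_Finalize();
    return 0;
}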

> This example would illustrate the write-read-rule of #456
> (i.e. pattern with variable A)
>    A=val_1
>    Sync-to-P1      --> Sync-from-P0
>                        load(A)
> with
>    Sync-to-P1      --> Sync-from-P0
> being
> 2. MPI_Win_post    --> MPI_Win_start
>
> This example would also work when substituting line 62
> by MPI_Get.
>
> All Patterns in #456 are based on corresponding
> Patterns with MPI_Get and MPI_Put.
>
> If this example does not work with an existing MPI library,
> then because it optimizes the synchronization away
> because there are no RMA calls.

I explicitly verified that MPICH has the memory barriers required to
make this example work.  I leave it as an exercise to the reader to
examine other implementations :-)

> _________
> About your
> https://github.com/jeffhammond/HPCInfo/blob/master/mpi/rma/shared-memory-windows/win_sync.c
>
> Perfect.
> This example illustrates again the write-read-rule of #456
> (i.e. pattern with variable A)
> with
>    Sync-to-P1      --> Sync-from-P0
> being
> 4. MPI_Win_sync
>    Any-process-sync-
>     -from-P0-to-P1 --> Any-process-sync-
>                         -from-P0-to-P1 4)
>                        MPI_Win_sync
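For the record, the core of win_sync.c is roughly the following, again
a reduction from memory, so details may differ.  The "any-process-sync"
is realized as a zero-byte send/recv, and the Win_sync calls sit inside
a lock_all epoch because Win_sync is only valid within a passive-target
epoch:

#include <mpi.h>
#include <stdio.h>

int main(void)
{
    MPI_Init(NULL, NULL);

    int rank;
    MPI_Comm shcomm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &shcomm);
    MPI_Comm_rank(shcomm, &rank);

    int *shptr, *rptr, rdisp;
    MPI_Aint rsize;
    MPI_Win shwin;
    MPI_Win_allocate_shared(sizeof(int), sizeof(int), MPI_INFO_NULL,
                            shcomm, &shptr, &shwin);
    MPI_Win_shared_query(shwin, 0, &rsize, &rdisp, &rptr);

    MPI_Win_lock_all(0, shwin);      /* passive-target epoch for Win_sync */
    if (rank == 0) {
        *rptr = 42;                  /* A = val_1 */
        MPI_Win_sync(shwin);         /* MPI_Win_sync ...                  */
        MPI_Send(NULL, 0, MPI_INT, 1, 0, shcomm); /* ... any-process-sync P0->P1 */
    } else if (rank == 1) {
        MPI_Recv(NULL, 0, MPI_INT, 0, 0, shcomm, MPI_STATUS_IGNORE);
        MPI_Win_sync(shwin);         /* any-process-sync, then MPI_Win_sync */
        printf("rank 1 sees %d\n", *rptr);        /* load(A) */
    }
    MPI_Win_unlock_all(shwin);

    MPI_Win_free(&shwin);
    MPI_Finalize();
    return 0;
}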
>
> _________
> About your
> https://github.com/jeffhammond/HPCInfo/blob/master/mpi/rma/shared-memory-windows/win_lock_exclusive.c
>
> Perfect.
> This example illustrates again the write-read-rule of #456
> (i.e. pattern with variable A),
> here the lock/unlock pattern with exclusive lock in P1.
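Roughly, the shape of win_lock_exclusive.c is the following (from
memory, so the file may differ in details): rank 1 keeps re-acquiring
the exclusive lock until it observes the value stored by rank 0, which
forces the schedule where rank 0's lock epoch precedes the successful
one on rank 1:

#include <mpi.h>
#include <stdio.h>

int main(void)
{
    MPI_Init(NULL, NULL);

    int rank;
    MPI_Comm shcomm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &shcomm);
    MPI_Comm_rank(shcomm, &rank);

    int *shptr, *rptr, rdisp;
    MPI_Aint rsize;
    MPI_Win shwin;
    MPI_Win_allocate_shared(sizeof(int), sizeof(int), MPI_INFO_NULL,
                            shcomm, &shptr, &shwin);
    MPI_Win_shared_query(shwin, 0, &rsize, &rdisp, &rptr);

    *shptr = 0;                /* zero my slot so rank 1 does not read junk */
    MPI_Barrier(shcomm);

    if (rank == 0) {
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, shwin);
        *rptr = 42;                        /* A = val_1 under the exclusive lock */
        MPI_Win_unlock(0, shwin);
    } else if (rank == 1) {
        int lint = 0;
        while (lint != 42) {               /* retry until rank 0's lock epoch is
                                              known to precede this one */
            MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, shwin);
            lint = *rptr;                  /* load(A) under the exclusive lock */
            MPI_Win_unlock(0, shwin);
        }
        printf("rank 1 sees %d\n", lint);
    }

    MPI_Win_free(&shwin);
    MPI_Finalize();
    return 0;
}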
>
> _________
> About your
> https://github.com/jeffhammond/HPCInfo/blob/master/mpi/rma/shared-memory-windows/win_lock_shared.c
>
> I would expect that it works,
> but it is not based on a pattern that currently exists in #456.
>
> This is a bug of #456.
>
> The current wording
>
>   Patterns with lock/unlock synchronization:
>
>   Within passive target communication, two locks L1 and L2
>   may be scheduled L1 before L2 or L2 before L1. In the
>   following patterns, the arrow means that the lock in P0
>   was scheduled before the lock in P1.
>
> is not good enough. It should read:
>
>   Within passive target communication, two locks L1 and L2
>   may be scheduled "L1 released before L2 granted" or
>   "L2 released before L1 granted" in the case
>   of two locks with at least one exclusive, or in the case of any lock with an
>   additional synchronization (e.g., with point-to-point or
>   collective communication) in between. In the
>   following patterns, the arrow means that the lock in P0
>   was released before the lock in P1 was granted, independent
>   of the method used to achieve this schedule.
>
> In the patterns themselves, I'll remove the words "shared" and "exclusive".
>
> A nice example would also be
>
>    Process P0          Process P1
>    A=0                 B=0
>    MPI_Win_fence       MPI_Win_fence
>    MPI_Win_lock        MPI_Win_lock
>      exclusive           exclusive
>    A=val_1             B=val_2
>    Bnew=load(B)        Anew=load(A)
>    MPI_Win_unlock      MPI_Win_unlock

I have not started on examples that mix sync modes but I can try to do
that next week.

> The two rules write-read and read-write together
> guarantee that either
>  - in P0 Bnew=0     and in P1 Anew=val_1
> or
>  - in P0 Bnew=val_2 and in P1 Anew=0
>
> But this is a combination of several basic rules in #456.
>
> _________
> In General:
>
> #456 tries to define these 3*4 + 3 = 15 patterns,
> which should be enough as far as I can see,
> plus two examples, 11.13 to show the translation of
> the win_sync write-read-pattern and read-write-pattern
> in real software, and 11.14 the pairing of memory barriers
> without further synchronization between the processes.
>
> Your test examples should be very helpful to test whether an MPI
> library fulfills these patterns and to illustrate the compressed
> wording in #456.

Yeah, the tests are necessary but not sufficient to confirm
implementation correctness.  So far I have only run them on an x86
laptop, which is probably the easiest case for an implementation to
get right.

Best,

Jeff

> Best regards
> Rolf
>
> ----- Original Message -----
>> From: "Jeff Hammond" <jeff.science at gmail.com>
>> To: "MPI Forum" <mpiwg-rma at lists.mpi-forum.org>
>> Sent: Friday, August 29, 2014 8:25:40 PM
>> Subject: [mpiwg-rma] ticket 456
>>
>> Rolf,
>>
>> I find your pseudocode confusing, hence I am creating examples in C
>> that I can compile and run.  I will try to come up with a case for
>> every one of the examples in your ticket.  See
>> https://github.com/jeffhammond/HPCInfo/tree/master/mpi/rma/shared-memory-windows.
>>
>> Best,
>>
>> Jeff
>>
>>
>> --
>> Jeff Hammond
>> jeff.science at gmail.com
>> http://jeffhammond.github.io/
>> _______________________________________________
>> mpiwg-rma mailing list
>> mpiwg-rma at lists.mpi-forum.org
>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-rma
>>
>
> --
> Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
> High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
> University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
> Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
> Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)
> _______________________________________________
> mpiwg-rma mailing list
> mpiwg-rma at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-rma



-- 
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/


