[mpiwg-rma] ticket 456
Jeff Hammond
jeff.science at gmail.com
Sun Aug 31 00:42:51 CDT 2014
On Sat, Aug 30, 2014 at 7:49 AM, Rolf Rabenseifner <rabenseifner at hlrs.de> wrote:
> Jeff,
>
> great!
>
> Several comments:
>
> ________
> About your
> https://github.com/jeffhammond/HPCInfo/blob/master/mpi/rma/shared-memory-windows/win_fence.c
>
> Lines 48-50 (barrier + 2nd fence) are not needed.
That's why the Barrier is commented out :-)
The extra Fence is there because I like to be pedantic and do not
assume that it is always apparent to the programmer that a prior Fence
call has been made.
> fence on line 42 may be needed to guarantee that any initialization
> is finished.
Line 46 completes the initialization epoch.
> fence on line 52 seems to be also not needed:
> - not according to #456
> - not if substituting the store by a MPI_Put.
> It would be only needed, if the load on line 51 would be a MPI_Get.
This is why I hate Win_fence. It was poorly designed. It has no
distinct beginning or end. It would have been so much better to have
Fence_begin, Fence_end and Fence_middle (or whatever they might have
been called); a fence has the same notion as Win_sync of ending one
epoch and beginning the next.
> ________
> About your
> https://github.com/jeffhammond/HPCInfo/blob/master/mpi/rma/shared-memory-windows/win_pscw.c
>
> Line 17: I would recommend MPI_Abort
> Lines 48, 50, 58, 59 are not needed.
> Result:
>
> 46 if (rank==0) {
> 49 *shptr = 42;
> 52 MPI_Win_post(MPI_GROUP_ONE, 0, shwin);
> 53 MPI_Win_wait(shwin);
> 55 } else if (rank==1) {
> 56 int lint;
> 61 MPI_Win_start(MPI_GROUP_ZERO, 0, shwin);
> 62 lint = *rptr;
> 63 MPI_Win_complete(shwin);
You may be right. I never use PSCW and it was hard enough to come up
with that example as it was.
> This example would illustrate the write-read-rule of #456
> (i.e. pattern with variable A)
> A=val_1
> Sync-to-P1   -->   Sync-from-P0
>                    load(A)
> with
> Sync-to-P1 --> Sync-from-P0
> being
> 2. MPI_Win_post --> MPI_Win_start
>
> This example would also work when substituting line 62
> by MPI_Get.
>
> All Patterns in #456 are based on corresponding
> Patterns with MPI_Get and MPI_Put.
>
> If this example does not work with an existing MPI library,
> it is because the implementation optimizes the synchronization away,
> since there are no RMA calls.
I explicitly verified that MPICH has the memory barriers required to
make this example work. I leave it as an exercise to the reader to
examine other implementations :-)
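The core of Rolf's simplified PSCW variant would look roughly like the sketch
below. It is not the literal contents of win_pscw.c; it assumes the
shcomm/shwin/rptr setup from the fence sketch earlier, and the groups
MPI_GROUP_ZERO / MPI_GROUP_ONE from the listing above are passed in here as
group_zero / group_one (containing just rank 0 and rank 1 of shcomm).

#include <mpi.h>
#include <stdio.h>

/* Assumes shwin is the shared-memory window and rptr points at rank 0's
 * int, as in the fence sketch; group_zero = {rank 0}, group_one = {rank 1}. */
void pscw_write_read(int rank, MPI_Win shwin, int *rptr,
                     MPI_Group group_zero, MPI_Group group_one)
{
    if (rank==0) {
        *rptr = 42;                           /* A = val_1                 */
        MPI_Win_post(group_one, 0, shwin);    /* Sync-to-P1 ...            */
        MPI_Win_wait(shwin);
    } else if (rank==1) {
        int lint;
        MPI_Win_start(group_zero, 0, shwin);  /* ... Sync-from-P0          */
        lint = *rptr;                         /* load(A): must observe 42  */
        MPI_Win_complete(shwin);
        printf("rank 1 read %d\n", lint);
    }
}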
> _________
> About your
> https://github.com/jeffhammond/HPCInfo/blob/master/mpi/rma/shared-memory-windows/win_sync.c
>
> Perfect.
> This example illustrates again the write-read-rule of #456
> (i.e. pattern with variable A)
> with
> Sync-to-P1 --> Sync-from-P0
> being
> 4. MPI_Win_sync
>    Any-process-sync-from-P0-to-P1  -->  Any-process-sync-from-P0-to-P1
>                                         MPI_Win_sync
>
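For reference, that win_sync write-read pattern boils down to something like
the sketch below. It is not the literal contents of win_sync.c; it assumes the
shcomm/shwin/rptr setup from the fence sketch earlier and uses a one-int
message as the "any-process-sync".

#include <mpi.h>
#include <stdio.h>

/* Assumes the shcomm/shwin/rptr setup from the fence sketch.  MPI_Win_sync
 * must be called inside a passive-target epoch, hence the lock_all. */
void sync_write_read(int rank, MPI_Comm shcomm, MPI_Win shwin, int *rptr)
{
    int dummy = 0;
    MPI_Win_lock_all(MPI_MODE_NOCHECK, shwin);
    if (rank==0) {
        *rptr = 42;                                  /* A = val_1                */
        MPI_Win_sync(shwin);                         /* memory barrier in P0     */
        MPI_Send(&dummy, 1, MPI_INT, 1, 0, shcomm);  /* any-process-sync P0->P1  */
    } else if (rank==1) {
        MPI_Recv(&dummy, 1, MPI_INT, 0, 0, shcomm, MPI_STATUS_IGNORE);
        MPI_Win_sync(shwin);                         /* memory barrier in P1     */
        printf("rank 1 read %d\n", *rptr);           /* load(A): must observe 42 */
    }
    MPI_Win_unlock_all(shwin);
}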
> _________
> About your
> https://github.com/jeffhammond/HPCInfo/blob/master/mpi/rma/shared-memory-windows/win_lock_exclusive.c
>
> Perfect.
> This example illustrates again the write-read-rule of #456
> (i.e. pattern with variable A),
> here the lock/unlock pattern with exclusive lock in P1.
>
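A sketch of that exclusive-lock write-read pattern, again assuming the
shcomm/shwin/rptr setup from the fence sketch earlier (not necessarily the
literal contents of win_lock_exclusive.c); a small message fixes the lock
schedule so the example is deterministic.

#include <mpi.h>
#include <stdio.h>

/* Assumes the shcomm/shwin/rptr setup from the fence sketch.  The message
 * only fixes the schedule "P0's lock released before P1's lock granted";
 * the consistency comes from the exclusive lock/unlock pairs on rank 0. */
void lock_exclusive_write_read(int rank, MPI_Comm shcomm, MPI_Win shwin, int *rptr)
{
    int dummy = 0;
    if (rank==0) {
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, shwin);
        *rptr = 42;                                  /* A = val_1                */
        MPI_Win_unlock(0, shwin);                    /* released before ...      */
        MPI_Send(&dummy, 1, MPI_INT, 1, 0, shcomm);
    } else if (rank==1) {
        MPI_Recv(&dummy, 1, MPI_INT, 0, 0, shcomm, MPI_STATUS_IGNORE);
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, shwin); /* ... this lock is granted */
        int lint = *rptr;                            /* load(A): must observe 42 */
        MPI_Win_unlock(0, shwin);
        printf("rank 1 read %d\n", lint);
    }
}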
> _________
> About your
> https://github.com/jeffhammond/HPCInfo/blob/master/mpi/rma/shared-memory-windows/win_lock_shared.c
>
> I would expect that it works,
> but it is not based on a pattern that currently exists in #456.
>
> This is a bug of #456.
>
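For concreteness, one way such a shared-lock pattern can look is sketched
below (not necessarily the literal contents of win_lock_shared.c; it assumes
the shcomm/shwin/rptr setup from the fence sketch earlier). Because shared
locks do not order each other, the "released before granted" schedule has to
come from the additional synchronization, which is the case the reworded
lock/unlock text below is meant to cover.

#include <mpi.h>
#include <stdio.h>

/* Assumes the shcomm/shwin/rptr setup from the fence sketch.  With shared
 * locks, only the point-to-point message orders P0's unlock before P1's lock. */
void lock_shared_write_read(int rank, MPI_Comm shcomm, MPI_Win shwin, int *rptr)
{
    int dummy = 0;
    if (rank==0) {
        MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, shwin);
        *rptr = 42;                                  /* A = val_1                 */
        MPI_Win_unlock(0, shwin);
        MPI_Send(&dummy, 1, MPI_INT, 1, 0, shcomm);  /* the extra synchronization */
    } else if (rank==1) {
        MPI_Recv(&dummy, 1, MPI_INT, 0, 0, shcomm, MPI_STATUS_IGNORE);
        MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, shwin);
        int lint = *rptr;                            /* load(A): must observe 42  */
        MPI_Win_unlock(0, shwin);
        printf("rank 1 read %d\n", lint);
    }
}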
> The current wording
>
> Patterns with lock/unlock synchronization:
>
> Within passive target communication, two locks L1 and L2
> may be scheduled L1 before L2 or L2 before L1. In the
> following patterns, the arrow means that the lock in P0
> was scheduled before the lock in P1.
>
> is not good enough. It should read:
>
> Within passive target communication, two locks L1 and L2
> may be scheduled "L1 released before L2 granted" or
> "L2 released before L1 granted" in the case
> of two locks of which at least one is exclusive, or in the case
> of any locks with additional synchronization (e.g., point-to-point or
> collective communication) in between. In the
> following patterns, the arrow means that the lock in P0
> was released before the lock in P1 was granted, independent
> of the method used to achieve this schedule.
>
> In the patterns themselves, I'll remove the words "shared" and "exclusive".
>
> A nice example would also be
>
> Process P0                Process P1
> A=0                       B=0
> MPI_Win_fence             MPI_Win_fence
> MPI_Win_lock (exclusive)  MPI_Win_lock (exclusive)
> A=val_1                   B=val_2
> Bnew=load(B)              Anew=load(A)
> MPI_Win_unlock            MPI_Win_unlock
I have not started on examples that mix sync modes but I can try to do
that next week.
> The two rules write-read and read-write together
> guarantee that either
> - in P0 Bnew=0 and in P1 Anew=val_1
> or
> - in P0 Bnew=val_2 and in P1 Anew=0
>
> But this is a combination of several basic rules in #456.
>
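A sketch of that combined example, with the assumption (mine, for the sketch)
that both A and B live in rank 0's segment, i.e. the window is allocated with
room for two ints and a = &rptr[0], b = &rptr[1], so that a single exclusive
lock on rank 0 serializes the two epochs.

#include <mpi.h>
#include <stdio.h>

/* Assumes a shared-memory window as in the fence sketch, but with room for
 * two ints on rank 0: a = &rptr[0], b = &rptr[1] (placement is an assumption
 * of this sketch).  val_1 = 1 and val_2 = 2 below. */
void fence_then_lock(int rank, MPI_Win shwin, int *a, int *b)
{
    if (rank==0) *a = 0; else if (rank==1) *b = 0;   /* A=0 in P0, B=0 in P1     */
    MPI_Win_fence(0, shwin);                         /* initial values visible   */

    if (rank==0) {
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, shwin);
        *a = 1;                                      /* A = val_1                */
        int bnew = *b;                               /* Bnew = load(B)           */
        MPI_Win_unlock(0, shwin);
        printf("P0: Bnew = %d\n", bnew);   /* 0 if P0's epoch was first, else val_2 */
    } else if (rank==1) {
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, shwin);
        *b = 2;                                      /* B = val_2                */
        int anew = *a;                               /* Anew = load(A)           */
        MPI_Win_unlock(0, shwin);
        printf("P1: Anew = %d\n", anew);   /* val_1 if P0's epoch was first, else 0 */
    }
}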
> _________
> In General:
>
> #456 tries to define these 3*4 + 3 = 15 patterns,
> which should be enough as far as I can see,
> plus two examples: 11.13 to show the translation of
> the win_sync write-read pattern and read-write pattern
> in real software, and 11.14 to show the pairing of memory barriers
> without further synchronization between the processes.
>
> Your test examples should be very helpful for testing
> whether an MPI library fulfills these patterns
> and for illustrating the compressed wording in #456.
Yeah, the tests are necessary but not sufficient to confirm
implementation correctness. So far I have only run them on an x86
laptop, which is probably the easiest case in which an implementation
can be correct.
Best,
Jeff
> Best regards
> Rolf
>
> ----- Original Message -----
>> From: "Jeff Hammond" <jeff.science at gmail.com>
>> To: "MPI Forum" <mpiwg-rma at lists.mpi-forum.org>
>> Sent: Friday, August 29, 2014 8:25:40 PM
>> Subject: [mpiwg-rma] ticket 456
>>
>> Rolf,
>>
>> I find your pseudocode confusing, hence I am creating examples in C
>> that I can compile and run. I will try to come up with a case for
>> every one of the examples in your ticket. See
>> https://github.com/jeffhammond/HPCInfo/tree/master/mpi/rma/shared-memory-windows.
>>
>> Best,
>>
>> Jeff
>>
>>
>> --
>> Jeff Hammond
>> jeff.science at gmail.com
>> http://jeffhammond.github.io/
>
> --
> Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
> High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
> University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
> Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
> Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)
--
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/