[Mpi-forum] Discussion points from the MPI-<next> discussion today

Mon Sep 24 00:42:41 CDT 2012

Jed,

Your initial example code for the target is not quite correct, or at least
not portably correct.  The target needs to perform a window synchronization
before accessing the data exposed in the window.  Otherwise, the value it
reads is not guaranteed to be the one written by the put.
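
For comparison outside MPI, here is a minimal C11 sketch of the same rule
for plain shared memory. Seeing the notification is not enough: the reader
needs an acquire operation that pairs with the writer's release before it
reads the buffer. This is not MPI code and does not replace window
synchronization, since MPI makes no promise that C11 atomics order RMA
updates. The flag `ready` (standing in for side_channel), the function
names, and the use of a POSIX thread as the "origin" are assumptions made
for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: `buffer` plays the exposed memory and `ready`
   plays the side channel. This is plain C11, not MPI. */
static double buffer[10];
static atomic_bool ready;

/* Writer ("origin"): fill the buffer, then publish with a release store,
   so the buffer write cannot become visible after the flag. */
static void *origin(void *arg)
{
    (void)arg;
    buffer[0] = 42.0;
    atomic_store_explicit(&ready, true, memory_order_release);
    return NULL;
}

/* Reader ("target"): the acquire load is the read-side synchronization.
   Once it observes true, the following plain read of buffer[0] must see
   the writer's value; neither the compiler nor the CPU may move that read
   ahead of the acquire. */
static double target_read(void)
{
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ; /* spin, like wait(side_channel) */
    return buffer[0];
}

/* Runs one handshake: a second thread acts as the origin. */
double run_side_channel_demo(void)
{
    pthread_t t;
    atomic_store_explicit(&ready, false, memory_order_relaxed);
    buffer[0] = 0.0;
    int rc = pthread_create(&t, NULL, origin, NULL);
    assert(rc == 0);
    (void)rc;
    double x = target_read();
    pthread_join(t, NULL);
    return x;
}
```

Calling run_side_channel_demo() returns 42.0. Weakening the acquire load
in target_read() to memory_order_relaxed would make the read of buffer[0]
a data race and reintroduce the failure modes described in points 1 and 2
below.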

Your option #4 (locking the window around the read) is one way to
synchronize the target's window in MPI-2, and, as you point out, MPI-3
added a couple more synchronization options.  We were careful in the
unified memory model specification not to provide any consistency
guarantees beyond what can be achieved using MPI routines (hence the
vague use of "eventually").

  ~Jim.

On 9/23/12 4:30 PM, Jed Brown wrote:
> On Sun, Sep 23, 2012 at 2:34 PM, N.M. Maclaren <nmm1 at cam.ac.uk
> <mailto:nmm1 at cam.ac.uk>> wrote:
>
>     MPI does not specify that.  Both Fortran and C have mechanisms that can
>     be used for inter-process synchronisation that do not involve calling
>     MPI, and therefore will not call an MPI fence.  Writing to a file and
>     reading the data is one classic one, and is heavily used.  I have seen
>     data take 5 seconds to get from one thread to another, which is ample
>     time for I/O, and I have seen that logic cause this trouble with shared
>     memory used by other forms of RDMA and synchronisation using file I/O.
>     And, yes, the RDMA did use a write fence.
>
>
> Obviously the read fence is the relevant issue here. Your example is now
> the following (cf. MPI-3 Example 11.9):
>
> origin:
> MPI_Win_create
> MPI_Win_lock
> MPI_Put
> MPI_Win_unlock
> notify(side_channel) // e.g., global variable in shared memory,
>                      // memory-mapped serial line, file system
>
> target:
> double buffer[10] = {0};
> MPI_Win_create(buffer,....)
> wait(side_channel) // e.g., spin
> x = buffer[0]
>
> It would have saved us a great deal of time if you had written this 30
> messages ago, but in any case, we can make some observations.
>
> 1. If wait(side_channel) is a macro or inline function that the compiler
> can prove does not itself touch buffer, the compiler could move the read
> from buffer[] ahead of it. This is the lack of a sequence point that you
> were concerned with.
>
> 2. Even with a sequence point, some hardware (including POWER and SPARC)
> reorders independent loads, so buffer[0] could be loaded before
> side_channel even though the instructions are in the correct order.
>
> 3. Suppose there were a data dependency, as in
>
> double *ptr = wait(side_channel);
> x = ptr[0];
>
> This is still not guaranteed to be correct on Alpha, which reorders
> DEPENDENT loads. For more details, see DATA DEPENDENCY BARRIERS in
> http://www.kernel.org/doc/Documentation/memory-barriers.txt and Table 5
> and Figure 10 of
> http://www.rdrop.com/users/paulmck/scalability/paper/whymb.2010.06.14a.pdf.
>
> 4. Obvious fixes include (a) not communicating through the side channel
> and (b) protecting the access, as in
>
> MPI_Win_lock
> x = buffer[0]
> MPI_Win_unlock
>
> Note that many applications will use this anyway because it's onerous
> for the application to ensure that all passive-mode RMA operations have
> completed (in the sense of MPI_Win_unlock on the target returning).
>
> 5. MPI-3 is more explicit about the memory model, providing
> MPI_WIN_UNIFIED and MPI_WIN_SEPARATE. In the latter, direct access
> (without MPI_Win_lock/unlock or other synchronization such as
> MPI_Win_sync) is invalid. Read MPI-3 page 454. I believe your complaint
> can be summed up by the sentence "In the RMA unified memory model, an
> update by a put or accumulate call to a public window copy eventually
> becomes visible in the private copy in process memory without additional
> RMA calls." In this sentence, "eventually" roughly means "once a read
> memory fence is issued by the target, perhaps as a side-effect of some
> unrelated call". Since "eventually" could be a long time, a side-channel
> notification could allow access before the result was visible to the
> target process. Fortunately, "eventually" is an ambiguous term. ;-)
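
Point 3 in C11 terms, as a sketch: C11 has memory_order_consume for this
dependency-ordered pattern, but mainstream compilers (GCC and Clang)
currently implement it as memory_order_acquire, and an acquire load is
correct on every architecture, Alpha included. The struct payload type and
the publish()/consume_first() names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical payload that the writer fills before publishing. */
struct payload {
    double values[10];
};

/* The published pointer; NULL means "not yet published". */
static _Atomic(struct payload *) published;

/* Writer: initialize the object, then publish its address with a
   release store. */
void publish(struct payload *p, double first)
{
    assert(p != NULL); /* NULL is reserved for "not yet published" */
    p->values[0] = first;
    atomic_store_explicit(&published, p, memory_order_release);
}

/* Reader: an acquire load rather than a relaxed load plus an address
   dependency, so the dereference is ordered after the pointer load even
   on hardware (such as Alpha) that can reorder dependent loads. */
double consume_first(void)
{
    struct payload *p;
    while ((p = atomic_load_explicit(&published, memory_order_acquire)) == NULL)
        ; /* spin until published */
    return p->values[0];
}
```

With memory_order_relaxed in consume_first(), only the address dependency
would order the dereference, which is exactly the Alpha case above.
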
>
> Rereading page 456, I think it could be more explicit about the possible
> requirement for user memory fences, especially since they could be
> necessary on some hardware regardless of the compiler optimization level.
> Although the guidelines are somewhat loosely worded, the examples
> clarify them. Note especially Example 11.9, which covers exactly the read
> ordering issue discussed here, and Example 11.7, which deals with the
> converse.
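
The "user memory fences" mentioned above have a direct C11 spelling. A
sketch, with hypothetical names, of the fence-based form of the same
handshake: the writer's release fence pairs with the reader's acquire
fence (the "read memory fence"), so `data` itself can be an ordinary
variable. This illustrates the C11 mechanism only, not what any MPI
implementation does internally.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared state: `data` is ordinary memory and `flag` is the
   notification word. */
static double data;
static atomic_int flag;

/* Writer: the release fence orders the write to `data` before the
   relaxed store to `flag`. */
void writer_with_fence(double value)
{
    /* Single-shot handshake in this sketch: the flag must still be clear. */
    assert(atomic_load_explicit(&flag, memory_order_relaxed) == 0);
    data = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
}

/* Reader: relaxed spin on the flag, then an acquire fence before reading
   `data`. Without the fence, the compiler or the CPU could satisfy the
   read of `data` early (points 1 and 2). */
double reader_with_fence(void)
{
    while (atomic_load_explicit(&flag, memory_order_relaxed) == 0)
        ; /* spin */
    atomic_thread_fence(memory_order_acquire);
    return data;
}
```

By contrast, atomic_signal_fence() only constrains the compiler, so it
would address point 1 but not the hardware reordering in point 2.
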
>
>
>             As a specific example, Fortran compilers can and do move arrays
>             over procedure calls that do not appear to use them; C ones do
>             not, but are in theory allowed to.
>
>
>         Passive-mode RMA is only compliant for memory allocated using
>         MPI_Alloc_mem(). Since MPI_Alloc_mem() cannot be used portably by
>         Fortran, passive-mode RMA is not portable for callers from vanilla
>         Fortran.
>
>
>     That has been wrong since Fortran 2003, which provides C
>     interoperability, including the ability to use buffers allocated in C.
>
>
> I was referring to the dialects of Fortran supported by the MPI standard
> prior to this week.
>
>     That is necessary but not sufficient, both in theory and practice.
>     But, yes, active one-sided is semantically comparable to non-blocking.
>
>     I am not going to be dragged into describing the signal handling fiasco,
>     but I have seen what you claim to be unused used in two compilers.
>
>
> When I find compiler bugs, I report them. Can you point to the ticket
> where this issue was reported? Surely _someone_ was annoyed that the
> compiler was incapable of producing correct code for any multithreaded
> kernel, libpthread, database, or web browser...
>
>     Indeed, one of them triggered me into trying (and failing) to get SOME
>     kind of semantics defined for volatile in WG14.
>
>
> Even with the current "specification", existing compilers are riddled
> with bugs related to volatile.
>
> http://www.cs.utah.edu/~regehr/papers/emsoft08-preprint.pdf
> http://blog.regehr.org/archives/503
>
> Worse, it's useless for what most people try to use it for.
>
> http://kernel.org/doc/Documentation/volatile-considered-harmful.txt
>
>
> _______________________________________________
> mpi-forum mailing list
> mpi-forum at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-forum
>