<div class="gmail_quote">Thanks for being more specific.</div><div class="gmail_quote"><br></div><div class="gmail_quote">On Sun, Sep 23, 2012 at 11:23 AM, N.M. Maclaren <span dir="ltr">&lt;<a href="mailto:nmm1@cam.ac.uk" target="_blank">nmm1@cam.ac.uk</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div id=":bhk">Because there is no barrier required in the user's code for passive<br>
one-sided communication (though there is for active), and ALL relevant<br>
specifications require both sides to call SOME kind of operation to do<br>
the handshaking.  NONE have one-sided barriers, often not even at the<br>
hardware level.<br></div></blockquote><div><br></div><div>Absolutely, such "one-sided memory barriers" don't even make semantic sense. Fortunately, the target process is not allowed to access the contents of the target window willy-nilly. When MPI_Win_unlock returns, the origin process knows that any changes made during that access epoch have completed at the target (including any necessary write memory fences), but the target process does not know this yet and may not have issued the necessary read memory fence. Either the application provides some other mechanism to avoid the race condition (e.g., a subsequent MPI collective, a point-to-point message with a data dependency, or completion of an MPI_Issend), or the target process accesses its own buffer using MPI_Win_lock. Either way, the target process is not allowed to access the contents of the target window until it has called into MPI in a context that can supply the required read memory fence.</div>
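<div><br></div><div>To make the fence pairing concrete, here is a minimal sketch that swaps MPI for C11 atomics and POSIX threads. It is an analogy, not MPI code, and all names in it (window, ready, origin_thread, target_read_window, run_exchange) are hypothetical. The origin's release store plays the role of the write fence that MPI_Win_unlock guarantees. The target's acquire load plays the role of the read fence that only the target's own synchronizing call can supply. Neither side alone amounts to a "one-sided barrier".</div>

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical "window" shared between an origin and a target thread.
   This is a C11 analogy for the fence pairing, not an MPI program. */
#define WIN_LEN 4

static double window[WIN_LEN];
static atomic_int ready = 0;

/* Origin side: write the data, then publish it with a release store.
   The release orders the writes before the flag, but it cannot force
   the target to observe them; that takes an acquire on the target. */
static void *origin_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < WIN_LEN; i++)
        window[i] = 1.5 * i;
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* Target side: it may only read the window after its own acquire load
   has synchronized with the origin's release store. Reading window
   without this step would be a data race, the analogue of touching
   the target window without first calling into MPI. */
static double target_read_window(void)
{
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ; /* spin until the origin has published */
    double sum = 0.0;
    for (int i = 0; i < WIN_LEN; i++)
        sum += window[i];
    return sum;
}

/* Run one origin/target exchange and return what the target saw. */
double run_exchange(void)
{
    pthread_t origin;
    atomic_store(&ready, 0);
    pthread_create(&origin, NULL, origin_thread, NULL);
    double seen = target_read_window();
    pthread_join(origin, NULL);
    return seen;
}
```

<div>run_exchange() returns 9.0 (0 + 1.5 + 3.0 + 4.5). The point of the design is that the synchronization is paired: the origin's release store is useless to a target that never performs the matching acquire.</div>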

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div id=":bhk">

As a specific example, Fortran compilers can and do move arrays over<br>

procedure calls that do not appear to use them; C ones do not, but are<br>

in theory allowed to.<br></div></blockquote><div><br></div><div>Passive-mode RMA is only guaranteed to work for memory allocated using MPI_Alloc_mem(), because MPI-2 permits implementations to restrict lock/unlock synchronization to such windows. MPI_Alloc_mem() cannot be used portably from Fortran, so passive-mode RMA is not portable for callers from vanilla Fortran. MPI-2 explicitly acknowledged this limitation. In practice, the necessary extension (Cray pointers) is available on enough systems that it doesn't matter. We still complain about incompatible conventions, but ranting about Fortran is not my purpose.</div>

<div><br></div><div>Outside of passive-mode RMA, the situation is the same as asynchronous progress for MPI_Irecv in that stable addresses are required. If I interpret your statement above correctly, the concern is that the compiler is allowed to rewrite</div>

<div><br>double buffer[10];</div><div>MPI_Irecv(buffer,...,&request);</div><div>// some arithmetic not using buffer</div><div>MPI_Wait(&request,&status);</div><div><br></div><div>to</div><div><br></div><div>double buffer[10], tmp[10];</div>
<div>MPI_Irecv(buffer,...,&request);</div><div>memcpy(tmp,buffer,sizeof buffer);</div><div>// some arithmetic using buffer as scratch space</div><div>memcpy(buffer,tmp,sizeof buffer);</div><div>MPI_Wait(&request,&status);</div><div>

<br></div><div>Since a signal handler is only allowed to access objects of type volatile sig_atomic_t (which buffer cannot alias), the rewrite does not affect sequential semantics. It would, however, clearly break MPI_Irecv: any part of the message that arrives between the two memcpy calls is overwritten by the restore. By convention, compilers do not perform such munging of non-volatile stack memory because it would break every threaded program ever written. Compilers have to be useful to be successful, so they have no incentive to exploit every loophole they can get away with while complying with the letter of the standard. Note that this is in stark contrast to politicians and patent lawyers. C11 and C++11 finally closed this (unexploited) loophole by forbidding compilers from introducing stores to potentially shared memory that the program itself would not modify.</div>
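<div><br></div><div>The stable-address requirement can be sketched the same way. Again this is a hedged analogy rather than MPI. A helper thread stands in for asynchronous progress and writes into the caller's buffer between a hypothetical fake_irecv() and fake_wait(). Neither of these is an MPI function, and fake_request is likewise made up for illustration. Under the C11 memory model, the compiler may not introduce the memcpy save/restore shown above, because those invented stores to buffer would race with the helper thread.</div>

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an MPI request: a helper thread plays the
   role of asynchronous progress, writing into the caller's buffer. */
typedef struct {
    pthread_t thread;
    double   *buf;
    size_t    len;
} fake_request;

static void *progress(void *arg)
{
    fake_request *req = arg;
    for (size_t i = 0; i < req->len; i++)
        req->buf[i] = (double)(i + 1); /* the "incoming message" */
    return NULL;
}

/* Like MPI_Irecv: hand out the buffer address and return immediately. */
static void fake_irecv(double *buf, size_t len, fake_request *req)
{
    req->buf = buf;
    req->len = len;
    pthread_create(&req->thread, NULL, progress, req);
}

/* Like MPI_Wait: after this returns, buf may be read safely. */
static void fake_wait(fake_request *req)
{
    pthread_join(req->thread, NULL);
}

/* Mirrors the MPI_Irecv/MPI_Wait snippet in the text: buffer must stay
   at a stable address and be left alone between "Irecv" and "Wait".
   A C11 compiler may not insert a memcpy save/restore of buffer here,
   since those invented stores would race with the progress thread. */
double receive_and_sum(void)
{
    double buffer[10];
    fake_request req;
    fake_irecv(buffer, 10, &req);

    double x = 0.0; /* some arithmetic not using buffer */
    for (int i = 1; i <= 100; i++)
        x += 1.0 / i;
    (void)x;

    fake_wait(&req);
    double sum = 0.0;
    for (int i = 0; i < 10; i++)
        sum += buffer[i];
    return sum; /* 1 + 2 + ... + 10 */
}
```

<div>receive_and_sum() returns 55.0, the sum of the values 1 through 10 written by the helper thread. The join in fake_wait() is what makes reading buffer afterwards well defined, just as completion of MPI_Wait is what makes the received data visible.</div>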

<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div id=":bhk">

That problem can be avoided in both languages, but the complications are<br>

so foul that I am not prepared to teach it.  In particular, they involve<br>

arcane programming restrictions on Fortran argument passing and several<br>

widely-used aspects of C, plus subtle use of some attributes like<br>

TARGET, ASYNCHRONOUS and volatile.  And, heaven help me, at least most<br>

of them really are needed on at least some systems.  The restrictions<br>
were not introduced just for the hell of it, but because at least some<br>

vendors required them - including effective types :-(<br>

<br>

And, as I said, the issues with MPI_Ireduce and MPI_Irecv_reduce are far subtler, and are NOT soluble within the language or for all systems.</div></blockquote></div><br>