[MPI3 Fortran] [Fwd: Library-based ASYNCHRONOUS I/O and SYNC MEMORY]

Mon Sep 8 14:09:53 CDT 2008

[Just to introduce myself to those who don't know me, I am a member of  
the Sun Fortran compiler development team.  I meant to join this email  
alias a while ago, but only just got around to it.]

On Sep 7, 2008, at 6:24 AM, Hubert Ritzdorf wrote:

> Hi,
>
> also I think that a begin and end section is required and the
> compiler needs full control. One possibility would be:
>
> !cdir noopt_begin(buffer1, buffer2, buffer3)
> ...
> !cdir noopt_end (buffer1)
> ...
> !cdir noopt_end (buffer1, buffer2)
>
> which simply turns off any optimization for the buffers
> "buffer1, buffer2, buffer3" within the block defined by
> "begin" and "end", such that the Fortran compiler knows
> the buffers "buffer1, buffer2, buffer3". The user program
> is responsible for providing enough Fortran internal info
> on the buffers.

Why is that necessary?  Which optimizations do you think need to be  
disabled?

So far as MPI asynchronous send/receive are concerned, I don't see any  
reason to disable any optimizations aside from those that move access  
to the buffer across a call to MPI_ISEND/MPI_IRECV/MPI_WAIT.  What you  
are proposing is going to have a much bigger impact on performance.
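
To make the hazard concrete, here is a minimal sketch of the one kind of
code motion that matters. It assumes the `mpi` module and has a single
rank send to itself so that it can run with one process; the names
`buf`, `src`, `rreq` and `sreq` are illustrative only.

```fortran
program irecv_hazard
  use mpi
  implicit none
  integer :: buf(4), src(4)
  integer :: rreq, sreq, rank, ierr
  integer :: status(MPI_STATUS_SIZE)

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

  src = 42
  buf = 0

  ! From here until the matching MPI_WAIT returns, buf is "in flight".
  call MPI_IRECV(buf, 4, MPI_INTEGER, rank, 0, MPI_COMM_WORLD, rreq, ierr)
  call MPI_ISEND(src, 4, MPI_INTEGER, rank, 0, MPI_COMM_WORLD, sreq, ierr)

  ! buf is not an argument of this call, so nothing in the source tells
  ! the compiler that the call may change buf.
  call MPI_WAIT(rreq, status, ierr)

  ! This load must stay below the MPI_WAIT above. Hoisting it (or reusing
  ! a value of buf(1) held in a register since "buf = 0") is the code
  ! motion that has to be inhibited.
  print *, 'buf(1) =', buf(1)

  call MPI_WAIT(sreq, status, ierr)
  call MPI_FINALIZE(ierr)
end program irecv_hazard
```

Everything else in the program, including any optimization of code that
touches `buf` before the MPI_IRECV or after the MPI_WAIT, can proceed as
usual.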

On Sep 8, 2008, at 1:55 PM, Dan Nagle wrote:

>
> The greatest difficulty I see is to link the asynchronous-ness
> to the call *and then to the wait* affecting only the portion
> of the caller's name actually needed.
>
> My main concern is with things declared like buffer( ..., 2),
> where buffer( ..., 1) and buffer( ..., 2) are used as double buffers
> where work and transmission alternate.  I think this
> case is rather common, but I haven't made a survey.

This is a case where I suspect a blunt hammer will do just as well.   
For example, suppose we had a way to inhibit all code motion across a  
call to MPI_WAIT.  That could inhibit an optimization that the  
compiler could do if it knew more about what MPI_WAIT was doing, but I  
suspect you would be hard pressed to come up with an example where it  
mattered.

In the particular case of these double buffers, I think you're talking  
about something like this:

01:  do i=1,n
02:    mpi_isend(buffer(..., 1), ..., b1)
03:    mpi_wait(b2)
04:    do something with buffer(..., 2)
05:    mpi_isend(buffer(..., 2), ..., b2)
06:    mpi_wait(b1)
07:    do something with buffer(..., 1)
08:  end do

You can only access buffer(...,2) at line 4, because at the other  
lines it is "in flight".  Therefore, there's no problem with  
inhibiting code motion for buffer(...,2) across the mpi_wait call on  
line 6.

Likewise, you can only access buffer(...,1) at line 7, so there's no  
problem with inhibiting code motion for it across line 3.
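
For reference, the loop above might be filled in roughly as follows.
This is only a sketch under assumptions of my own: the sizes `m` and
`n`, the `real` payload, the receiving side on rank 1, and the trivial
"work" assignments are all hypothetical, and it needs at least two
processes to run.

```fortran
program double_buffer
  use mpi
  implicit none
  integer, parameter :: m = 4, n = 3     ! hypothetical buffer length and trip count
  real :: buffer(m, 2), inbox(m)
  integer :: b1, b2, i, k, rank, nprocs, ierr
  integer :: status(MPI_STATUS_SIZE)

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
  if (nprocs < 2) call MPI_ABORT(MPI_COMM_WORLD, 1, ierr)

  if (rank == 0) then
    buffer = 0.0
    b2 = MPI_REQUEST_NULL                ! the first wait on b2 then returns at once
    do i = 1, n
      ! The first element of each half is passed (sequence association).
      call MPI_ISEND(buffer(1, 1), m, MPI_REAL, 1, 0, MPI_COMM_WORLD, b1, ierr)
      call MPI_WAIT(b2, status, ierr)
      buffer(:, 2) = real(i)             ! half 2 is free; half 1 is in flight
      call MPI_ISEND(buffer(1, 2), m, MPI_REAL, 1, 0, MPI_COMM_WORLD, b2, ierr)
      call MPI_WAIT(b1, status, ierr)
      buffer(:, 1) = -real(i)            ! half 1 is free; half 2 is in flight
    end do
    call MPI_WAIT(b2, status, ierr)      ! complete the last outstanding send
  else if (rank == 1) then
    ! Hypothetical receiver: two messages arrive per sender iteration.
    do k = 1, 2*n
      call MPI_RECV(inbox, m, MPI_REAL, 0, 0, MPI_COMM_WORLD, status, ierr)
      print *, 'received', inbox(1)
    end do
  end if

  call MPI_FINALIZE(ierr)
end program double_buffer
```

In this form, each assignment to one half of `buffer` sits between the
wait that frees that half and the send that puts it back in flight, so a
rule that simply forbids moving accesses to `buffer` across any of the
MPI calls does not block any reordering the program could legally use.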

Can you come up with an example of this dual buffer usage where  
performance would actually require the kind of precise control you're  
talking about?

Iain