[MPI3 Fortran] [Interop-tr] [Mpi-forum] Comment on Fortran WG5 ballot N1846

Fri Apr 15 11:18:38 CDT 2011

[John, it looks like it may be helpful to put me
 on the interop-tr list at j3-fortran.org]

Okay, I think I now understand:
 - You are talking about permanent data movement (including
   automatic correction of all Fortran pointers to such data,
   e.g., for garbage collection),
   and this is a problem with C-based MPI and
   C pointers pointing to buffers of nonblocking MPI
   operations.
   This problem seems to be entirely unsolved.

   How can it really be solved?

   What is the VILE attribute? I did not find anything about it.

   Is there a combination of MPI+Fortran where
   the Fortran compiler performs garbage collection,
   and how is it solved there?

 - In the past we talked about temporary data movement
   (e.g., into a GPU) while nonblocking MPI operations
   are pending; this was solved by not using the
   buffers (of such nonblocking MPI operations)
   in some numerical code.

 - And I talked in my previous email about 
   code movements (not data movements) across MPI_Wait calls.
   This and only this problem seems to be solved 
   with the methods I showed.
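
   A minimal sketch of this third case, for illustration only. It
   assumes an MPI library whose "mpi" module provides the dummy routine
   MPI_F_SYNC_REG discussed in this thread, and it lets a single
   process send to itself so that no partner process is needed. The
   names src, xnew and n are made up for the example.

```fortran
! Sketch only, not a definitive implementation.
! Assumption: the MPI library's "mpi" module provides MPI_F_SYNC_REG.
! Intended to be run with one process, e.g. "mpirun -n 1 ./a.out".
program wait_code_movement
  use mpi
  implicit none
  integer, parameter :: n = 100          ! hypothetical buffer length
  real    :: buf(n), src(n), xnew(n)
  integer :: rq_recv, rq_send, rank, ierr
  integer :: status(MPI_STATUS_SIZE)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  src = 1.0
  buf = 0.0

  ! Nonblocking receive into buf; the matching send comes from this
  ! same process.
  call MPI_Irecv(buf, n, MPI_REAL, rank, 0, MPI_COMM_WORLD, rq_recv, ierr)
  call MPI_Isend(src, n, MPI_REAL, rank, 0, MPI_COMM_WORLD, rq_send, ierr)

  ! buf is not an argument of MPI_Wait, so without further measures a
  ! compiler might move the load of buf in "xnew = buf" above this call.
  call MPI_Wait(rq_recv, status, ierr)

  ! Passing buf to a routine the compiler cannot see into is meant to
  ! stop that code movement across MPI_Wait.
  call MPI_F_SYNC_REG(buf)

  xnew = buf

  call MPI_Wait(rq_send, status, ierr)
  print *, 'xnew(1) =', xnew(1)
  call MPI_Finalize(ierr)
end program wait_code_movement
```

   This addresses only code movement. It does nothing about the
   permanent data movement in the first item.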

Do I understand it correctly?

Best regards
Rolf

----- Original Message -----
> From: "N.M. Maclaren" <nmm1 at cam.ac.uk>
> To: "Rolf Rabenseifner" <rabenseifner at hlrs.de>
> Cc: "John Reid" <John.Reid at stfc.ac.uk>, "MPI-3 Fortran working group" <mpi3-fortran at lists.mpi-forum.org>,
> interop-tr at j3-fortran.org, "Reinhold Bader" <Reinhold.Bader at lrz-muenchen.de>, "Craig Rasmussen" <rasmussn at lanl.gov>,
> "Bill Long" <longb at cray.com>, "Jeff Squyres" <jsquyres at cisco.com>
> Sent: Friday, April 15, 2011 12:12:54 PM
> Subject: Re: [Interop-tr] [Mpi-forum] Comment on Fortran WG5 ballot N1846
> On Apr 14 2011, Rolf Rabenseifner wrote:
> >
> >> > - TARGET buf
> >>
> >> Why should that work? Fortran compilers are required to take
> >> notice of it only when also operating with POINTER variables. And
> >> there is nothing stopping them from moving such data dynamically,
> >> if they update all associated pointers. That's old technology,
> >> after all.
> >
> >With
> >TARGET buf
> >CALL MPI_Irecv(buf, rq)
> >CALL MPI_Wait(rq)
> >xnew=buf
> 
> I have omitted your discussion because it's separate from my point.
> 
> TARGET will stop code movement as well as ASYNCHRONOUS in a C-like
> implementation, but also implies a lot more, and so is a LOT less
> efficient. That's not good, for a start.
> 
> My main point was that, in a compiler with a compacting garbage
> collector, it allows that to move targets at any time provided that it
> also updates its pointers, and that is well-established technology.
> C and C++ have implicitly stated that compacting garbage collectors
> are not allowed in conforming implementations; Fortran has not and I,
> for one, would oppose any move to do so.
> 
> 
> >> > - Using a call to a dummy routine MPI_F_SYNC_REG(buf)
> >> >   immediately after MPI_Wait
> >
> >> > - Storing buf as a module variable or in a common block
> >
> >Therefore the problem is solved.
> >
> >Correct?
> >
> >If not, could you please show detailed code illustrating how
> >such compiled code could work so that xnew at the end does
> >***not*** contain the value that was stored in buf by MPI_Wait.
> 
> Both of the above fail as soon as those are passed through
> intermediate subroutines, as is commonly coded:
> 
> PROGRAM Main
> REAL :: buffer(100)
> CALL Fred(buffer)
> CALL Joe(buffer)
> END PROGRAM Main
> SUBROUTINE Fred (arg)
> REAL :: arg(:)
> CALL MPI_Isend(arg,...)
> END SUBROUTINE Fred
> SUBROUTINE Joe (arg)
> REAL :: arg(:)
> CALL MPI_Wait(...)
> CALL MPI_F_SYNC_REG(arg)
> END SUBROUTINE Joe
> 
> This example also addresses Bill's and Van's proposed hacks, which
> won't work with this, either.
> 
> 
> >> >> - (VOLATILE buf, not recommended)
> >>
> >> Why should that work any better than ASYNCHRONOUS? All it says (in
> >> both C and Fortran, incidentally) is "All bets are off - consult
> >> your vendor documentation for what might happen." ASYNCHRONOUS has
> >> the right semantics.
> >
> >With
> >VOLATILE buf
> >CALL MPI_Irecv(buf, rq)
> >CALL MPI_Wait(rq)
> >xnew=buf
> >
> >By definition of volatile, the compiler has to guarantee
> >that all accesses to buf are done through the memory
> >and in the sequence of the application.
> 
> Er, no. Sorry. That is a common myth, but is not so in any of C, C++
> or Fortran. Let's skip the ungodly mess that volatile is in C, and
> consider just Fortran.
> 
> > Or have I misunderstood the wording of VOLATILE: Fortran 2008,
> > 5.3.19: "The VOLATILE attribute specifies that an object may be
> > referenced, defined, or become undefined, by means not specified by
> > the program." Our MPI_Wait is such a "by means not specified by the
> > program."
> 
> You have misunderstood. It is entirely up to the processor what
> constitutes "by means not specified by the program". Regrettably,
> unlike C (which does THIS aspect better than Fortran), it is not
> processor-dependent and therefore need not be documented.
> 
> Until such time as an interpretation decides otherwise, a compiler can
> perfectly take the approach that interaction with asynchronous agents
> using VOLATILE (and hence example C.2.3) does not take place in 'real
> time' but only when an explicit synchronisation function is called.
> POSIX and other designs use that model, after all.
> 
> Indeed, some compilers on some systems cannot practically implement
> example C.2.3, without turning every access to a VOLATILE object into
> a heavyweight, doubly synchronising subroutine call, and there is
> nothing in the normative wording that requires a compiler to do that.
> They might decide not to do so, and would still conform.
> 
> > Fortran 2008, NOTE 5.25: "The Fortran processor should use the most
> > recent definition of a volatile object when a value is required.
> > Likewise, it should make the most recent Fortran definition
> > available. It is the programmer's responsibility to manage any
> > interaction with non-Fortran processes."
> >
> >"recent" means, that the sequence must not be modified.
> >Otherwise "xnew=buf" does not access the recent definition of buf.
> >
> >Correct?
> 
> That is non-normative. But, even ignoring that, it says that the
> programmer must get the constraints right, and doesn't say what they
> are. All it says is that it's the programmer's responsibility to obey
> them (which implies finding out what they are!)
> 
> 
> >If I should be on the list, then it is okay for me.
> 
> That would be good.
> 
> 
> >On Apr 14 2011, Van Snyder wrote:
> >
> >Of course, if the actual argument BUF is already a TYPE(*)
> >DIMENSION(..) dummy argument, or even an assumed-size array, the
> >compiler can't use copy-in/copy-out.
> 
> Don't bet on it :-)
> 
> Caller-copying is a well-known compilation technique, for a start,
> and there are other ways it can happen.
> 
> >I don't quite understand how ASYNCHRONOUS knows when to stop
> >preventing code motion. Does it prevent motion of variables with the
> >attribute across all calls throughout a scoping unit where they have
> >it, or does it just prevent code motion across calls in which the
> >asynchronous variable is an actual argument? If the former, then the
> >call to MPI_F_SYNC_REG isn't necessary. If you want to restrict the
> >range where ASYNCHRONOUS applies, use a BLOCK construct.
> 
> Basically, it says the compiler must allow for the array being
> accessed at any point, whether or not it is visible in the currently
> executing procedure. How it does it is its business!
> 
> I accept that the standard currently implies that is required only for
> Fortran asynchronous I/O, but it IS the mechanism designed for this
> purpose and VASTLY the simplest and cleanest solution is to use it for
> MPI as well. Yes, we could have ASYNCHRONOUS, C_ASYNCHRONOUS,
> THREAD_ASYNCHRONOUS and SIGNAL_ASYNCHRONOUS, but that would be getting
> silly ....
> 
> I am certainly not denying that clarification would be useful, and
> will try to think of a suitable proposal for the June WG5. While the
> wording is tricky, I doubt that it is extensive and might be solved by
> adding a new processor-dependency - i.e. whether the compiler supports
> asynchronous I/O in a companion processor. Or in several other ways.
> 
> 
> Regards,
> Nick Maclaren.

-- 
Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
Nobelstr. 19, D-70550 Stuttgart, Germany . (Office: Allmandring 30)