[MPI3 Fortran] [Interop-tr] [Mpi-forum] Comment on Fortran WG5 ballot N1846
rabenseifner at hlrs.de
Thu Apr 14 08:01:57 CDT 2011
[I expect that the mail is not delivered by interop-tr at j3-fortran.org
because I'm not a member of J3]
After all the discussion, it really looks best to
ignore the existence of the ASYNCHRONOUS attribute
when looking at the co-existence of Fortran nonblocking
Based on the discussion, I plan to add the following text
into the draft for MPI-3.0 and based on Option 1 in a
[Text about an application with parts of an array
involved in a nonblocking MPI operation and other
parts involved in computation, and a compiler optimization
with temporarily copying variables into a local memory
and afterward back to the original memory]
Note that this type of compiler optimization cannot
be prevented when buf is declared with the
ASYNCHRONOUS Fortran attribute, and it should not
be prevented by declaring buf as VOLATILE because:
- A Fortran asynchronous input/output operation on a part
of an array variable implies that the access to the whole
array (the so-called pending I/O storage sequence affector,
see Section 22.214.171.124 in [Fortran 2008]) is restricted
as if the whole array is involved in the pending I/O
(see Section 126.96.36.199, paragraphs 5 and 6 in [Fortran 2008]).
Therefore, the ASYNCHRONOUS attribute cannot be used if parts of the
array are involved in an asynchronous operation and other
parts are used for computation.
- In principle, the Fortran standard restricts the scope
of the ASYNCHRONOUS attribute only to Fortran asynchronous
input and output operations.
- A Fortran compiler may implement the Fortran asynchronous
input/output operations as blocking I/O. In this case,
the compiler can fully ignore the ASYNCHRONOUS attribute.
- The VOLATILE attribute implies that all accesses to any storage unit
(word) of buf must be done directly in main memory, exactly in
the sequence defined by the application program. The attribute
VOLATILE prevents every register or cache optimization.
- Therefore, VOLATILE may cause a huge performance degradation.
Instead of solving the problem, it is better to prevent the problem,
i.e., when overlapping communication and computation, the communication
(or nonblocking I/O) and the computation should be executed on different
sets of variables. In this case, the temporary memory modifications
are done only on the variables used in the computation and cannot
have any side effect on the data used in the nonblocking MPI operations.
The first three items in the list suggest that it is a bad idea
to enlarge the scope of ASYNCHRONOUS to nonblocking MPI.
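
The recommendation above, to run communication and computation on
different sets of variables, can be sketched as follows. This is only
an illustration and not text proposed for the MPI-3.0 draft: the
program name, the array names (send_buf, recv_buf, work) and the
self-send pattern are assumptions chosen so that the sketch runs on
any number of processes. It addresses only the temporary-copy problem
described above.

```fortran
program overlap_separate_buffers
  use mpi
  implicit none
  integer, parameter :: n = 100
  ! Hypothetical names: these two arrays are touched only by MPI
  ! between the nonblocking calls and MPI_Waitall.
  real :: send_buf(n), recv_buf(n)
  ! Separate variable, touched only by the computation.
  real :: work(n)
  integer :: requests(2), ierr, rank, i

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  send_buf = real(rank)
  work = 1.0

  ! Each process sends to itself (assumption made for the sketch).
  call MPI_Irecv(recv_buf, n, MPI_REAL, rank, 0, MPI_COMM_WORLD, &
                 requests(1), ierr)
  call MPI_Isend(send_buf, n, MPI_REAL, rank, 0, MPI_COMM_WORLD, &
                 requests(2), ierr)

  ! Overlapped computation: any temporary copy the compiler makes
  ! here involves work only, never send_buf or recv_buf.
  do i = 1, n
     work(i) = 2.0 * work(i) + real(i)
  end do

  call MPI_Waitall(2, requests, MPI_STATUSES_IGNORE, ierr)

  print *, 'received', recv_buf(1), 'computed', work(n)
  call MPI_Finalize(ierr)
end program overlap_separate_buffers
```

Because work is a separate variable and not a section of the
communication buffers, a copy-in/copy-out optimization applied to the
loop cannot overwrite data that MPI is still reading or writing.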
----- Original Message -----
> From: "N.M. Maclaren" <nmm1 at cam.ac.uk>
> To: "John Reid" <John.Reid at stfc.ac.uk>, interop-tr at j3-fortran.org
> Cc: interop-tr at j3-fortran.org, "Rolf Rabenseifner" <rabenseifner at hlrs.de>
> Sent: Thursday, April 14, 2011 11:33:32 AM
> Subject: Re: [Interop-tr] [Mpi-forum] Comment on Fortran WG5 ballot N1846
> On Apr 14 2011, John Reid wrote:
> >If the comment in the middle is changed to
> > ! asynchronous I/O on ptr variable
> > the program would be non-conforming because A would be an affector
> > without the ASYNCHRONOUS attribute.
> >Perhaps we need normative text that says (somehow) that asynchronous
> >communication is regarded as if it were asynchronous i/o.
> In which case my reading of the standard is wrong. I have just looked
> again, and think that either I probably was or the wording is more
> confusing than it should be.
> I don't think that Fortran intends to forbid:
> PROGRAM Main
> REAL :: array(100)
> CALL Fred(array(:50),array(51:))
> END PROGRAM Main
> SUBROUTINE Fred (buffer, scratch)
> REAL, ASYNCHRONOUS :: buffer(:)
> REAL :: scratch(:)
> CALL MPI_Irecv(buffer,...)
> CALL DGEMM(scratch,...)
> CALL MPI_Wait(...)
> END SUBROUTINE Fred
> As you say, the same problem can be shown using Fortran asynchronous
> However, this issue is not limited to ASYNCHRONOUS and affects
> that have the TARGET attribute, too. I assert that it is conceptually
> identical to the one where TARGET is used, MPI_Irecv sets up a pointer
> and DGEMM uses that pointer. I.e. it's a question of exactly when
> association stops.
> Nick Maclaren.
Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
Nobelstr. 19, D-70550 Stuttgart, Germany . (Office: Allmandring 30)