[Mpi-forum] Comment on Fortran WG5 ballot N1846

Graham, Richard L. rlgraham at ornl.gov
Wed Mar 30 11:51:40 CDT 2011


Dear members of the WG5 and J3 Fortran standardization committees,

First, we want to thank you for your effort in defining TR 29113,
which is an important key to solving the problems between MPI and
Fortran.

This is a comment from the MPI Forum on your WG5 letter ballot on
N1845 in ftp://ftp.nag.co.uk/sc22wg5/N1801-N1850/N1846.txt and
http://www.nag.co.uk/sc22wg5/.

After reading your N1845 working draft from March 3, 2011, we detected
one major gap: there is no standardized possibility of using
nonblocking features of MPI in Fortran programs to overlap computation
with communication.  Specifically, Fortran MPI applications will be
prohibited from initiating any form of asynchronous communication and
then continuing to perform local computations while MPI advances the
communication "in the background."

This is a key issue for us because nonblocking communication is a
critical technique for deadlock-free communication and high
performance.  An easy use case to describe is one that is popular in
current high performance computing environments: when the MPI library
utilizes hardware offload technologies (such as RDMA) with a
communication co-processor (such as a NIC).

The problem is, in principle, independent of MPI.  It also occurs with
libc/POSIX asynchronous IO (aio), and is therefore a topic that should
be solved by the "TR on further interoperability of Fortran with C."

We noticed that there is already a decision about this issue: to
extend the meaning of the ASYNCHRONOUS attribute rather than to define
a new attribute.  http://www.j3-fortran.org/doc/year/09/09-235r2.txt
contains the minutes from the J3 meeting, May 4-8, 2009.  On this
topic, the following vote was recorded:

  Paper 09-231 "Answer to MPI Forum regarding MPI asynchronous
  operations" [Rasmussen] discussed how to prevent code-motion and
  copy-in/out:

     SV: extend ASYNCHRONOUS - invent new attribute - undefined: 8-2-3

First, we would like to propose how to solve the problem, then we will
present a short example to illustrate the specific problem.

We recommend the following change of the first two paragraphs in this
section of the Fortran 2008 standard:

5.3.4 ASYNCHRONOUS attribute

1 An entity with the ASYNCHRONOUS attribute is a variable that may be
   subject to asynchronous input/output or other asynchronous access
   by means other than Fortran.

   Asynchronous input/output can be any asynchronous access to the
   variable or parts of it, potentially within the scope of this
   standard by Fortran asynchronous I/O or possibly via methods
   outside the scope of this standard, such as the libc/POSIX
   asynchronous IO (aio) or nonblocking message passing, one-sided
   communication, or nonblocking parallel I/O as part of the Message
   Passing Interface (MPI) standard.

2 The base object of a variable shall have the ASYNCHRONOUS attribute
   in a scoping unit if
   - the variable appears in an executable statement or specification
     expression in that scoping unit and
   - any statement of the scoping unit is executed while the variable
     is a pending I/O storage sequence affector (9.6.2.5)

   or any other pending asynchronous access by means other than
   Fortran.

This should be treated only as a wording proposal.  We provide the
above text only as a suggestion; J3/WG5 are certainly free to adjust
it as they feel necessary.  The key idea is that we need ASYNCHRONOUS
to also include potential memory modifications from agents outside the
scope of the Fortran standard.
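
For comparison, the case that 5.3.4 already covers today, Fortran's
own asynchronous I/O, can be sketched as follows.  This is only a
minimal illustration: the program name, the file name
async_sketch.dat, and the variable names are assumptions of this
sketch and do not come from N1845 or from the MPI standard.  A row of
the array is a pending I/O storage sequence affector between the
asynchronous WRITE and the WAIT, while the loop nest touches only the
other rows.

```fortran
program async_io_sketch
  implicit none
  integer, parameter :: n = 100
  ! ASYNCHRONOUS is required because statements of this scoping unit
  ! execute while part of buf is a pending I/O storage sequence affector.
  real, asynchronous :: buf(n, n)
  integer :: u, id, i, j

  buf = 1.0

  ! Hypothetical scratch file, used only for this sketch.
  open(newunit=u, file='async_sketch.dat', form='unformatted', &
       access='stream', status='replace', asynchronous='yes')

  ! Start an asynchronous transfer of the boundary row.
  write(u, asynchronous='yes', id=id) buf(1, 1:n)

  ! Work only on the inner area while the transfer may be pending.
  do j = 1, n
    do i = 2, n
      buf(i, j) = real(i + j)
    end do
  end do

  ! Complete the pending transfer before the boundary is touched again.
  wait(u, id=id)
  close(u, status='delete')

  print *, 'boundary row unchanged: ', all(buf(1, :) == 1.0)
end program async_io_sketch
```

A processor is free to carry out the WRITE synchronously, in which
case nothing is pending during the loop nest.  The MPI use case has
the same shape, with a nonblocking MPI call and MPI_Wait taking the
place of the asynchronous WRITE and the WAIT statement.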

The following snippet is correct code that allows the MPI library to
internally use DMA, communication offload (e.g., to a NIC), or another
method to asynchronously communicate parts of the array while the user
application/main CPU simultaneously reads and writes to ***another***
part of the same array.  Specifically, this code snippet is free of
race conditions.

USE mpi_f08
REAL, ASYNCHRONOUS :: buf(100,100)
! communicating a boundary of the array:
CALL MPI_Irecv(buf(1,1:100),...req,...)
DO j=1,100
   DO i=2,100
     buf(i,j)=.... ! work only on the inner area
   END DO
END DO
CALL MPI_Wait(req,...)

It is important that the compiler is ***not*** allowed to translate
this program (by using temporary memory modifications) into something
like this:

USE mpi_f08
REAL, ASYNCHRONOUS :: buf(100,100), buf_1dim(10000)
REAL :: tmp(100)        ! temporary introduced by the compiler
EQUIVALENCE (buf(1,1), buf_1dim(1))
CALL MPI_Irecv(buf(1,1:100),...req,...)
tmp(1:100)=buf(1,1:100) ! saving the boundary
DO k=1,10000
   buf_1dim(k)=...       ! work on the whole array
END DO
buf(1,1:100)=tmp(1:100) ! restoring the boundary
CALL MPI_Wait(req,...)

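While the MPI library receives buf(1,1:100) "in the background", the
"work" part overwrites this part of the array as part of a numerical
optimization to achieve one long loop instead of a nest of two loops.
If the incoming data arrives between the save and the restore of the
boundary, the restore overwrites the received data with stale values.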

With the current status of Fortran 2008 + TR 29113 (Version N1845), a
compiler can ignore the ASYNCHRONOUS attribute.  For example, a
compiler may choose to not implement Fortran asynchronous I/O in an
asynchronous way, meaning that there is no need for the compiler to
make any restrictions on optimization.

To be clear: with the current Fortran 2008, this optimization is still
allowed.  With our proposal, this optimization will be prohibited.

We want to mention that the use of VOLATILE would solve the problem,
but at a high price in performance because it would also prohibit any
optimization of the "work" part (e.g., automatic cache optimization,
instruction reordering, etc.).  Because Fortran is used with MPI
especially in high-performance, numerically intensive applications, we
feel that VOLATILE cannot be recommended as a solution.

In the era of multi-core hardware and vector instructions on each CPU,
nonblocking communication must be allowed to overlap with efficient
numerical code written in Fortran in order to achieve scalable,
high-performance applications.

Your vote in J3 gave us the confidence that you can solve this
problem within your TR 29113.

In short, if we were members of WG5, we would answer the ballot with

  "Yes, but I recommend the following changes..."

with the proposal described above.

We appreciate the fact that you have asked for external comments on
N1845.  We hope that our comments above provide sufficient
information, and also hope that this letter conveys how critically
important we consider this issue.

Many thanks for your time.




