[MPI3 Fortran] MPI non-blocking transfer

Mon Feb 9 16:57:33 CST 2009

I hope this mail helps to bring the Fortran and MPI sides
closer together on this topic.

With Fortran, I see three major problems:
  A) The hidden buffer access in non-blocking routines
  B) VOID buffer declaration
  C) Call by reference, not by copy-in/copy-out

I have split this into two mails.
This one covers problem A:

A) The hidden buffer access in non-blocking routines
----------------------------------------------------

If I understand correctly, the discussion on
   mpi3-fortran at lists.mpi-forum.org, sc22wg5 at open-std.org,
   and j3 at j3-fortran.org
has not solved the problem.

First, I want to summarize the problem with three examples that
fully comply with the current MPI-2.1 standard rules:

A1) buf(1)=0
     CALL MPI_IRECV(buf, 1, MPI_REAL, imsg)
     ! do something without accessing the related part of buf, e.g.,
       buf(2) = 2
     CALL MPI_WAIT(imsg)
     CALL DD(buf)   ! or: CALL DD(buf(1))
     PRINT *, buf(1), buf(2)

     with a separately compiled routine DD, so that the compiler
     has no chance to see that DD does nothing:

     SUBROUTINE DD(buf)
     RETURN
     END

A2) SUBROUTINE xxx
      REAL, DIMENSION(2) :: buf
      buf(1)=0
      CALL myIRECV(buf, 1, MPI_REAL, imsg)
      ! do something without accessing the related part of buf, e.g.,
        buf(2) = 2
      CALL myWAIT(imsg)
      CALL DD(buf)   ! or: CALL DD(buf(1))
      PRINT *, buf(1), buf(2)
     END
     SUBROUTINE myIRECV(buf, cnt, dt, imsg)
      CALL MPI_IRECV(buf, cnt, dt, imsg)
     END
     SUBROUTINE myWAIT(imsg)
      CALL MPI_WAIT(imsg)
     END

     with a separately compiled routine DD, so that the compiler
     has no chance to see that DD does nothing:

     SUBROUTINE DD(buf)
     RETURN
     END

A3) SUBROUTINE xxx
      REAL, DIMENSION(2) :: buf
      buf(1)=0
      CALL myIRECV(buf, 1, MPI_REAL, imsg)
      ! do something without accessing the related part of buf, e.g.,
        buf(2) = 2
      CALL myDDWAIT(imsg, buf)
      PRINT *, buf(1), buf(2)
     END
     SUBROUTINE myIRECV(buf, cnt, dt, imsg)
      CALL MPI_IRECV(buf, cnt, dt, imsg)
     END

     with a separately compiled routine myDDWAIT, so that the compiler
     has no chance to see that myDDWAIT does nothing with buf:

     SUBROUTINE myDDWAIT(imsg, buf)
      CALL MPI_WAIT(imsg)
     END
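To try example A1 on a real MPI installation, it can be written with complete MPI-2.1 argument lists. The following is only a sketch under assumptions that are not part of the examples above: the program uses the "mpi" module, runs as a single process that sends one REAL (the value 42.0, tag 0) to itself, and stops with an error if buf(1) was not updated. For brevity DD is placed in the same listing; as in A1, it really has to be compiled separately, otherwise the compiler may see that DD does nothing and the protection is lost.

```fortran
! Sketch of example A1 with complete MPI-2.1 argument lists.
! Assumptions: "mpi" module available; run with ONE process, which
! sends one REAL (42.0, tag 0) to itself.
PROGRAM a1_dd_trick
  USE mpi
  IMPLICIT NONE
  REAL, DIMENSION(2) :: buf
  REAL, DIMENSION(1) :: val
  INTEGER :: myrank, imsg, ierr
  INTEGER :: status(MPI_STATUS_SIZE)
  EXTERNAL :: DD                 ! in practice: a separately compiled file

  CALL MPI_INIT(ierr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)

  buf(1) = 0.0
  ! Post the receive into buf(1); the matching send comes from this process.
  CALL MPI_IRECV(buf, 1, MPI_REAL, myrank, 0, MPI_COMM_WORLD, imsg, ierr)
  val(1) = 42.0
  CALL MPI_SEND(val, 1, MPI_REAL, myrank, 0, MPI_COMM_WORLD, ierr)

  buf(2) = 2.0                   ! does not touch the receive part of buf
  CALL MPI_WAIT(imsg, status, ierr)

  ! The compiler must assume that DD may have modified buf, so it cannot
  ! reuse the 0.0 stored in buf(1) before MPI_IRECV.
  CALL DD(buf)
  PRINT *, buf(1), buf(2)
  IF (buf(1) /= 42.0) ERROR STOP 'receive buffer not updated'

  CALL MPI_FINALIZE(ierr)
END PROGRAM a1_dd_trick

! Does nothing; its only purpose is to hide buf from the optimizer.
SUBROUTINE DD(buf)
  REAL, DIMENSION(*) :: buf
END SUBROUTINE DD
```

Build it with the Fortran wrapper of the MPI installation (e.g., mpif90) and start it with one process.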

1. My questions are:
      1.1 Are the solutions with subroutine DD and myDDWAIT correct?
      1.2 Can we do better, especially without the performance loss
          caused by the additional subroutine call to DD
          or by handling the additional argument buf in the call to
          myDDWAIT?

2. My goals in this discussion:
      2.1 Find a solution that answers 1.2 with YES.
      2.2 For me, it is still acceptable that the user must do something.
          The new solution need not be easier, but it should have
          better performance.
      2.3 The existing solution must continue to work, i.e.,
          all existing and correct MPI applications must continue
          to work.
      2.4 It is not my goal to automatically correct existing incorrect
          MPI applications, i.e., applications without the DD trick
          or VOLATILE buf.
      2.5 No loss of performance in the rest of the application;
          in particular, any numerics with buf should be optimized
          in exactly the same way as today.

3. The idea with VOLATILE SUBROUTINE MPI_WAIT
      3.1 After all the discussion, it seems hard for this idea to
          meet goal 2.5.
      3.2 In examples A1 and A2, the user can remove the CALL DD,
          but in example A2, the user has to declare myWAIT as a
          VOLATILE SUBROUTINE.
      3.3 Implication: the user still has to understand the problem
          and act differently (VOLATILE SUBROUTINE) than in the
          past (CALL DD).
      3.4 How can we restrict the VOLATILE SUBROUTINE to only some
          variables, here "buf"?
          If there is a solution, the user still seems to be
          involved, e.g., by naming "buf".
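For comparison with the VOLATILE SUBROUTINE idea, the other existing workaround named in goal 2.4, declaring buf itself VOLATILE (Fortran 2003), can be sketched as follows. The same illustrative assumptions as above apply (the "mpi" module, one process sending 42.0 with tag 0 to itself). No CALL DD is needed, but the compiler must then reload buf on every reference, in every statement where buf is used, which is the performance loss that goal 2.5 rules out.

```fortran
! Sketch of example A1 with the Fortran 2003 VOLATILE attribute instead
! of CALL DD. Assumptions: "mpi" module available; run with ONE process,
! which sends one REAL (42.0, tag 0) to itself.
PROGRAM a1_volatile
  USE mpi
  IMPLICIT NONE
  ! VOLATILE: the value may change by means invisible to the compiler,
  ! so buf(1) must be read from memory again at every reference.
  REAL, DIMENSION(2), VOLATILE :: buf
  REAL, DIMENSION(1) :: val
  INTEGER :: myrank, imsg, ierr
  INTEGER :: status(MPI_STATUS_SIZE)

  CALL MPI_INIT(ierr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)

  buf(1) = 0.0
  CALL MPI_IRECV(buf, 1, MPI_REAL, myrank, 0, MPI_COMM_WORLD, imsg, ierr)
  val(1) = 42.0
  CALL MPI_SEND(val, 1, MPI_REAL, myrank, 0, MPI_COMM_WORLD, ierr)

  buf(2) = 2.0                   ! does not touch the receive part of buf
  CALL MPI_WAIT(imsg, status, ierr)

  ! No CALL DD, but any numerics on buf in this scope are not
  ! register-optimized either (conflict with goal 2.5).
  PRINT *, buf(1), buf(2)
  IF (buf(1) /= 42.0) ERROR STOP 'receive buffer not updated'

  CALL MPI_FINALIZE(ierr)
END PROGRAM a1_volatile
```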

4. My idea for this problem:

    Allow additional arguments that are "DUMMIES".
    DUMMY means that the argument is used in the call to the routine,
    but it never arrives in the called routine.

    4.1 SUBROUTINE MPI_WAIT_B(imsg, buf)
        VOID, DUMMY :: buf

    or
    4.2 SUBROUTINE MPI_WAIT_B(imsg, DUMMY)

    4.3 SUBROUTINE MPI_WAITALL_B(cnt,imsg, DUMMYLIST)

    DUMMY and DUMMYLIST are new keywords
    (their names should be chosen carefully).

    DUMMYLIST is only allowed at the end of an argument list.
    With DUMMYLIST, no argument checking will occur on the additional
    arguments.
    Any number (including zero) of additional arguments at the end
    of the list is allowed.

    DUMMY in the argument list is equivalent to one "VOID, DUMMY"
    argument.

    If argument checking on a dummy argument is wanted, then, e.g.,
    REAL, DUMMY :: buf  can be used.

    With 4.1 (but not with 4.2 or 4.3), INTENT(IN) or INTENT(OUT) may
    be specified in the interface definition.
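Declarations 4.1 and 4.3 could look as follows in an interface block. This is a sketch in the proposed syntax only: DUMMY, DUMMYLIST, and VOID are not part of any Fortran standard, and the argument lists are abbreviated as in the examples above.

```fortran
! HYPOTHETICAL syntax (proposal 4); no Fortran compiler accepts it.
INTERFACE
  ! 4.1: buf is declared VOID, DUMMY; variant 4.2 would write (imsg, DUMMY)
  SUBROUTINE MPI_WAIT_B(imsg, buf)
    INTEGER :: imsg
    VOID, DUMMY :: buf        ! with type checking: REAL, DUMMY :: buf
  END SUBROUTINE MPI_WAIT_B

  ! 4.3: any number (including zero) of unchecked trailing arguments
  SUBROUTINE MPI_WAITALL_B(cnt, imsg, DUMMYLIST)
    INTEGER :: cnt
    INTEGER :: imsg(cnt)
  END SUBROUTINE MPI_WAITALL_B
END INTERFACE
```

With such an interface, a call like CALL MPI_WAITALL_B(2, imsg, buf1, buf2) (buf1 and buf2 being illustrative names) would cover two receive buffers at once.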

    Rules when calling a routine that has DUMMY arguments in the
    interface definition:

     4.a.  DUMMY arguments are not handed over to the called routine
     4.b.  The actual arguments given for DUMMY arguments are treated in
           the calling routine as if they were handed over to the called
           routine and as if the called routine had modified their
           content (this may be restricted through the use of INTENT).
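Applied to example A1, rules 4.a and 4.b would make the extra CALL DD unnecessary. A sketch in the proposed syntax, assuming MPI_WAIT_B has the interface of 4.1 (hypothetical; argument lists abbreviated as in A1):

```fortran
! HYPOTHETICAL: example A1 with the proposed MPI_WAIT_B (interface as in 4.1).
! No compiler supports DUMMY arguments; argument lists abbreviated as in A1.
      buf(1) = 0
      CALL MPI_IRECV(buf, 1, MPI_REAL, imsg)
      buf(2) = 2                  ! does not touch the receive part of buf
      CALL MPI_WAIT_B(imsg, buf)  ! 4.a: buf is not passed to MPI_WAIT_B
                                  ! 4.b: buf is treated as passed and modified,
                                  !      so buf(1) must be reloaded afterwards
      PRINT *, buf(1), buf(2)     ! no CALL DD needed
```

Outside this call, buf would be treated as usual, so the numerics on buf would remain as optimizable as today (goal 2.5).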

    Rules for the body of the called routine:

     4.c   The DUMMY arguments are not accessible.
     4.d   The argument list consists only of the arguments that are not
           marked as DUMMY.
           This is especially important when the called routine is
           written in C.
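Seen from the called side, rules 4.c and 4.d make MPI_WAIT_B an ordinary routine with only the non-DUMMY arguments. A minimal sketch in the proposed (hypothetical) syntax, with the MPI_WAIT call abbreviated as in A1:

```fortran
! HYPOTHETICAL syntax (proposal 4); no Fortran compiler accepts it.
SUBROUTINE MPI_WAIT_B(imsg, buf)
  INTEGER :: imsg
  VOID, DUMMY :: buf     ! 4.c: buf must not be referenced in this body
  CALL MPI_WAIT(imsg)    ! only imsg is used
END SUBROUTINE MPI_WAIT_B
! 4.d: the routine's actual argument list is (imsg) alone, so an
! implementation written in C would be declared with exactly one
! argument, the reference to imsg.
```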

    Rules if no interface definition is given
    (i.e., old F77 style is used):

     4.e   DUMMY arguments are handed over.
           This is necessary because the calling routine does not know
           anything about the called interface.

Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
Nobelstr. 19, D-70550 Stuttgart, Germany . (Office: Allmandring 30)