[MPI3 Fortran] MPI non-blocking transfer
Rolf Rabenseifner
rabenseifner at hlrs.de
Mon Feb 9 16:57:33 CST 2009
I hope this mail may help to bring the Fortran and MPI sides
on this topic more together.
With Fortran, I see 3 major problems:
A) The hidden buffer access in non-blocking routines
B) VOID buffer declaration
C) Call by reference and not by in-copy-out-copy
I split it off into two mails.
Here
A) The hidden buffer access in non-blocking routines
----------------------------------------------------
When I uderestand correctly, then the discussion on
mpi3-fortran at lists.mpi-forum.org, sc22wg5 at open-std.org,
and j3 at j3-fortran.org
could not solve the problems.
First I want to summarize the problem with 3 examples that
fully fit to the current MPI-2.1 standard rules:
A1) buf(1)=0
CALL MPI_IRECV(buf, 1, MPI_REAL , imsg)
! doing something with without accessing the related part of buf,
e.g.,
buf(2) = 2
CALL MPI_WAIT(imsg)
CALL DD(buf) or CALL DD(buf(1))
PRINT *, buf(1),buf(2)
with separately compiled routine DD, i.e., no chance for
the compiler, to see that DD is doing nothing.
SUBROUTINE DD(buf)
RETURN
END
A2) SUBROUTINE xxx
REAL, DIMENSION(2) :: buf
buf(1)=0
CALL myIRECV(buf, 1, MPI_REAL , imsg)
! doing something with without accessing the related part of
buf, e.g.,
buf(2) = 2
CALL myWAIT(imsg)
CALL DD(buf) or CALL DD(buf(1))
PRINT *, buf(1),buf(2)
END
SOUBROUTINE myIRECV(buf, cnt, dt , imsg)
CALL MPI_IRECV(buf, cnt, dt , imsg)
END
SOUBROUTINE myWAIT(imsg)
CALL MPI_WAIT(imsg)
END
with separately compiled routine DD, i.e., no chance for
the compiler, to see that DD is doing nothing.
SUBROUTINE DD(buf)
RETURN
END
A3) SUBROUTINE xxx
REAL, DIMENSION(2) :: buf
buf(1)=0
CALL myIRECV(buf, 1, MPI_REAL , imsg)
! doing something with without accessing the related part of
buf, e.g.,
buf(2) = 2
CALL myDDWAIT(imsg, buf)
PRINT *, buf(1),buf(2)
END
SOUBROUTINE myIRECV(buf, cnt, dt , imsg)
CALL MPI_IRECV(buf, cnt, dt , imsg)
END
with separately compiled routine DD, i.e., no chance for
the compiler, to see that DD is doing nothing.
SOUBROUTINE myDDWAIT(imsg, buf)
CALL MPI_WAIT(imsg)
END
1. Questions are:
1.1 Are the solutions with soubroutine DD and myDDWAIT correct?
1.2 Can we do it better, especially without the performance-lost
through the additional subroutine call to DD
or through handling the additional argument buf in the call
to
myDDWAIT?
2. My goals in this discussion:
2.1 Find a solution that answers 1.2 with YEs.
2.2 For me, it is still okay, that the user must do something.
The new solution need not to be easier, it should have
better performance.
2.3 The existing solution must continue to work, i.e.,
all existing and correct MPI applications must continue
to work.
2.4 It is not my goal to automatically correct existing wrong
MPI applications, i.e., applications without the DD trick
or VOLATILE buf.
2.5 No loss of performance in the rest of the application,
especially any numerics with buf should be optimized
in exactly the same way as today.
3. The idea with VOLATILE SUBROUTINE MPI_WAIT
3.1 After all the discussion, it seems that it is hard to
meet Goal 2.5
3.2 In Example A1 and A2, the user can remove the CALL DD,
But in Example A2, the user has to declare myWAIT as
VOLATILE SUBR.
3.3 Implication: The user has still to understand the problem
and to act in a different (VOLATILE SUBROUTINE) way as in
the
past (CALL DD).
3.4 How can we restrict the VOLATILE SUBROUTINE to only some
variables, here "buf"?
If there is a solution, then the user seems to be still
involved, e.g., telling "buf".
4. My idea to this problem:
Allow additional arguments that are "DUMMIES".
DUMMY means, that in the call to the routine, it is used,
but it never arrives in the called routine.
4.1 SUBROUTINE MPI_WAIT_B(imsg, buf)
VOID, DUMMY :: buf
or
4.2 SUBROUTINE MPI_WAIT_B(imsg, DUMMY)
4.3 SUBROUTINE MPI_WAITALL_B(cnt,imsg, DUMMYLIST)
DUMMY and DUMMYLIST are new keywords
(which should be carefully chosen)
DUMMYLIST is only allowed at the end of an argument list.
With DUMMYLIST, no argument checking will occor on the additional
arguments.
Any number (including zero) of additional arguments at the end
of the list is allowed.
DUMMY on the argumentlist is identical with one "VOID,DUMMY"
argument.
If an argument checking on a dummy argument is wished, then, e.g.,
REAL,DUMMY::buf can be used.
With 4.1 (but not with 4.2 or 4.3), an INTENT IN or OUT may
be defined in the interface definition.
Rules when calling a routine that has DUMMY arguments in the
interface definition:
4.a. DUMMY arguments are not handed over to the called routine
4.b. The used DUMMY arguments are handled in the calling routine
as if they were handed over to the called routine and as if
the called routine has modified the content of the argument
(This may be restricted through the use of INTENT).
Rules for the body of the called routine:
4.c The DUMMY arguments are not accessible.
4.d The argument list consists only of the arguments that are
not
marked as DUMMY.
Especially, when the called routine is written in C, this
is
important.
Rules, if no interface definition is made
(i.e. old F77 style is used):
4.e DUMMY arguments are handed over.
This is necessary because the calling routine does know
anything about the clled interface.
Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
Nobelstr. 19, D-70550 Stuttgart, Germany . (Office: Allmandring 30)
More information about the mpiwg-fortran
mailing list