Hi,
Aleksandar Donev wrote:
> On Tuesday 01 April 2008 15:07, Hubert Ritzdorf wrote:
>
>
>> The run-time error occurred for the version which performed copy-in/copy
>> out. The usage of the compiler flag which avoids copy-in/copy out for that
>> source file worked.
>>
> Yes, but you did not answer my question: What would happen with that compiler
> and that flag when copy in/out were actually required at run-time (i.e., the
> actual was not contiguous)? What would you want/expect to happen?
>
If the application program
(*) passes a subsection such as "array (i,:)",
(*) sends/receives contiguous data, and
(*) expects the data to be read from/written to this subsection,
then the application program would not work.
I hate breaking running application programs, but as I already mentioned,
such an example would not work for isend/irecv anyway. Therefore, it would
probably be better to have well-defined behaviour: the user has to
guarantee that the buffer is contiguous if the user expects contiguous data.
If the user works with derived datatypes (non-contiguous data),
copy-in/copy-out would also kill the application program.
Thus, compile-time checking is ok. Default run-time checking is not
ok, since the application program might transfer non-contiguous data.
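To make the contiguity point concrete, here is a minimal sketch (no MPI
needed) that asks the compiler whether two sections of a small array are
contiguous. It assumes a compiler that supports the Fortran 2008
IS_CONTIGUOUS intrinsic; the program and variable names are made up for
illustration only.

```fortran
program contiguity_sketch
  implicit none
  real, target  :: array(4, 5)
  real, pointer :: row(:), col(:)

  array = 0.0
  row => array(2, :)   ! consecutive elements are 4 reals apart (strided)
  col => array(:, 2)   ! consecutive elements are adjacent (column-major)

  ! Expected: F for the row section, T for the column section.
  print *, 'array(2,:) contiguous? ', is_contiguous(row)
  print *, 'array(:,2) contiguous? ', is_contiguous(col)
end program contiguity_sketch
```

A row section like "array (2,:)" is the case where a callee that requires
contiguous storage can only be served through a temporary copy, while an
MPI derived datatype could describe the strided layout in place.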
By the way: what does the Fortran 2003 compiler/runtime do if the application
program
(*) starts an asynchronous read/write of a non-contiguous
array section
and
(*) passes this array section to a subroutine which requires
copy-in/copy-out
before the WAIT statement is executed?
Does the runtime system switch back to synchronous I/O?
Or is this simply a user error?
> The point here is that we are talking about standardizing things, and the
> standard needs to explain exactly what happens for *all* source codes, not
> just your particular one which "worked".
>
>
>> I am probably looking for an extension of the ASYNCHRONOUS (not VOLATILE)
>>
> Why not VOLATILE---is there a difference? The only difference, as far as I
> know, is that ASYNC is restricted to Fortran async I/O, and volatile for
> everyting "outside of the Fortran standard". Sounds to me like VOLATILE is
> exactly what you want. Unless you want somewhat different
> semantics/constraints (which is not presently the case---the only difference
> I know is that ASYNC attribute is implicitly given to variables involved in
> async I/O, while VOLATILE must be explicit).
>
VOLATILE is different. VOLATILE means that other processes/threads may
change the data. The effect is that the value has to be reloaded directly
from memory each time the application program accesses the variable.
This defeats most optimizations involving that variable and significantly
increases the memory traffic.
MPI nonblocking communication is quite similar to asynchronous read/write.
Therefore, an extension of the ASYNCHRONOUS attribute for MPI or other
communication libraries would be most appropriate and should cause minimal
conflicts with the current Fortran standard. You can take most of the
description of the ASYNCHRONOUS attribute and replace input/output by
communication (only the WAIT doesn't fit, since MPI_Wait is not known to
the Fortran standard).
For example:
ASYNCHRONOUS_EXT is an extension of the ASYNCHRONOUS attribute.
NOTE 12.26:
The ASYNCHRONOUS_EXT attribute specifies the variables that might be
associated with a pending sequence (the actual memory locations
on which (asynchronous, non-blocking) communication is being performed)
while the scoping unit is in execution. This information could be used
by the compiler to disable certain code motion optimizations.
NOTE 5.8:
The constraints on actual arguments that correspond to a dummy argument
with the ASYNCHRONOUS_EXT attribute are designed to avoid forcing a processor
to use the so-called copy-in/copy-out argument passing mechanism.
Making a copy of actual arguments whose values are likely to change due
to a (non-blocking, asynchronous) communication operation completing, or
in some unpredictable manner, will cause those new values to be lost
when a called procedure returns and the copy-out overwrites the
actual argument, or will cause the application program to abort.
The ASYNCHRONOUS_EXT attribute is similar to the VOLATILE and ASYNCHRONOUS
attributes. It is intended to facilitate traditional code motion
optimizations in the presence
of (asynchronous, non-blocking) communication.
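For comparison, the existing Fortran 2003 asynchronous I/O pattern that
this proposal mirrors can be sketched as follows. This is only an
illustration of the standard ASYNCHRONOUS attribute and WAIT statement, not
part of the proposal; the unit number, the scratch file and the names are
assumptions. A compiler is allowed to perform the transfers synchronously.

```fortran
program async_io_sketch
  implicit none
  integer, parameter :: n = 1000, u = 10
  real, asynchronous :: buf(n)
  integer :: i, id

  open (unit=u, status='scratch', form='unformatted', asynchronous='yes')

  do i = 1, n
    buf(i) = real(i)
  end do

  ! Start the write (compare MPI_Isend). Until WAIT, buf may be
  ! referenced but must not be redefined.
  write (u, asynchronous='yes', id=id) buf
  wait (u, id=id)            ! compare MPI_Wait

  rewind (u)
  buf = -1.0

  ! Start the read (compare MPI_Irecv). Until WAIT, buf must be
  ! neither referenced nor defined.
  read (u, asynchronous='yes', id=id) buf
  wait (u, id=id)

  ! Self-check: the data written must come back unchanged.
  if (any(buf /= [(real(i), i = 1, n)])) then
    print *, 'mismatch after asynchronous round trip'
    stop 1
  end if
  print *, 'asynchronous round trip ok'
  close (u)
end program async_io_sketch
```

The window between the data transfer statement and WAIT corresponds to the
window between MPI_Isend/MPI_Irecv and MPI_Wait, which is where a
temporary copy of the buffer would do harm.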
> The best thing would be for you (or whoever has specific ideas) to write a
> sample code that calls MPI_Isend, including the proposed interface for
> MPI_Isend, and explain what each of the proposed attributes/keywords means.
> Maybe that would bring some progress instead of going in circles. If it
> helps, assume that the buffer is a REAL array for now (i.e., separate the
> TKR-mismatch issue from the copy in/out issue).
>
I have attached an example F90 test program which demonstrates a simple
copy-in/copy-out problem that I have seen in an application program.
The application program crashed because the Fortran compiler created a
temporary array for the section recv_vector(ip+1:ip+len_sent(i)).
When recv_vector(ip+1) was passed instead, the application program worked
fine.
The proposed interfaces for MPI_Irecv and MPI_Isend are at the head
of the Fortran 90 file.
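The effect can also be observed without MPI. The following is a minimal
sketch, assuming a compiler with ISO_C_BINDING; seen_address is a
hypothetical stand-in for a library routine with a contiguous buffer
argument and only reports the address it was handed. The sketch uses a
strided section, because a copy is then unavoidable; for a contiguous
section of a pointer array, as in the attached program, whether a
temporary is created depends on the compiler and its flags.

```fortran
module addr_probe
  use, intrinsic :: iso_c_binding, only: c_float, c_intptr_t, c_loc
  implicit none
contains
  ! Hypothetical stand-in for a library routine with an explicit-shape
  ! (contiguous) buffer argument; returns the address of that buffer.
  function seen_address(buf, n) result(addr)
    integer, intent(in) :: n
    real(c_float), target, intent(in) :: buf(n)
    integer(c_intptr_t) :: addr
    addr = transfer(c_loc(buf), addr)
  end function seen_address
end module addr_probe

program copy_in_sketch
  use, intrinsic :: iso_c_binding, only: c_float, c_intptr_t, c_loc
  use addr_probe, only: seen_address
  implicit none
  real(c_float), target :: v(10)
  integer(c_intptr_t) :: base

  v = 0.0
  base = transfer(c_loc(v(1)), base)   ! address of the real data

  ! Passing the first element hands over the original storage.
  print *, 'element v(1) passed:        same address? ', &
           seen_address(v(1), 5) == base
  ! A strided section has to be packed into a temporary first, so the
  ! callee typically sees a different address.
  print *, 'section v(1:9:2) passed:    same address? ', &
           seen_address(v(1:9:2), 5) == base
end program copy_in_sketch
```

A nonblocking routine that remembers the second address would later read
from or write to a temporary that no longer exists, which matches the
incorrect results and crashes described above.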
Hubert
!
! Proposed MPI Interfaces;
!
! Subroutine MPI_Isend(buf, count, datatype, dest, tag, comm, &
! request, ierror)
! VOID :: buf
! ASYNCHRONOUS_EXT :: buf
! Integer, Intent (In) :: count, datatype, dest, tag, comm
! Integer, Intent (Out) :: request, ierror
! End Subroutine MPI_Isend
!
! Subroutine MPI_Irecv(buf, count, datatype, source, tag, &
! comm, request, ierror)
! VOID :: buf
! ASYNCHRONOUS_EXT :: buf
! Intent (InOut) :: buf
! Integer, Intent (In) :: count, datatype, source, tag, comm
! Integer, Intent (Out) :: request, ierror
! End Subroutine MPI_Irecv
!
! ASYNCHRONOUS_EXT is an extension of the ASYNCHRONOUS attribute.
!
! The ASYNCHRONOUS_EXT attribute specifies the variables that might be
! associated with a pending sequence (the actual memory locations
! on which (asynchronous, non-blocking) communication is being performed)
! while the scoping unit is in execution. This information could be used
! by the compiler to disable certain code motion optimizations.
!
! The constraints on actual arguments that correspond to a dummy argument
! with the ASYNCHRONOUS_EXT attribute are designed to avoid forcing a
! processor to use the so-called copy-in/copy-out argument passing mechanism.
! Making a copy of actual arguments whose values are likely to change due
! to a (non-blocking, asynchronous) communication operation completing, or
! in some unpredictable manner, will cause those new values to be lost
! when a called procedure returns and the copy-out overwrites the
! actual argument, or will cause the application program to abort.
!
! The ASYNCHRONOUS_EXT attribute is similar to the VOLATILE and ASYNCHRONOUS
! attributes. It is intended to facilitate traditional code motion
! optimizations in the presence of (asynchronous, non-blocking) communication.
!
program isend
  use mpi
  implicit none
  ! include 'mpif.h'
  Integer :: ierror, len, nprocs, rank
  Integer, Allocatable :: len_sent (:)
  call MPI_Init(ierror)
  call MPI_Comm_size (MPI_COMM_WORLD, nprocs, ierror)
  call MPI_Comm_rank (MPI_COMM_WORLD, rank, ierror)
  Allocate (len_sent(nprocs), stat = ierror)
  if (ierror > 0) then
    print *, 'Error: could not allocate vector len_sent'
    call MPI_Abort (MPI_COMM_WORLD, 1, ierror)
  endif
  len = 10000
  if (rank == 0) then
    print *, 'Length per process:', len, ' reals'
#ifdef AVOID_COPY
    print *, 'Copy-in/Copy-out avoided'
#endif
  endif
  len_sent (:) = len
  call test_isend (len_sent, nprocs, rank)
  call MPI_Finalize(ierror)
end program isend
subroutine test_isend (len_sent, nprocs, rank)
  use mpi
  implicit none
  ! include 'mpif.h'
  Integer, Intent (In) :: rank, nprocs
  Integer, Intent (In) :: len_sent (nprocs)
  Real, Pointer :: send_vector (:), recv_vector (:)
  Integer :: i, ierror, ip, j, len_tot, n_errors
  Integer :: recv_req (nprocs)
  Integer :: send_req (nprocs)
  len_tot = sum (len_sent)
  ! Allocate vectors
  Allocate (send_vector(len_tot), stat = ierror)
  if (ierror > 0) then
    print *, 'Error: could not allocate send vector'
    call MPI_Abort (MPI_COMM_WORLD, 1, ierror)
  endif
  Allocate (recv_vector(len_tot), stat = ierror)
  if (ierror > 0) then
    print *, 'Error: could not allocate recv vector'
    call MPI_Abort (MPI_COMM_WORLD, 1, ierror)
  endif
  ! Initialize vectors
  send_vector (:) = rank+1
  recv_vector (:) = -1
  ! Non-blocking receive
  ip = 0
  do i = 1, nprocs
#ifdef AVOID_COPY
    ! In this case, the program worked since
    ! copy-in/copy-out is not performed.
    !
    call MPI_Irecv (recv_vector(ip+1), len_sent(i), &
                    MPI_REAL, i-1, 1, MPI_COMM_WORLD, recv_req(i), &
                    ierror)
#else
    ! A Fortran 90 compiler performed copy-in/copy-out at
    ! this location and passed the temporary array to MPI_Irecv.
    ! This caused incorrect results or segmentation violations.
    !
    call MPI_Irecv (recv_vector(ip+1:ip+len_sent(i)), len_sent(i), &
                    MPI_REAL, i-1, 1, MPI_COMM_WORLD, recv_req(i), &
                    ierror)
#endif
    ip = ip + len_sent (i)
  end do
  !
  call MPI_Barrier (MPI_COMM_WORLD, ierror)
  ! Non-blocking send
  ip = 0
  do i = 1, nprocs
#ifdef AVOID_COPY
    ! This avoids copy-in and incorrect results.
    !
    call MPI_Isend (send_vector(ip+1), len_sent(i), &
                    MPI_REAL, i-1, 1, MPI_COMM_WORLD, send_req(i), &
                    ierror)
#else
    ! A Fortran 90 compiler may perform copy-in at this location and
    ! may pass the temporary array to MPI_Isend.
    ! This may cause incorrect results.
    !
    call MPI_Isend (send_vector(ip+1:ip+len_sent(i)), len_sent(i), &
                    MPI_REAL, i-1, 1, MPI_COMM_WORLD, send_req(i), &
                    ierror)
#endif
    ip = ip + len_sent (i)
  end do
  !
  call MPI_Barrier (MPI_COMM_WORLD, ierror)
  ! Wait for completion of non-blocking requests
  call MPI_Waitall (nprocs, send_req, MPI_STATUSES_IGNORE, ierror)
  call MPI_Waitall (nprocs, recv_req, MPI_STATUSES_IGNORE, ierror)
  ! Check the received data: block i must contain the value i
  n_errors = 0
  ip = 0
  do i = 1, nprocs
    do j = 1, len_sent(i)
      if (recv_vector (ip+j) /= i) then
        print *, rank, 'error in element ', ip+j, ', expected:', i, &
                 ' got', recv_vector (ip+j)
        n_errors = n_errors + 1
      endif
    end do
    ip = ip + len_sent (i)
  end do
  !
  call MPI_Allreduce (MPI_IN_PLACE, n_errors, 1, MPI_INTEGER, MPI_SUM, &
                      MPI_COMM_WORLD, ierror)
  if (n_errors == 0 .and. rank == 0) then
    print *, "No errors detected"
  endif
  ! Free memory
  Deallocate (send_vector, recv_vector)
  return
end subroutine test_isend