The physical memory on my PC is 16 GB, and there is presumably no penalty for accessing its contents in any order.
There is, however, a penalty for transferring data between memory and the cache.
These transfers are made in whole cache lines (typically 64 bytes on current x86 CPUs), so if you address 1 byte, the whole line is transferred. If you step all over memory, almost every access needs a fresh line transfer. With large strides, each access can also land on a different virtual-memory page (typically 4 KB), which adds TLB misses on top.
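As a rough count, here is what reading N 4-byte `real*4` elements costs. This assumes 64-byte lines and ignores hardware prefetching and any reuse of lines that are still in cache:

$$
\text{sequential: } \frac{4N}{64} = \frac{N}{16}\ \text{line transfers},
\qquad
\text{stride} \ge 64\ \text{bytes: } N\ \text{line transfers}
$$

In the worst case, a scattered traversal moves up to 16 times as much data between memory and cache for the same useful work. Only 4 bytes of each 64-byte line are actually used.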
```fortran
do i = 1,l
   do j = 1,m
      do k = 1,n
         byte = g_array(k,j,i)   ! this is good sequential memory access for Fortran
         byte = b_array(i,j,k)   ! this is skipping all over memory
      end do
   end do
end do
```
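The two statements differ because Fortran stores arrays in column-major order: the first subscript varies fastest in memory. With default lower bounds of 1, element `a(p,q,r)` of an array declared `a(n1,n2,n3)` sits at this element offset:

$$
\operatorname{offset}(p,q,r) = (p-1) + (q-1)\,n_1 + (r-1)\,n_1 n_2
$$

Only k changes in the innermost loop.

- `g_array(n,m,l)` is accessed as `g_array(k,j,i)`. Here k is the first subscript, so each step moves 1 element.
- `b_array(l,m,n)` is accessed as `b_array(i,j,k)`. Here k is the third subscript, so each step moves l·m elements.

Taking the declarations and sizes from the test program that follows (4-byte reals, l = 15, m = 3000):

$$
\Delta_{\text{good}} = 1\ \text{element} = 4\ \text{bytes},
\qquad
\Delta_{\text{bad}} = l\,m = 15 \times 3000 = 45\,000\ \text{elements} = 180\,000\ \text{bytes}
$$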
Try testing this with arrays of, say, 100 MB and time the two access orders:
```fortran
! Times the same fill-and-sum work with two array layouts.
program cache_order
   use, intrinsic :: iso_fortran_env, only: int64, real32, real64
   implicit none
   real(real32), allocatable :: b_array(:,:,:)
   real(real32), allocatable :: g_array(:,:,:)
   !
   integer :: l = 15
   integer :: m = 3000
   integer :: n = 3000
   integer :: i, j, k
   real(real64) :: byte     ! checksum; printing it keeps the loops from being optimized away
   real(real64) :: t1

   allocate ( g_array(n,m,l) )
   t1 = elapse_sec ()
   do i = 1,l
      do j = 1,m
         do k = 1,n
            g_array(k,j,i) = i+j+k   ! k is the first subscript: good sequential access for Fortran
         end do
      end do
   end do
   byte = 0
   do i = 1,l
      do j = 1,m
         do k = 1,n
            byte = byte + g_array(k,j,i)   ! good sequential memory access
         end do
      end do
   end do
   t1 = elapse_sec () - t1
   write (*,*) t1, ' good_array', byte
   deallocate ( g_array )
   !
   allocate ( b_array(l,m,n) )
   t1 = elapse_sec ()
   do i = 1,l
      do j = 1,m
         do k = 1,n
            b_array(i,j,k) = i+j+k   ! k is the last subscript: each step jumps l*m elements
         end do
      end do
   end do
   byte = 0
   do i = 1,l
      do j = 1,m
         do k = 1,n
            byte = byte + b_array(i,j,k)   ! skipping all over memory
         end do
      end do
   end do
   t1 = elapse_sec () - t1
   write (*,*) t1, ' bad_array', byte
   deallocate ( b_array )

contains

   ! Wall-clock time in seconds. 64-bit tick counts and a double-precision
   ! result avoid the rounding real*4 suffers once the tick count is large.
   function elapse_sec () result (sec)
      real(real64) :: sec
      integer(int64) :: tick, rate
      call system_clock ( tick, rate )
      sec = real(tick, real64) / real(rate, real64)
   end function elapse_sec

end program cache_order
```