While benchmarking some code (comparing old F77 code with new, improved F95 code), we noticed a severe slowdown in the new code. The test program below illustrates the issue:
!ftn95$free
PROGRAM NDLIS
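REAL, ALLOCATABLE :: TR(:), XX(:)
REAL*8 :: HIGH_RES_CLOCK@
! Keep the timer readings in double precision so no clock resolution is lost
REAL*8 :: E1, E2, E3, E4, T1, T2, TT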
! Set up arrays
write(*,*) 'Setting up arrays'
ALLOCATE (TR(500), XX(500), stat=IST)
IF (IST /= 0) STOP 'Array allocation failed'
DO I = 1, size(TR)
  TR(I) = FLOAT(I)/100
END DO
WRITE (*,*)
WRITE (*,*) 'EOSHIFT'
NE = size(TR)
NT1 = 4
! Do array data shift using F95 intrinsic function
E1 = HIGH_RES_CLOCK@ (.false.)
DO K = 1, 100000
  XX = EOSHIFT(TR(1:NE), NT1)
END DO
E2 = HIGH_RES_CLOCK@ (.false.)
! Do array data shift using 'F77-style' code
! (unlike EOSHIFT, this loop does not zero the last NT1 elements of XX)
E3 = HIGH_RES_CLOCK@ (.false.)
DO K = 1, 100000
  DO I = 1, NE-NT1
    XX(I) = TR(I+NT1)
  END DO
END DO
E4 = HIGH_RES_CLOCK@ (.false.)
T1 = E2-E1
T2 = E4-E3
TT = (T1 + T2) / 100.
WRITE (*,*) ' F95 ', NINT(T1/TT) ! typical 90% (2.13s)
WRITE (*,*) ' F77 ', NINT(T2/TT) ! 10% (0.25s)
END
So it appears that EOSHIFT is roughly 8 to 9 times slower than the equivalent 'F77' loop (2.13 s vs 0.25 s)!
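Strictly, the F77 loop is not a complete drop-in replacement: EOSHIFT(A, SHIFT) with a positive SHIFT and no BOUNDARY argument moves the elements toward lower indices and also fills the last SHIFT positions with zero. The following is a minimal sketch of a hand-coded equivalent written with F95 array sections. It assumes a rank-1 REAL array, 0 <= SHIFT <= SIZE(A), and distinct source and destination arrays; the module name SHIFT_UTILS and the subroutine SHIFT_LEFT_FILL are hypothetical. Whether this runs as fast as the explicit loop under FTN95 has not been measured here.

```fortran
! Hypothetical helper: a hand-coded replacement for EOSHIFT(SRC, NSHIFT)
! on a rank-1 REAL array, valid only for 0 <= NSHIFT <= SIZE(SRC).
MODULE SHIFT_UTILS
  IMPLICIT NONE
CONTAINS
  SUBROUTINE SHIFT_LEFT_FILL(SRC, NSHIFT, DEST)
    REAL, INTENT(IN)    :: SRC(:)
    INTEGER, INTENT(IN) :: NSHIFT
    REAL, INTENT(OUT)   :: DEST(:)   ! assumed to be the same size as SRC
    INTEGER :: N
    N = SIZE(SRC)
    ! Move elements NSHIFT+1..N down to positions 1..N-NSHIFT
    DEST(1:N-NSHIFT) = SRC(NSHIFT+1:N)
    ! EOSHIFT's default boundary value for REAL is zero, so zero-fill the tail
    DEST(N-NSHIFT+1:N) = 0.0
  END SUBROUTINE SHIFT_LEFT_FILL
END MODULE SHIFT_UTILS
```

In the benchmark above it would be called as `CALL SHIFT_LEFT_FILL(TR(1:NE), NT1, XX)`, since XX has the same size as TR.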
We're busy rewriting our code the 'old' way, but thought you might be interested in examining what EOSHIFT is doing internally.
K