Topic: EOSHIFT is very slow? in Support

KennyT

Posts: 320

Back to Top

23 Jul 2012 2:35 #10500

While benchmarking some code (comparing some old F77 code with new improved F95 code) we noticed a severe speed impact in the new code. The below test program illustrates the issue:

!ftn95$free
PROGRAM NDLIS

  REAL, ALLOCATABLE	:: TR(:), XX(:)
  REAL*8  :: HIGH_RES_CLOCK@


!   Set up arrays
    write(*,*)  'Setting up arrays'
	ALLOCATE (TR(500), XX(500), stat=IST)
	DO I = 1, size(TR)
	 TR(I)	=  FLOAT(I)/100
	END DO

	 WRITE (*,*)
	 WRITE (*,*) 'EOSHIFT'
	 NE	=  size(TR)
	 NT1		=  4
!    Do array data shift using F95 intrinsic function	 
	 E1	=  HIGH_RES_CLOCK@ (.false.)
	 DO K = 1, 100000
	  XX	=  EOSHIFT(TR(1:NE), NT1)
	 END DO 
	 E2	=  HIGH_RES_CLOCK@ (.false.)
!    Do array data shift using 'F77-style' code	 
	 E3	=  HIGH_RES_CLOCK@ (.false.)
	 DO K = 1, 100000
	  DO I = 1, NE-NT1
	   XX(I)	=  TR(I+NT1)
	  END DO 
	 END DO 
	 E4	=  HIGH_RES_CLOCK@ (.false.)
 	 T1	=  E2-E1
 	 T2	=  E4-E3
 	 TT	=  (T1 + T2) / 100.
	 WRITE (*,*)  ' F95 ', NINT(T1/TT)	! typical 	90%  (2.13s)
	 WRITE (*,*)  ' F77 ', NINT(T2/TT)	!		10%  (0.25s)

END

So, it appears that EOSHIFT is 8-9x slower than the equivalent 'F77' looping version!!!

We're busy recoding our code the 'old' way, but thought you might be interested in examining what EOSHIFT is doing!

K

JohnCampbell

Posts: 2526 Sydney

Back to Top

24 Jul 2012 1:26 #10503

Kenny, It probably will not influence the result, but high_res_clock@ can sometimes produce problems. An alternative is to use QueryPerformanceCounter, which is fast and accurate as calibrated clock.

      SUBROUTINE ELAPSE_SECOND (ELAPSE)
!
!     Returns the total elapsed time in seconds
!     based on QueryPerformanceCounter
!     This is the fastest and most accurate timing routine
!
      real*8,   intent (out) :: elapse
!
      STDCALL   QUERYPERFORMANCECOUNTER 'QueryPerformanceCounter' (REF):LOGICAL*4
      STDCALL   QUERYPERFORMANCEFREQUENCY 'QueryPerformanceFrequency' (REF):LOGICAL*4
!
      real*8    :: freq  = 1
      logical*4 :: first = .true.
      integer*8 :: start = 0
      integer*8 :: num
      logical*4 :: ll
!      integer*4 :: lute
!
!   Calibrate this time using QueryPerformanceFrequency
      if (first) then
         num   = 0
         ll    = QueryPerformanceFrequency (num)
         freq  = 1.0d0 / dble (num)
         start = 0
         ll    = QueryPerformanceCounter (start)
         first = .false.
!         call get_echo_unit (lute)
!         WRITE (lute,*) 'Elapsed time counter :',num,' ticks per second'
      end if
!
      num    = 0
      ll     = QueryPerformanceCounter (num)
      elapse = dble (num-start) * freq
      return
      end

LitusSaxonicum

Posts: 2284 Yateley, Hants, UK

Back to Top

24 Jul 2012 7:27 #10507

Long experience tells me that compilers have what Les Hatton would have called 'Safe Subsets' (he retired at the same time as me, so his web presence may not be as good as previously if you google him). He meant a subset of code elements that the majority of users could be relied upon to use correctly, most of the time. Taking it further, one can see that a compiler must have a core of facilities that users of the safe subset test and test again. These are likely to work well, and relative to comparatively newer facilities, are likely to be more optimised. In FTN95, therefore, one ought to see the FTN77 core perform rather better (reliably? faster? bug-free?) than the Fortran 90 and 95 parts, particularly anything a bit esoteric. You will see fewer bug fixes to Fortran 77 (nearly none) than to the 90/95 facilities in the published bug fix list. Indeed, apart from fixes to things that Microsoft broke wth newer Windows and Visual Studio, and a few upgraded facilities, that's about all there is.

I'm not saying that this is how it should be, only how it most likely is ...

If EOSHIFT is slow and you are the first to discover it, then it may be because no-one used it in earnest, it was fast enough for them, they never compared it to anything else and so don't know that it is slow, or they know it is slow but prefer to use a later Fortran facility rather than coding it by hand in the old-fashioned way (or something else!)

If the slowness of EOSHIFT is a show-stopper for you, then you could always try dumping the assembly language code from your Fortran 77 style version, and then using CODE ... EDOC write a faster version using more modern cpu instructions, as was done in another thread (Fortran 2003/2008) with dot products. I know several posters who would be interested in the outcome - for names, consult the thread I mentioned.

As for me, I'm an old dog, and this is a new trick that I never knew existed, much less ever imagined that I needed! (On reflection I might, but not any time soon .... MRU file lists being a case in point).

Best of luck fixing it.

Eddie

KennyT

Posts: 320

Back to Top

27 Jul 2012 8:21 #10518

Les Hatton - a star! I remember a trial he did of the various seismic processing companies results. From memory, I think he defined exactly what parameters he wanted applied to a test dataset except one, which he left to the operators best judgement. When he compared the results he got back, the range of results was staggering - one set being 180degs out of phase to the others! I think he also wrote a witty article where he announced the best coloured pencils for doing seismic interpretation! :lol:

K

LitusSaxonicum

Posts: 2284 Yateley, Hants, UK

Back to Top

27 Jul 2012 4:10 #10522

Yes, I'm a big fan. But ALL of his articles are witty.

My safe subset of Fortran 9x is Fortran 77 - and then not all of it.

E