Topic: syntax problem in v8.20? in Support

KennyT

Posts: 320

Back to Top

27 Nov 2017 12:18 #20865

the following code segment crashes (floating stack overflow) at the RMSN= line:

  REAL, ALLOCATABLE 	:: RA(:), DIFF(:)
  
	ALLOCATE (RA(NE), DIFF(NE), stat=IST); IF( IST.GT.0 ) RETURN
	DO I = 1, NE
	 DIFF(I) =  SA(I+1) - SA(I)
	END DO
	WHERE (ABS(DIFF).GT.1.E15 ) DIFF  =  0.
	AV = SUM(DIFF)/NE
	RA(1:NE)	=  DIFF(1:NE)- AV
 	RMSN	=  SQRT(SUM((ra)*(ra) )/FLOAT(2*NE))

but this version doesn't crash:

 REAL, ALLOCATABLE 	:: RA(:), DIFF(:), r1(:)

  
	ALLOCATE (RA(NE), DIFF(NE), r1(ne), stat=IST); IF( IST.GT.0 ) RETURN
	DO I = 1, NE
	 DIFF(I) =  SA(I+1) - SA(I)
	END DO
	WHERE (ABS(DIFF).GT.1.E15 ) DIFF  =  0.
	AV = SUM(DIFF)/NE
	RA(1:NE)	=  DIFF(1:NE)- AV
  	r1(1:ne)  =  ra(1:ne)*ra(1:ne) 
	RMSN	=  SQRT(SUM(r1)/FLOAT(2*NE))

is the first version now wrong? - i think it was OK in v8.01...

K

JohnCampbell

Posts: 2526 Sydney

Back to Top

27 Nov 2017 12:46 (Edited: 27 Nov 2017 1:06) #20866

I think there is a problem with a possible temporary array for 'SUM((ra)*(ra) )', but there are a lot of questions about this.

Are you using /64 or not ? Did it work before ? Would 'SUM((ra)*(ra) )' require a temporary array, or should you accumulate the sum in a DO loop, without requiring a temporary array ?

I note that as of ver 8.20 : 64 bit FTN95 now stores large static arrays in the same way as automatic arrays. What would happen to a temporary array ?

Is SUM((ra)(ra) ) different from SUM(rara) ? Why not use the following. I would try to avoid a temporary array implied by (ra)*(ra)

   AV = SUM(DIFF)/NE 
   RMSN = 0
   DO I = 1, NE 
    RMSN = RMSN + (DIFF(I) - AV) ** 2
   END DO
   RMSN   =  SQRT( RMSN/FLOAT(2*NE) )

Also, why don't temporary arrays go on the heap ?

Why wouldn't it be better to eliminate all temporary arrays and have: sx = 0 sxx = 0 DO I = 1, NE x = SA(I+1) - SA(I) if ( abs(x) > 1.E15 ) cycle sx = sx + x sxx = sxx + xx END DO AV = sx/NE RMSN = SQRT ( ( sxx/NE - avav) / 2 )

KennyT

Posts: 320

Back to Top

27 Nov 2017 1:02 #20867

Thanks John,

no, this is 32-bit code

i think the original version was coded for speed at runtime. the routine gets called a lot!

pretty sure it used to work in v8.01.

K

JohnCampbell

Posts: 2526 Sydney

Back to Top

27 Nov 2017 1:12 #20869

The use of ' if ( abs(x) > 1.E15 ) cycle' may restrict optimisation, but avoiding all the temporary arrays would improve cache usage. In general, do you find the use of SUM ( array*array ) an efficient approach ? It would not be my choice, but worth considering. DOT_PRODUCT is an alternative, but is not typically optimised.

John

KennyT

Posts: 320

Back to Top

27 Nov 2017 2:24 #20870

Is SUM((ra)(ra) ) different from SUM(rara) ?

YES. it seems that removing the extra brackets around the RA arrays fixes it!

just have to go and look for other occasions where we've used that syntax elsewhere...

K

KennyT

Posts: 320

Back to Top

27 Nov 2017 4:09 #20872

Paul, if you need a test program...

!ftn95$free
        program test

	real, allocatable :: ra(:)

	allocate(ra(10))
	do i=1,10
 	  ra(I)=float(I)*1.75
	end do

	rmsn	=  sqrt(sum((ra)*(ra))/20.0)
	end

interestingly, this doesn't crash....

!ftn95$free
        program test

	real, allocatable :: ra(:)

	allocate(ra(10))
	do i=1,10
 	  ra(I)=float(I)*1.75
	end do

	rmsn	=  sqrt(sum((1.0*ra)*(1.0*ra))/20.0)
	end

K

JohnCampbell

Posts: 2526 Sydney

Back to Top

28 Nov 2017 8:16 (Edited: 28 Nov 2017 8:42) #20879

Ken,

Try this example on either 32-bit or 64-bit. (I have configured to use /fpp, see noteson64bitftn95.txt)

!ftn95$free 
         program test 

    real, allocatable :: ra(:)
    real    :: rmsn, rm2
    integer :: num, i ,k,kk
    integer*8 :: t1,t2,t3, t4

    do kk = 1,2  ! 2 pass for less unstable timer
    do k = 1,5
       num = 10**k
       call rdtsc (t1)
       allocate(ra(num)) 
       do i=1,num
          ra(I)=float(I)*1.75 
       end do 
       call rdtsc (t2)
   
!       rmsn   =  sqrt(sum((ra)*(ra))          /2./num)  ! fails
!       rmsn   =  sqrt(sum((1.0*ra)*(1.0*ra))  /2./num)  ! works
        rmsn   =  sqrt(sum(ra*ra)              /2./num)  ! works
       call rdtsc (t3)
        rm2    =  sqrt( dot_product ( ra, ra ) /2./num)  ! works
       call rdtsc (t4)

       write (*,*) 'num=',num, '  rmsn =',rmsn, '  expecting',rm2
       write (*,*) t2-t1, t3-t2, t4-t3
       deallocate (ra)
    end do
    end do
    end

    subroutine rdtsc (tick)
    integer*8 tick
!   real*10,   external :: cpu_clock@   32-bit
!   integer*8, external :: RDTSC_VAL@   64-bit

  CIF(_WIN64)
    tick = RDTSC_VAL@ ()
  CELSE
    tick = cpu_clock@ ()
  CENDIF

    end subroutine rdtsc

I'd expect that dot_product would be the best option, as no temporary arrays could be implied. /64 is much better. The use of the timer was a bit unstable for low num.

I also looked at expanding out sum ( (SA(I+1) - SA(I))^2 ) but did not find a better solution.

To remove the IF from calc lops, could you use:

 N = 1
 DO I = 1, NE 
   DIFF(N) = SA(I+1) - SA(I) 
   IF (ABS(DIFF(N)).LT.1.E15 ) N = N+1
 END DO
 N = N-1
 AV = SUM(DIFF(1:N))/N
 DIFF(1:N) = DIFF(1:N)-AV

KennyT

Posts: 320

Back to Top

28 Nov 2017 8:40 #20881

Thanks again John.

interestingly, I further modified the program to do a variable number of 'kk' loops and found that, for 5 loops, 'dot product' was slightly faster but for 100 loops, 'sum' was faster (which probably means they are actually identical for all practical purposes!)

K

JohnCampbell

Posts: 2526 Sydney

Back to Top

28 Nov 2017 8:45 #20882

Could that imply that for /32, a temporary array is not being generated ? I always try to remove possible stack overflow constructs. /64 is different at this time.

mecej4

Posts: 1912

Back to Top

28 Nov 2017 9:11 (Edited: 28 Nov 2017 10:32) #20883

Quoted from JohnCampbell I always try to remove possible stack overflow constructs.

Please note that the initial post of this thread stated that a floating point stack overflow occurred. That is, the generated code probably attempted an FLD x87 instruction at an instant when the registers ST0 to ST7 were already occupied. The compiler should have kept track of how many x87 registers were used up, and should have used the CPU stack space as scratchpad space if necessary to relieve the pressure on the FPU register stack.

I have seen other instances where 32-bit FTN95 had similar problems. Changing the CPU stack size (by using linker options, etc.) will have no effect on this problem, so it is important not to confuse FPU stack overflow, which applies only to the x87 segment of the processor, with the more common stack overflow in main memory.

Earlier threads relating to FPU stack overflow:

https://forums.silverfrost.com/Forum/Topic/694 https://forums.silverfrost.com/Forum/Topic/481 https://forums.silverfrost.com/Forum/Topic/492

A critical article by an expert (Kahan is one of the fathers of the X87):

http://cims.nyu.edu/~dbindel/class/cs279/stack87.pdf

An article that goes into details of the undesirable effects of x87 register spillage, The pitfalls of verifying floating-point computations by David Monniaux:

https://hal.archives-ouvertes.fr/hal-00128124v5/document

JohnCampbell

Posts: 2526 Sydney

Back to Top

28 Nov 2017 9:18 #20884

mecej4,

That is correct, which means we should highlight the post above which exhibits this problem, for Paul to address: !ftn95$free program test

   real, allocatable :: ra(:) 

   allocate(ra(10)) 
   do i=1,10 
      ra(I)=float(I)*1.75 
   end do 

   rmsn   =  sqrt(sum((ra)*(ra))/20.0) 
   end

PaulLaidler

Posts: 7976 Salford, UK

Back to Top

30 Nov 2017 6:46 #20900

Thanks. I have made a note of this.

PaulLaidler

Posts: 7976 Salford, UK

Back to Top

30 Nov 2017 8:27 #20901

This regression in v8.20 has now been fixed for the next release.