Author Message
KennyT

Joined: 02 Aug 2005
Posts: 277 Posted: Mon Nov 27, 2017 1:18 pm    Post subject: syntax problem in v8.20? the following code segment crashes (floating stack overflow) at the RMSN= line:

 Code: REAL, ALLOCATABLE    :: RA(:), DIFF(:)      ALLOCATE (RA(NE), DIFF(NE), stat=IST); IF( IST.GT.0 ) RETURN    DO I = 1, NE     DIFF(I) =  SA(I+1) - SA(I)    END DO    WHERE (ABS(DIFF).GT.1.E15 ) DIFF  =  0.    AV = SUM(DIFF)/NE    RA(1:NE)   =  DIFF(1:NE)- AV     RMSN   =  SQRT(SUM((ra)*(ra) )/FLOAT(2*NE))

but this version doesn't crash:
 Code: REAL, ALLOCATABLE    :: RA(:), DIFF(:), r1(:)      ALLOCATE (RA(NE), DIFF(NE), r1(ne), stat=IST); IF( IST.GT.0 ) RETURN    DO I = 1, NE     DIFF(I) =  SA(I+1) - SA(I)    END DO    WHERE (ABS(DIFF).GT.1.E15 ) DIFF  =  0.    AV = SUM(DIFF)/NE    RA(1:NE)   =  DIFF(1:NE)- AV      r1(1:ne)  =  ra(1:ne)*ra(1:ne)    RMSN   =  SQRT(SUM(r1)/FLOAT(2*NE))

is the first version now wrong? - i think it was OK in v8.01...

K    JohnCampbell

Joined: 16 Feb 2006
Posts: 2033
Location: Sydney Posted: Mon Nov 27, 2017 1:46 pm    Post subject: I think there is a problem with a possible temporary array for "SUM((ra)*(ra) )", but there are a lot of questions about this.

Are you using /64 or not ?
Did it work before ?
Would "SUM((ra)*(ra) )" require a temporary array, or should you accumulate the sum in a DO loop, without requiring a temporary array ?

I note that as of ver 8.20 : 64 bit FTN95 now stores large static arrays in the same way as automatic arrays. What would happen to a temporary array ?

Is SUM((ra)*(ra) ) different from SUM(ra*ra) ? Why not use the following. I would try to avoid a temporary array implied by (ra)*(ra)
 Code: AV = SUM(DIFF)/NE    RMSN = 0    DO I = 1, NE     RMSN = RMSN + (DIFF(I) - AV) ** 2    END DO    RMSN   =  SQRT( RMSN/FLOAT(2*NE) )

Also, why don't temporary arrays go on the heap ?

Why wouldn't it be better to eliminate all temporary arrays and have:
 Code: sx = 0    sxx = 0    DO I = 1, NE     x =  SA(I+1) - SA(I)     if ( abs(x) > 1.E15 ) cycle     sx  = sx + x     sxx = sxx + x*x    END DO    AV = sx/NE    RMSN = SQRT ( ( sxx/NE - av*av) / 2 )

Last edited by JohnCampbell on Mon Nov 27, 2017 2:06 pm; edited 1 time in total   KennyT

Joined: 02 Aug 2005
Posts: 277 Posted: Mon Nov 27, 2017 2:02 pm    Post subject: Thanks John, no, this is 32-bit code i think the original version was coded for speed at runtime. the routine gets called a lot! pretty sure it used to work in v8.01. K    JohnCampbell

Joined: 16 Feb 2006
Posts: 2033
Location: Sydney Posted: Mon Nov 27, 2017 2:12 pm    Post subject: The use of " if ( abs(x) > 1.E15 ) cycle" may restrict optimisation, but avoiding all the temporary arrays would improve cache usage. In general, do you find the use of SUM ( array*array ) an efficient approach ? It would not be my choice, but worth considering. DOT_PRODUCT is an alternative, but is not typically optimised. John   KennyT

Joined: 02 Aug 2005
Posts: 277 Posted: Mon Nov 27, 2017 3:24 pm    Post subject: Is SUM((ra)*(ra) ) different from SUM(ra*ra) ? YES. it seems that removing the extra brackets around the RA arrays fixes it! just have to go and look for other occasions where we've used that syntax elsewhere... K    KennyT

Joined: 02 Aug 2005
Posts: 277 Posted: Mon Nov 27, 2017 5:09 pm    Post subject: Paul, if you need a test program...
 Code: !ftn95\$free         program test    real, allocatable :: ra(:)    allocate(ra(10))    do i=1,10       ra(I)=float(I)*1.75    end do    rmsn   =  sqrt(sum((ra)*(ra))/20.0)    end

interestingly, this doesn't crash....

 Code: !ftn95\$free         program test    real, allocatable :: ra(:)    allocate(ra(10))    do i=1,10       ra(I)=float(I)*1.75    end do    rmsn   =  sqrt(sum((1.0*ra)*(1.0*ra))/20.0)    end

K    JohnCampbell

Joined: 16 Feb 2006
Posts: 2033
Location: Sydney Posted: Tue Nov 28, 2017 9:16 am    Post subject: Ken,

Try this example on either 32-bit or 64-bit.
(I have configured to use /fpp, see noteson64bitftn95.txt)
 Code: !ftn95\$free          program test     real, allocatable :: ra(:)     real    :: rmsn, rm2     integer :: num, i ,k,kk     integer*8 :: t1,t2,t3, t4     do kk = 1,2  ! 2 pass for less unstable timer     do k = 1,5        num = 10**k        call rdtsc (t1)        allocate(ra(num))        do i=1,num           ra(I)=float(I)*1.75        end do        call rdtsc (t2)     !       rmsn   =  sqrt(sum((ra)*(ra))          /2./num)  ! fails !       rmsn   =  sqrt(sum((1.0*ra)*(1.0*ra))  /2./num)  ! works         rmsn   =  sqrt(sum(ra*ra)              /2./num)  ! works        call rdtsc (t3)         rm2    =  sqrt( dot_product ( ra, ra ) /2./num)  ! works        call rdtsc (t4)        write (*,*) 'num=',num, '  rmsn =',rmsn, '  expecting',rm2        write (*,*) t2-t1, t3-t2, t4-t3        deallocate (ra)     end do     end do     end     subroutine rdtsc (tick)     integer*8 tick !   real*10,   external :: cpu_clock@   32-bit !   integer*8, external :: RDTSC_VAL@   64-bit   CIF(_WIN64)     tick = RDTSC_VAL@ ()   CELSE     tick = cpu_clock@ ()   CENDIF     end subroutine rdtsc

I'd expect that dot_product would be the best option, as no temporary arrays could be implied. /64 is much better.
The use of the timer was a bit unstable for low num.

I also looked at expanding out sum ( (SA(I+1) - SA(I))^2 ) but did not find a better solution.

To remove the IF from calc lops, could you use:
 Code: N = 1  DO I = 1, NE    DIFF(N) = SA(I+1) - SA(I)    IF (ABS(DIFF(N)).LT.1.E15 ) N = N+1  END DO  N = N-1  AV = SUM(DIFF(1:N))/N  DIFF(1:N) = DIFF(1:N)-AV

Last edited by JohnCampbell on Tue Nov 28, 2017 9:42 am; edited 1 time in total   KennyT

Joined: 02 Aug 2005
Posts: 277 Posted: Tue Nov 28, 2017 9:40 am    Post subject: Thanks again John. interestingly, I further modified the program to do a variable number of "kk" loops and found that, for 5 loops, "dot product" was slightly faster but for 100 loops, "sum" was faster (which probably means they are actually identical for all practical purposes!) K    JohnCampbell

Joined: 16 Feb 2006
Posts: 2033
Location: Sydney Posted: Tue Nov 28, 2017 9:45 am    Post subject: Could that imply that for /32, a temporary array is not being generated ? I always try to remove possible stack overflow constructs. /64 is different at this time.   mecej4

Joined: 31 Oct 2006
Posts: 1064 Posted: Tue Nov 28, 2017 10:11 am    Post subject: Re: JohnCampbell wrote: I always try to remove possible stack overflow constructs.

Please note that the initial post of this thread stated that a floating point stack overflow occurred. That is, the generated code probably attempted an FLD x87 instruction at an instant when the registers ST0 to ST7 were already occupied. The compiler should have kept track of how many x87 registers were used up, and should have used the CPU stack space as scratchpad space if necessary to relieve the pressure on the FPU register stack.

I have seen other instances where 32-bit FTN95 had similar problems. Changing the CPU stack size (by using linker options, etc.) will have no effect on this problem, so it is important not to confuse FPU stack overflow, which applies only to the x87 segment of the processor, with the more common stack overflow in main memory.

Earlier threads relating to FPU stack overflow:

http://forums.silverfrost.com/viewtopic.php?t=940
http://forums.silverfrost.com/viewtopic.php?t=718
http://forums.silverfrost.com/viewtopic.php?t=730

A critical article by an expert (Kahan is one of the fathers of the X87):

http://cims.nyu.edu/~dbindel/class/cs279/stack87.pdf

An article that goes into details of the undesirable effects of x87 register spillage, The pitfalls of verifying floating-point computations by David Monniaux:

https://hal.archives-ouvertes.fr/hal-00128124v5/document

Last edited by mecej4 on Tue Nov 28, 2017 11:32 am; edited 5 times in total   JohnCampbell

Joined: 16 Feb 2006
Posts: 2033
Location: Sydney Posted: Tue Nov 28, 2017 10:18 am    Post subject: mecej4,

That is correct, which means we should highlight the post above which exhibits this problem, for Paul to address:
 Code: !ftn95\$free         program test    real, allocatable :: ra(:)    allocate(ra(10))    do i=1,10       ra(I)=float(I)*1.75    end do    rmsn   =  sqrt(sum((ra)*(ra))/20.0)    end   PaulLaidler Joined: 21 Feb 2005
Posts: 5716
Location: Salford, UK Posted: Thu Nov 30, 2017 7:46 am    Post subject: Thanks. I have made a note of this.   PaulLaidler Joined: 21 Feb 2005
Posts: 5716
Location: Salford, UK Posted: Thu Nov 30, 2017 9:27 am    Post subject: This regression in v8.20 has now been fixed for the next release.   Display posts from previous: All Posts1 Day7 Days2 Weeks1 Month3 Months6 Months1 Year Oldest FirstNewest First
 All times are GMT + 1 Hour Page 1 of 1