Topic: Optimizer bug in FTN95 7.10 in Support

mecej4

Posts: 1914

16 Apr 2015 3:47 (Edited: 3 Aug 2015 11:41) #16212

The following short program demonstrates an optimizer bug that was first encountered in a ~800 line program.

      program tmb11
      implicit none
      double precision A(2,5)
      integer :: j,n=5
      DOUBLE PRECISION RMAX,RSUM
      INTRINSIC ABS
      data A/1d0,2d0,2d0,4d0,3d0,6d0,1d0,0d0,0d0,2d0/

      RMAX = 0.0D0
      RSUM = 0.0D0
      do j = 1, n
         RSUM = RSUM + a(1,j)**2
         if (RMAX >= ABS(a(1,j))) cycle
         RMAX = ABS(a(1,j))
      end do
      write(*,'(A,F4.1)')'RSUM = ',RSUM
      stop
      end

Compiled with /opt, the program gives the incorrect output

RSUM =  5.0

instead of the correct output, which is

RSUM = 15.0

P.S.: Paul, I have further narrowed down the location of the bug by looking at the assembly listing (see my post of Fri Apr 17, 2015 6:14 am, below). The optimized code calculates SUM(a(1,1)**2) instead of SUM(a(1,j)**2).

[Added 3 August 2015]: The bug is still present in the 7.20 compiler release.

PaulLaidler

Posts: 7977 Salford, UK

16 Apr 2015 6:46 #16213

Thanks. I have logged this for investigation.

LitusSaxonicum

Posts: 2284 Yateley, Hants, UK

16 Apr 2015 9:22 #16215

Hi Mecej4,

If you write your code '77-style' does the optimizer work? I ask this, not to push a geriatric code agenda, but to pin down whether it is something introduced with new forms, or something that was always there.

I find FTN95 adequately fast for my purposes without /OPT, and I remember having problems with it some years ago, and so I stopped using the optimization. In truth it was when I was wrestling with a nonstandard dpi setting problem to do with toolbar icon spacing, and it may have had nothing to do with /OPT, but then one gets into habits that are difficult to change.

Eddie

mecej4

Posts: 1914

16 Apr 2015 10:25 #16216

Optimizer bugs are quite fragile and, for that reason, hard to pin down. Small changes to the code, including changes in form (as you asked) and changes in syntax with no change in semantics, can make an optimizer bug active or dormant.

In my example, replacing the two statements containing RMAX in the DO loop by

RMAX = MAX(RMAX,ABS(a(1,j)))

makes the bug go away.
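
For reference, here is a minimal sketch of the reproducer with that replacement applied. The program name tmb11_max is made up for illustration; everything else follows the original listing. Whether a particular FTN95 release then handles it correctly under /opt is as reported in this thread, not something the sketch can guarantee.

```fortran
      program tmb11_max
      implicit none
      double precision A(2,5)
      integer :: j, n = 5
      double precision RMAX, RSUM
      intrinsic ABS, MAX
      data A/1d0,2d0,2d0,4d0,3d0,6d0,1d0,0d0,0d0,2d0/

      RMAX = 0.0D0
      RSUM = 0.0D0
      do j = 1, n
         RSUM = RSUM + a(1,j)**2
!        Single statement replacing the IF ... CYCLE / assignment pair
         RMAX = MAX(RMAX, ABS(a(1,j)))
      end do
      write(*,'(A,F4.1)') 'RSUM = ', RSUM
      end
```

A correct build prints RSUM = 15.0, since the first row of A is 1, 2, 3, 1, 0.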

Old style code is less likely to run into optimizer problems simply because the compiler, having been applied to such code over more years, has had such bugs fixed. You may remember that very early Fortran 90 compilers produced such slow and buggy code that many users continued writing F77 for a few more years.

As a user, I think that I can live with /opt not speeding up the execution much, but I would not accept a situation where any option that is provided with a compiler has the effect of making a correct program work incorrectly -- Primum non nocere!

JohnCampbell

Posts: 2526 Sydney

16 Apr 2015 11:51 #16217

Eddie,

Even the minor change to

      if (RMAX < ABS(a(1,j))) RMAX = ABS(a(1,j))

removes the problem.

Changing to F77 syntax also removes the problem:

      do 10 j = 1, n
         RSUM = RSUM + a(1,j)**2
         if (RMAX >= ABS(a(1,j))) goto 10
         RMAX = ABS(a(1,j))
   10 continue

In this example, changing 'goto 10' to 'cycle' triggers the error. I think that a GOTO in a DO loop can restrict some optimising possibilities, even ones we don't want.

John

mecej4

Posts: 1914

17 Apr 2015 12:14 #16221

John's comments (Thu Apr 16, 2015 5:51) illustrate why cut-down reproducers are necessary evils. With trimmed-down code, especially code of fewer than, say, twenty lines, we can conjure up many ways of changing the code to work around the optimizer bug, because we know precisely which lines are responsible. With real code in which an optimizer bug is suspected, it can be quite difficult to (i) establish that there is an optimizer bug, and (ii) find the line(s) of code responsible. A symbolic debugger is not going to be of much help here, because optimized code and a symbolic debugger do not get along well.

Note that, in this example, an insightful and capable optimizer could recognize that the variable RMAX and all lines of code containing RMAX could be optimized away, and the optimizer bug, which is triggered by those lines of removed code, would remain in hiding.
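
To make that concrete: RMAX is assigned inside the loop but never read afterwards, so the program's output does not depend on it. A sketch of what the reproducer reduces to once those lines are dropped (a hand-written reduction with the hypothetical name tmb11_reduced, not the output of any compiler):

```fortran
      program tmb11_reduced
      implicit none
      double precision A(2,5)
      integer :: j, n = 5
      double precision RSUM
      data A/1d0,2d0,2d0,4d0,3d0,6d0,1d0,0d0,0d0,2d0/

!     All statements involving RMAX removed; only the sum remains
      RSUM = 0.0D0
      do j = 1, n
         RSUM = RSUM + a(1,j)**2
      end do
      write(*,'(A,F4.1)') 'RSUM = ', RSUM
      end
```

With nothing left in the loop body but the accumulation, the IF ... CYCLE pattern that triggers the bug no longer exists.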

Here is an excerpt from the assembly listing that shows the bug. The only FMUL instruction in the entire listing has been moved ahead of the DO loop, which begins at offset 70H (disregard the annotations such as 'AT 54', which give wrong offsets when /opt has been used). Thus, the 'optimized' code computes the sum of a(1,1)**2, rather than the sum of a(1,j)**2.

      0000004d(24/5/154)         mov       Temp@3,=2
   0012            RSUM = RSUM + a(1,j)**2                                                       AT 54
      00000054(25/5/245)         mov       eax,Temp@3
      00000057(26/5/245)         dfld      A[eax*8-16]
      0000005e(27/5/245)         fmul      fr0,fr0
      00000060(28/4/247)         dfstp     Temp@6
      00000063(30/3/8)           align16   
      00000070(31/3/8)        Label     __N3
      00000070(32/6/136)         dfld      Temp@6
   0013            if (RMAX >= ABS(a(1,j))) cycle                                                AT 73
      00000073(36/7/56)          mov       ecx,Temp@3
      00000076(33/6/136)         dfadd     RSUM
      00000079(34/5/19)          dfstp     RSUM
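
In source terms, the listing above behaves roughly like the following sketch. This is a hand-written reconstruction for illustration, not compiler output; the names tmb11_hoisted, T, RSUM_BAD and RSUM_OK are invented, with T playing the role of Temp@6.

```fortran
      program tmb11_hoisted
      implicit none
      double precision A(2,5)
      integer :: j, n = 5
      double precision T, RSUM_BAD, RSUM_OK
      data A/1d0,2d0,2d0,4d0,3d0,6d0,1d0,0d0,0d0,2d0/

!     What the listing does: square a(1,1) once, ahead of the loop,
!     then add that same value on every iteration.
      T = a(1,1)**2
      RSUM_BAD = 0.0D0
      do j = 1, n
         RSUM_BAD = RSUM_BAD + T
      end do

!     What the source asks for: square a(1,j) inside the loop.
      RSUM_OK = 0.0D0
      do j = 1, n
         RSUM_OK = RSUM_OK + a(1,j)**2
      end do

      write(*,'(A,F4.1)') 'hoisted: RSUM = ', RSUM_BAD
      write(*,'(A,F4.1)') 'correct: RSUM = ', RSUM_OK
      end
```

The first line of output shows 5.0 (five times 1.0) and the second 15.0, matching the incorrect and correct results reported at the top of the thread.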

LitusSaxonicum

Posts: 2284 Yateley, Hants, UK

17 Apr 2015 4:08 #16222

John answered it: the post-77 syntax fools the optimizer, which was probably implemented for FTN77 - and I guess that when Paul works his way round to it he'll find that knowledge useful.

Ha! Primum non nocere. I'll remember that one. Optimizers have been known for some time to occasionally create flawed executables. I used to test for such things by running my code through lots of compilers, hence I'm wary about extensions, but ClearWin+ is so seductive I've abandoned that virtue ...

Eddie