Silverfrost Forums

Welcome to our forums

Code Generation Bug

21 Sep 2018 8:52 (Edited: 30 Sep 2018 10:55) #22554

There is an elusive code generation bug in FTN95 that occurs when one compiles a program in which DO loops are nested 3-deep or more. I have caught the bug only in 32-bit compilations. The bug surfaces more often with /opt, but I have also seen it occur with certain source codes when I compiled with /check and then not occur with /opt. The bug is data dependent, and that is another reason why it is so elusive. I have seen the bug with FTN95 7.2, 8.3 and 8.30 beta 279.

The shortest code that I have to exhibit the bug is about 325 lines, and is available at https://www.dropbox.com/s/1yuzdtla5bl4a5b/cdfbug.7z?dl=0 . Unzip the source, data and batch files. Set the environment variable OPT=/p6 /opt, and use the batch file bld.bat. Run the resulting EXE.

I have tested the test code using Gfortran, Intel and NAG compilers. They gave identical results without any crashes.

I have done extensive tracking at the assembly level, and here is what I see happening. Register EBX contains the address of the dummy argument V of subroutine SLNPRO, whose base address is copied from [EBP+8] for quick access throughout the subroutine. EBX is also used to hold the DO index JJ of the inner loop that starts at Line 123 of SLNPRO.F90. During the next iteration of the outer loop LL1, this value of EBX is used as the address of V, the first dummy argument of Subroutine SLNPRO. The error can go unnoticed unless the value of EBX is such as to cause a memory access violation, floating point error, etc. Note the value of EBX after the crash. I see 0000000D, which certainly cannot be the base address of V. There are several places where EBX is refreshed by reading [EBP+8], and places where EBX is re-saved to [EBP+8] (which I think is unnecessary). In the instance that I just described, this save/restore operation was not present. In effect, the last used value of JJ is used as the base address of V and this can cause havoc.

This is a complicated bug, and investigating it is made difficult by the absence of facilities in SDBG to do low level debugging. I have found it possible to see how the bug occurs by looking for a long time at the /exp listing of SLNPRO after I compiled with /p6 /opt. If you wish to read those arguments, I shall be happy to provide them.

Thanks.

[P.S., added 9/29/2018: A short reproducer posted on Sat Sep 29, 2018 (see later in this thread) displays the register usage bug with many versions of FTN95, 32 as well as 64-bit, along with a number of different options.]

22 Sep 2018 5:48 #22555

I will make a note that this needs investigating.

At one time you could use F11 in SDBG to get the assembly level code but I am not sure that this still works.

22 Sep 2018 8:22 #22557

Thanks for having this bug investigated.

I do use F11 as far as it can take me, but assembly level capabilities are very restricted. Breakpoints cannot be set by address, step-in and step-over do not work. For example, with the cursor on a CALL instruction to a library routine, one has to press the Step key once for each instruction in the library routine -- whose count may be unknown.

By the way, FTN95 /exp produces 32-bit listings with the assembly and source instructions interspersed, where as 64-bit listings are less useful with the source segregated from the assembly listing. Even in 32-bit listings I found that the frequent use of pseudo-variables such as 'Address of QROW' for, say, [EBP+12] makes low level work more difficult.

I did use the Visual Studio debugger as well as EDB (originally from Edinburgh Portable Compiler) in this project.

22 Sep 2018 12:41 #22558

mecej4,

I was able to reproduce your error using V8.20.0

I also did some minor changes to see where the program was going, eg ssetv.f90, and the bug disappeared. The problem looks to be elusive, as you describe.

subroutine ssetv(nump, numq, numgr, ftbwt, pwr, iopt, epsq, indlw, altbl, &
                 indf, wtble, mode, v, m)
   implicit none
   integer, parameter :: double = kind(0d0)
   integer ,      intent(in)    :: nump, numq, numgr, iopt, mode
   integer ,      intent(out)   :: m
   real(double) , intent(in)    :: epsq
   integer ,      intent(in)    :: indf(101), indlw(101)
   real(double) , intent(in)    :: ftbwt(101), altbl(101), wtble(101), pwr(101,15)
   real(double) , intent(inout) :: v(635,17)
!
   integer      :: ione, nmpp1, nmppq, n, np1, i, mm1, nnn
   real(double) :: eps
!
   write (*,*) 'entering ssetv'
   ione = mod(iopt,10)
...
22 Sep 2018 1:05 #22559

John,

Thanks for putting in the effort to test for the presence of the bug in V8.2 and reporting your findings.

Bugs of this type are quite unpleasant and harmful, since a small change, such as removing a diagnostic WRITE statement from working code, may make the bug surface. Even then, if the user is not suspicious and on the lookout, wrong results may be taken as correct, so we should be happy when the program crashes.

Who knows, if FTN95-64 uses the same register allocation algorithm as FTN95-32, the same problem could occur -- now, with 16 instead of 8 registers, only with huge programs such as CFD and FEA programs.

23 Sep 2018 2:39 #22563

Paul, I managed to put together a short reproducer that contains the essentials of the code generation bug. The code is legal Fortran, and is error free. The bug may be seen by compiling and running with /opt /p6. The program will abort with the message 'The instruction at address xxxxxxxx attempted to read from location 00000029'. At this point, EBX = 1, ESI = 6, and the instruction is DFLD [ebx - 0x8 + esi*8].

program slnpro
   implicit none
   integer, parameter :: double = kind(0.0d0)
   integer :: m=4, n=1
   real(double)  :: v(4,2)
   data v/ 6.7307d-1, 1.4748d-1, 3.4416d-1, 7.3645d-1, &
      6.1452d-1, 5.5309d-3, 5.4214d-1, 4.3507d-1/
!
   call sub(v,m,n)
end program slnpro

subroutine sub(v,m,n)
   implicit none
   integer, parameter :: double = kind(0.0d0)
   integer, intent(in) :: m,n
   real(double) :: v(m,n+1)
   integer :: mxrkn=2,jj,lrknt,j=2,id=3,np1,ii
   real(double) :: rea=1d2, rowq, amin, rea2=-1d-4,rtcol,y=-2d0

   np1=n+1
   amin=0d0

Outer: do ii = n, m
   if (ii == id) cycle
   rowq = v(ii,j)/v(id,j)            ! crash here during second iteration
                                     ! EBX should contain the base address of V,
                                     ! but actually contains the last value of JJ
                                     ! from the previous execution of the inner loop
   rtcol = v(ii,np1) - v(id,np1)*rowq
   if (rtcol + rea2 >= 0.D0) cycle
   lrknt = 3
   do jj = 1, n
      if (jj == j) cycle
      if (y+v(ii,jj) > rea) cycle
      lrknt = lrknt + 1
      if (lrknt > mxrkn) cycle  Outer ! this jump is taken, with EBX containing JJ
   end do
   ! when the DO JJ loop terminates normally, EBX is restored to base address of V
   if (lrknt >= mxrkn) then
      if (lrknt > mxrkn) cycle
      if (v(id,j) >= amin) cycle
   endif
   mxrkn = lrknt
   amin = v(id,j)
end do Outer
   return
end subroutine sub
23 Sep 2018 3:08 #22564

Thanks.

23 Sep 2018 5:19 #22565

I abandoned /opt many years ago, thinking that I wasn't getting the right answer - in a case when I knew what the answer was, and got it without /opt - believing it was the result of code re-arrangement. Perhaps fortunately, all my stuff executes very quickly without on modern computers, and also, the'need for speed' is not essential with a Windows program in which the pace is governed by human reactions and thought processes.However, it wasn't crashing.

Is it the /P6, the /opt, or the combination that is at fault?

Eddie

23 Sep 2018 9:51 #22566

Quoted from LitusSaxonicum Is it the /P6, the /opt, or the combination that is at fault?

In this instance, it is the presence of /opt that leads to bad code. I specified the /P6 to have the compiler generate the x87 compare-and-set-flags instructions.

If you still have some old codes that gave correct results without /opt and bad results with /opt, I should appreciate your making them available.

27 Sep 2018 11:47 #22574

I have tested both the original code and the cut down version a number of times but unfortunately I don't get the crash.

I am guessing that a fix would involve switching off Fortran loop optimisation upon encountering a use of CYCLE. This would be a non-trivial task and I am not sure that it is worth putting on the list of things to do given the general move to 64 bit compilation.

If 32 bit optimisation makes a significant difference to the run time for this code, I can only suggest (at least for now) that you try switching off Fortran loop optimisation which I think would involve using '/inhibit_opt 43'.

27 Sep 2018 11:20 #22582

That is strange, Paul. With the short code above, I get a crash with FTN95 versions 6.35, 7.1, 7.2, 8.1, 8.2, 8.3 and 8.3.279.

s:\lang\ftn95>ftn95 /p6 /opt slnpro.f90 /link [FTN95/Win32 Ver. 8.30.279 Copyright (c) Silverfrost Ltd 1993-2018] NO ERRORS [<SLNPRO> FTN95 v8.30.279] NO ERRORS [<SUB> FTN95 v8.30.279] Creating executable: slnpro.EXE

s:\lang\ftn95>slnpro

The crash report:

NO ERRORS  [<SLNPRO> FTN95 v8.30.279]
NO ERRORS  [<SUB> FTN95 v8.30.279]

Creating executable: slnpro.EXE

s:\lang\ftn95>slnpro

The crash report: [quote:d6105e5ecd]Runtime error from program:s:\lang\ftn95\slnpro.exe Access Violation The instruction at address 0040110f attempted to read from location 00000029

00401050 SUB [+00bf] 00401000 main [+003f]

eax=00000002 ebx=00000001 ecx=00000003 edx=00000004 esi=00000006 edi=00404000 ebp=0360fc98 esp=0360fc00 IOPL=2 ds=002b es=002b fs=0053 gs=002b cs=0023 ss=002b flgs=00010297 [CA EP NZ NS DN NV]

0040110f dfld [ebx-0x8+esi*8] 00401113 dfdiv [ebp-0x68] 00401116 dfstp [ebp-0x28]

The offsets in the traceback, and the contents of EBP and ESP (which are related to those offsets) vary with the compiler version, but the values of the other registers as well as the reported instructions are identical.

John Campbell, if you read this, please try the short example code with any version of FTN95.

28 Sep 2018 1:15 #22586

Using FTN95/Win32 Ver. 8.20.0 I tried the short code example posted on Sunday night and: it crashed with set opt=/p6 /opt but worked with set opt=/p6 /opt /inhibit_opt 43

my bat file is

set opt=/p6 /opt /inhibit_opt 43

del %1.obj
del %1.exe

ftn95 %1.f90 %opt% 
slink %1.obj /out:%1.exe

%1

Paul, given how difficult it is to identify errors associated with these register mis-settings, it would be good to have a better understanding of what is causing this and how extensive it may be for using /opt.

John

28 Sep 2018 1:40 #22588

Thanks for the quick response, John.

My expectations for optimisation are a bit nuanced. If the source code is correct and standard-conforming, choosing /opt should be a trade-off of compilation speed for faster execution. Integer and character results should be identical whether or not /opt was used. Floating point calculations can yield slightly different results.

If the source code is not quite standard-conforming, /opt becomes an adventurous option; code that works fine without /opt may fail now and then, but should work most of the time. Once in a while, we may even find that using /opt can cause slower runs.

28 Sep 2018 2:38 #22589

I have learnt to be very selective where I use /opt, typically with code that is only a few lines long. Using /opt with large code (files) can produce unexpected results, that become too difficult to debug. It would be a good outcome if this thread identified a problem that could eliminate a bug that occurs more generally.

28 Sep 2018 6:25 #22592

In general, with the progress towards 64 bit applications, it is not prudent for us to spend much time on 32 bit optimiser bugs. If they are easy to fix then all well and good. Otherwise we need to devote our resources to matters of greater impact and demand.

FTN95 is well known for its rapid compilation and good error reporting etc. When it comes to fast run time code it may be that other compilers can sometimes (or maybe often) do better. Going forward, this may not be the case for 64 bit applications where we have the potential to develop a fast LAPACK type library.

So for 32 bit applications the general rule is, (a) develop your application using CHECKMATE (b) test and get verifiable results in release mode (without optimisation) (c) if run time speed is important, try /opt but make sure you get the same test results as without it. Only use /opt after testing and when there is a clear improvement in run time speeds.

Our aim is to make 64 bit optimisation more robust than proved to be the case with 32 bit optimisation. But in the end, for faster number crunching you may need to invest in a more expensive compiler and only use FTN95 during development. Alternatively you could invest in a faster processor, or more RAM etc.

29 Sep 2018 5:39 (Edited: 30 Sep 2018 12:02) #22607

Paul's comments are quite sensible, and until now I had not succeeded in making the register usage bug to come alive in any 64-bit code. More poking around shows that all versions of FTN95 (I tried 6.35, 7.2, which are 32-bit only; 8.1, 8.2 and 8.3, 32-bit and 64-bit) are probably afflicted by this bug. Here is a reliable and surprisingly short reproducer.

program salREGbug
implicit none
integer, parameter :: double = kind(0.d0)
integer :: l = 2, jbl(2) = (/ 1, 2 /)
real(double) :: alf    = -3.86D-01
real(double) :: x(2)   = (/ 1.61D-2, -2.69D0 /)
real(double) :: abl(3) = (/ 7.75D2, 3.93D2, 2.01D2 /)
!
call sub(l, abl, jbl, alf, x)
write(*,'(A,2x,2ES12.4)')'X = ',x
stop
end program

subroutine sub(l, abl, jbl, alf, x)
   implicit none
   integer, parameter           :: double = kind(0.d0)
   integer , intent(in)         :: l, jbl(l)
   real(double) , intent(in)    :: alf, x(*)
   real(double) , intent(in out) :: abl(l*(l+1)/2)
   integer                      :: i, k
!
! FTN95 8.3 (32 and 64 bit) have errors in tracking usage of registers
!       in the code generated from the source lines below. 
!
!       One or more registers may be overwritten with the values of 
!       more than one variable and the register contents are
!       accessed later. Variable values, including the patterns used to
!       initialise variables when /undef is used, may end up being used
!       as base addresses of arrays.
!
   k = 0
   do i = 1, l
      if (jbl(i) <= 0)cycle
      where ( jbl(:i) > 0 ) &
         abl(k+1:i+k) = abl(k+1:i+k) + alf*x(jbl(i))*x(jbl(:i))
      k = i + k
   end do
end subroutine

I ran the versions of FTN95 that I listed using a number of options.

 /checkmate
 /check
 -no option-
 /opt
 /opt /p6
 /opt /64
 /64
 /64 /check
 /64 /checkmate

The code aborted after an access violation in every one of the 37 runs. Note the improbable memory addresses from which a read is attempted at the point of the crash: with FTN95-8.30, the 32-bit program attempts to read from 0x20202020, and the 64-bit program attempts to read from 0x00000000.

The program has no errors; when run with, say, Gfortran, it gives:

X =     1.6100E-02 -2.6900E+00
29 Sep 2018 7:37 #22608

Many thanks for your effort and patience. We will investigate the issue.

2 Apr 2019 8:20 #23416

The fault demonstrated in program salREGbug has now been fixed for the next release of FTN95.

For anyone who is interested, it turned out to be a conflict between the use of an array in a WHERE mask and its use in the body of the WHERE statement as the index of another array. In other contexts that usage would be interpreted as a vector subscript which in turn means that the compiler must plant code to create a copy of the primary array using the vector subscript.

In the present case an inappropriate vector subscript copy was being created which might have been OK had it not been created after it had been used.

The following code gives the same result and avoids all 'snapshots' required when implementing WHERE in a serial processing compiler.

   k = 0 
   do i = 1, l 
      do m = 1,i
        if(jbl(m) > 0) abl(k+m) = abl(k+m) + alf*x(jbl(i))*x(jbl(m))
      end do  
      k = i + k 
   end do 
3 Apr 2019 6:54 #23422

I am a bit confused by the WHERE construct, so I am a bit confused by Paul's comment 'The following code gives the same result'. Is that the case ? Going from: k = 0 do i = 1, l if (jbl(i) ⇐ 0)cycle where ( jbl(:i) > 0 ) & abl(k+1:i+k) = abl(k+1:i+k) + alf*x(jbl(i))*x(jbl(:i)) k = i + k end do

To: k = 0 do i = 1, l do m = 1,i if(jbl(m) > 0) abl(k+m) = abl(k+m) + alf*x(jbl(i))*x(jbl(m)) end do
k = i + k end do

My interpretation would be: k = 0 do i = 1, l if (jbl(i) ⇐ 0) cycle do m = 1,i if (jbl(m) > 0) abl(k+m) = abl(k+m) + (alf*x(jbl(i))) * x(jbl(m)) end do k = k + i end do

I suspect in hindsight, my quibble is not related to the code generation bug, but I do find I much prefer an explicit DO loop to a WHERE statement. In almost all cases where I adopt code including WHERE or conditional FORALL statements, I will replace the code with a DO loop, because the logic flow is clearer to me. Possibly the reason that this bug has not been found for so long is many users of FTN95 have a similar view to me and rarely use WHERE. I have not observed any performance advantage for adopting WHERE or FORALL in any F90+ compiler I have used.

Perhaps I should stop learning new Fortran ?

John

3 Apr 2019 7:32 #23424

John

Do you need an ENDDO in your snippet?

A WHERE statement/construct is already very tricky. I don't see how its interpretation could depend on its context (within an outer DO loop or not). But I might be wrong.

A WHERE can be expanded to an IF within a DO with the added complication that it must simulate a parallel process. So usually the compiler will also arrange to copy the arrays just in case there are side effects.

So on a serial processing machine a WHERE can be less efficient as well as being less readable. It is quite possible that the Fortran standards committee only intended WHERE to be used on parallel processing machines.

Please login to reply.