Topic: Optimisation bug in 64-bit

mecej4

Posts: 1911

7 Sep 2019 2:18 (Edited: 7 Sep 2019 2:42) #24319

With the current 8.51 compiler, the following test program works correctly when it is compiled with any combination of options except /64 /opt. With that particular combination, the compiler pulls the value of a temporary expression from an incorrect register; that value is then used as an array index and, as one may expect, causes an access violation.

module globvars
   implicit none
   save
   integer :: nx,nxy,nz,nkua,ipe1,ipe2,kb,kt,ikk,jkk,kkk,ikua,cnt
   integer, allocatable :: mlog(:)
   real, allocatable :: sup(:), qfflu(:), qwflu(:), mobw(:)
end module

subroutine kuamod(uqwm)
   use globvars
   
   implicit none
   integer :: m, mm, k
   real    :: uqwm

   uqwm = 0.
   do k = kb, kt
      if (ipe2==0) then
         m = (k-1)*nxy   + (jkk-1)*nx + ikk  ! m1
      else
         m = (kkk-1)*nxy + (jkk-1)*nx + k    ! m2
      end if
      mm = (ikua-1)*nz + k
      qfflu(mm) = sup(m)*qwflu(mm)    ! Error: m = m1 regardless of value of ipe2
      uqwm = uqwm + qfflu(mm)
      cnt = cnt+1
      mlog(cnt) = m
   end do
   return
end subroutine

program wbug
   use globvars
   implicit none
   integer i, msu, mq
   real uqwm
   
   kb   = 2;  kt = 9;  nx = 5;  nxy = 9; nz = 7
   ikk  = 2; jkk = 3; kkk = 4; ikua = 1
   ipe1 = 1
   msu  = (kt-1)*nxy + (jkk-1)*nx + ikk
   mq   = (ikua-1)*nz + kt
   
   allocate(sup(msu), qwflu(mq), qfflu(mq), mlog(kt-kb+1))
   sup =   (/ ((0.025*i - 0.004)*i + 0.33, i=1,msu) /)
   qwflu = (/ (i*0.125, i=1,mq) /)
   
   cnt = 0; ipe2 = 0; call kuamod(uqwm)
   print 10,ipe2,uqwm
   print 20,(i,mlog(i),i=1,cnt)

   cnt = 0; ipe2 = 1; call kuamod(uqwm)
   print 10,ipe2,uqwm
   print 20,(i,mlog(i),i=1,cnt)
   
   10 format(' After call with ipe2 = ',i1,', uqwm = ',F5.1)
   20 format(1x,i3,i10)
   end program
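
For reference, the two index expressions in kuamod can be evaluated on their own, using the constants set in the main program above. The following is only a minimal sketch: the program name expected_m is made up for illustration, and it shows the values that the source code prescribes for m, not what any particular build produces.

```fortran
program expected_m
   implicit none
   ! Same constants as assigned in program wbug above
   integer, parameter :: kb = 2, kt = 9, nx = 5, nxy = 9
   integer, parameter :: ikk = 2, jkk = 3, kkk = 4
   integer :: k, m1, m2

   print '(a)', '   k        m1        m2'
   do k = kb, kt
      m1 = (k-1)*nxy   + (jkk-1)*nx + ikk   ! value of m when ipe2 == 0
      m2 = (kkk-1)*nxy + (jkk-1)*nx + k     ! value of m when ipe2 /= 0
      print '(1x,i3,2i10)', k, m1, m2
   end do
end program expected_m
```

With these constants, m1 runs from 21 to 84 in steps of 9 and m2 runs from 39 to 46, so both branches stay within the 84 elements allocated for sup.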

With /opt /64, the generated code pre-calculates the two alternative expressions for the index variable m in the subroutine and stores the results in temporary variables. It then tests ipe2 == 0 and assigns the appropriate temporary to m. In each case the value is also placed in a register, and that register is meant to supply the index m when an XMM register is multiplied by sup(m). As the following segment of the /exp listing shows, when IPE2 = 0 the value of m is in RDI and, when IPE2 /= 0, it is in RCX. Unfortunately, only the latter (RCX) is used as the index, even when IPE2 = 0. In this short example, RCX then still contains the value it had at subroutine entry: the address of the only subroutine argument.

00000126(#15,100,29):      MOVSX_Q   RDI,ExtractedExpression@2
0000012b(#15,101,29):      MOVSX_Q   RSI,GLOBVARS!IPE2[RBP]
00000132(#33,45,23):      ALIGN16
00000140(#33,46,23):      N_3:
00000140(#41,47,18):      CMP_Q     RSI,0
00000147(#41,48,18):      JNE       N_6
0000014d(#51,49,19):      Removed instruction
0000014d(#51,50,19):      MOV       M,RDI
00000151(#42,51,19):      JMP       N_7
00000156(#42,52,19):      N_6:
00000156(#62,53,21):      MOVSX_Q   RCX,ExtractedExpression@3
0000015b(#62,54,21):      MOV       M,RCX
0000015f(#42,55,21):      N_7:
...
0000017e(#87,60,24):      MOVSX_Q   RCX,RCX
00000181(#149,61,24):      SUB_Q     RCX,(GLOBVARS!SUP:start:1)[RBP]
...
000001a5(#155,67,24):      MOV_Q     R10,GLOBVARS!SUP[RBP]
000001ac(#155,68,24):      MULSS     XMM7,[R10+4*RCX]

mecej4

Posts: 1911

7 Sep 2019 2:41 #24320

I ran into the posting line limit of the forum, and had to trim my comments in the initial post to avoid having part of the machine code cut out.

The actual code in which this bug was first encountered was a commercial production code of about 15,000 lines that solves the partial differential equations governing geothermal flow and outputs the results using Clearwin graphics. User Jcherw provided a trimmed version with the Clearwin parts removed, leaving about 13,000 lines. The bug had gone unnoticed for months -- there was no access violation, just slightly different results with /opt versus without. One day, sunshine came in the form of a bunch of unexpected 0.000E+00 values in the results, which prompted an in-depth study of the code.

The work of reducing the code down to the reproducer while preserving the bug was an interesting experience.

PaulLaidler

Posts: 7972 Salford, UK

7 Sep 2019 7:40 #24321

Many thanks for the feedback and your work in isolating this bug.

I have made a note that it needs to be fixed.

EKruck

Posts: 221 Aalen, Germany

16 Oct 2019 2:03 #24538

Hi Paul, in three subroutines I found severe bugs of different types when compiling my 64-bit programs with /opt in FTN95 8.50. Since I cannot test everything, would it be better to go back to 8.40? Erwin

PaulLaidler

Posts: 7972 Salford, UK

17 Oct 2019 7:09 #24539

Erwin

I can't think of a reason why 8.40 would be better.

Please send details of the failures so that they can be fixed.

PaulLaidler

Posts: 7972 Salford, UK

4 Nov 2019 3:32 #24627

The bug reported at the start of this thread has been fixed for the next release of FTN95.

A temporary fix is to use /inhibit_opt 86.