Forum Index
Welcome to the Silverfrost forums

Optimisation bug


Joined: 31 Oct 2006
Posts: 1186

Posted: Sat Sep 07, 2019 3:18 am    Post subject: Optimisation bug

With the current 8.51 compiler, the following test program works correctly when it is compiled with any combination of options except /64 /opt. With that particular combination, the compiler pulls the value of a temporary expression from an incorrect register; that value is then used as an array index and, as one may expect, causes an access violation.

module globvars
   implicit none
   integer :: nx,nxy,nz,nkua,ipe1,ipe2,kb,kt,ikk,jkk,kkk,ikua,cnt
   integer, allocatable :: mlog(:)
   real, allocatable :: sup(:), qfflu(:), qwflu(:), mobw(:)
end module

subroutine kuamod(uqwm)
   use globvars
   implicit none
   integer :: m, mm, k
   real    :: uqwm

   uqwm = 0.
   do k = kb, kt
      if (ipe2==0) then
         m = (k-1)*nxy   + (jkk-1)*nx + ikk  ! m1
      else
         m = (kkk-1)*nxy + (jkk-1)*nx + k    ! m2
      end if
      mm = (ikua-1)*nz + k
      qfflu(mm) = sup(m)*qwflu(mm)    ! Error with /64 /opt: index for sup(m) comes from the wrong register when ipe2 == 0
      uqwm = uqwm + qfflu(mm)
      cnt = cnt+1
      mlog(cnt) = m
   end do
end subroutine

program wbug
   use globvars
   implicit none
   integer i, msu, mq
   real uqwm
   kb   = 2;  kt = 9;  nx = 5;  nxy = 9; nz = 7
   ikk  = 2; jkk = 3; kkk = 4; ikua = 1
   ipe1 = 1
   msu  = (kt-1)*nxy + (jkk-1)*nx + ikk
   mq   = (ikua-1)*nz + kt
   allocate(sup(msu), qwflu(mq), qfflu(mq), mlog(kt-kb+1))
   sup =   (/ ((0.025*i - 0.004)*i + 0.33, i=1,msu) /)
   qwflu = (/ (i*0.125, i=1,mq) /)
   cnt = 0; ipe2 = 0; call kuamod(uqwm)
   print 10,ipe2,uqwm
   print 20,(i,mlog(i),i=1,cnt)

   cnt = 0; ipe2 = 1; call kuamod(uqwm)
   print 10,ipe2,uqwm
   print 20,(i,mlog(i),i=1,cnt)
   10 format(' After call with ipe2 = ',i1,', uqwm = ',F5.1)
   20 format(1x,i3,i10)
   end program

With /opt /64, the generated code pre-calculates the two alternative expressions for the index variable m in the subroutine and stores the results in temporary variables. Inside the loop it tests ipe2 == 0 and assigns the matching temporary to m. In each branch the value is also placed in a register, and that register is meant to supply the index m when an XMM register is multiplied by sup(m). As the following segment of the /exp listing shows, the value of m is in RDI when IPE2 = 0 and in RCX when IPE2 /= 0. Unfortunately, only RCX is used as the index, even when IPE2 = 0. In this short example, RCX then still contains the value it had at subroutine entry: the address of the only subroutine argument.

00000126(#15,100,29):      MOVSX_Q   RDI,ExtractedExpression@2
0000012b(#15,101,29):      MOVSX_Q   RSI,GLOBVARS!IPE2[RBP]
00000132(#33,45,23):      ALIGN16
00000140(#33,46,23):      N_3:
00000140(#41,47,18):      CMP_Q     RSI,0
00000147(#41,48,18):      JNE       N_6
0000014d(#51,49,19):      Removed instruction
0000014d(#51,50,19):      MOV       M,RDI
00000151(#42,51,19):      JMP       N_7
00000156(#42,52,19):      N_6:
00000156(#62,53,21):      MOVSX_Q   RCX,ExtractedExpression@3
0000015b(#62,54,21):      MOV       M,RCX
0000015f(#42,55,21):      N_7:
0000017e(#87,60,24):      MOVSX_Q   RCX,RCX
00000181(#149,61,24):      SUB_Q     RCX,(GLOBVARS!SUP:start:1)[RBP]
000001a5(#155,67,24):      MOV_Q     R10,GLOBVARS!SUP[RBP]
000001ac(#155,68,24):      MULSS     XMM7,[R10+4*RCX]
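
For reference, here is a minimal sketch that tabulates the subscripts sup(m) should receive in each case. It is not part of the original post: the program name refidx is hypothetical, and the constants are copied from the test program above. It avoids any branch inside a loop, so it should not depend on the code pattern discussed here, but compiling it without /opt is the safer choice.

```fortran
! Hypothetical reference helper (not from the original post).
! Tabulates the subscript m expected for each k in kb..kt,
! using the same constants as program wbug.
program refidx
   implicit none
   integer, parameter :: nx = 5, nxy = 9, kb = 2, kt = 9
   integer, parameter :: ikk = 2, jkk = 3, kkk = 4
   integer :: k

   ! Branch taken when ipe2 == 0 (expression m1)
   print '(a)', ' Expected m when ipe2 == 0 (m1):'
   print '(1x,i3,i10)', (k-kb+1, (k-1)*nxy + (jkk-1)*nx + ikk, k = kb, kt)

   ! Branch taken when ipe2 /= 0 (expression m2)
   print '(a)', ' Expected m when ipe2 /= 0 (m2):'
   print '(1x,i3,i10)', (k-kb+1, (kkk-1)*nxy + (jkk-1)*nx + k, k = kb, kt)
end program refidx
```

With these constants, m1 runs from 21 to 84 in steps of 9 and m2 runs from 39 to 46. Note that mlog in the test program records the stored variable m, which the listing shows being written from the correct register, so mlog may well agree with this table even when the register used to index sup is wrong.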

Last edited by mecej4 on Sat Sep 07, 2019 3:42 am; edited 1 time in total

Joined: 31 Oct 2006
Posts: 1186

Posted: Sat Sep 07, 2019 3:41 am    Post subject:

I ran into the posting line limit of the forum, and had to trim my comments in the initial post to avoid having part of the machine code cut out.

The actual code in which this bug was first encountered was a commercial production code of about 15,000 lines that solves the partial differential equations governing geothermal flow and outputs the results using Clearwin graphics. User Jcherw provided a trimmed version with the Clearwin parts removed, leaving about 13,000 lines. The bug had gone unnoticed for months -- there was no access violation, just slightly different results with /opt versus without. One day, sunshine came in the form of a bunch of unexpected 0.000E+00 in the results, which prompted an in-depth study of the code.

The work of reducing the code down to the reproducer while preserving the bug was an interesting experience.
Site Admin

Joined: 21 Feb 2005
Posts: 6004
Location: Salford, UK

Posted: Sat Sep 07, 2019 8:40 am    Post subject:

Many thanks for the feedback and your work in isolating this bug.

I have made a note that it needs to be fixed.
All times are GMT + 1 Hour