With the current 8.51 compiler, the following test program works correctly when it is compiled with any combination of options except /64 /opt. With that particular combination, the compiler pulls the value of a temporary expression from an incorrect register; that value is then used as an array index and, as one may expect, causes an access violation.
module globvars
   implicit none
   save
   integer :: nx,nxy,nz,nkua,ipe1,ipe2,kb,kt,ikk,jkk,kkk,ikua,cnt
   integer, allocatable :: mlog(:)
   real, allocatable :: sup(:), qfflu(:), qwflu(:), mobw(:)
end module
subroutine kuamod(uqwm)
   use globvars
   implicit none
   integer :: m, mm, k
   real :: uqwm
   uqwm = 0.
   do k = kb, kt
      if (ipe2==0) then
         m = (k-1)*nxy + (jkk-1)*nx + ikk ! m1
      else
         m = (kkk-1)*nxy + (jkk-1)*nx + k ! m2
      end if
      mm = (ikua-1)*nz + k
      qfflu(mm) = sup(m)*qwflu(mm) ! Error: m = m1 regardless of value of ipe2
      uqwm = uqwm + qfflu(mm)
      cnt = cnt+1
      mlog(cnt) = m
   end do
   return
end subroutine
program wbug
   use globvars
   implicit none
   integer i, msu, mq
   real uqwm
   kb = 2; kt = 9; nx = 5; nxy = 9; nz = 7
   ikk = 2; jkk = 3; kkk = 4; ikua = 1
   ipe1 = 1
   msu = (kt-1)*nxy + (jkk-1)*nx + ikk
   mq = (ikua-1)*nz + kt
   allocate(sup(msu), qwflu(mq), qfflu(mq), mlog(kt-kb+1))
   sup = (/ ((0.025*i - 0.004)*i + 0.33, i=1,msu) /)
   qwflu = (/ (i*0.125, i=1,mq) /)
   cnt = 0; ipe2 = 0; call kuamod(uqwm)
   print 10,ipe2,uqwm
   print 20,(i,mlog(i),i=1,cnt)
   cnt = 0; ipe2 = 1; call kuamod(uqwm)
   print 10,ipe2,uqwm
   print 20,(i,mlog(i),i=1,cnt)
10 format(' After call with ipe2 = ',i1,', uqwm = ',F5.1)
20 format(1x,i3,i10)
end program
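For comparison, the subscripts that a correct build should record in mlog follow directly from the constants set in wbug. The short program below is only a sketch for working them out by hand; the name expected_m is my own and is not part of the test case.

```fortran
program expected_m
   implicit none
   ! Same constants as in program wbug above
   integer, parameter :: kb = 2, kt = 9, nx = 5, nxy = 9
   integer, parameter :: ikk = 2, jkk = 3, kkk = 4
   integer :: k
   print '(a)', '   k  m (ipe2=0) m (ipe2/=0)'
   do k = kb, kt
      ! First value is m1, second is m2, as labelled in kuamod
      print '(i4,2i12)', k, (k-1)*nxy + (jkk-1)*nx + ikk, &
                            (kkk-1)*nxy + (jkk-1)*nx + k
   end do
end program expected_m
```

With these constants, m should run from 21 to 84 in steps of 9 when ipe2 = 0, and from 39 to 46 when ipe2 /= 0. Both ranges lie within the 84 elements allocated for sup.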
With /opt /64, the generated code pre-calculates the two alternative expressions for the index variable m in the subroutine and stores the results in temporary variables. It then tests ipe2 == 0 and assigns the appropriate temporary to m. In each case the value is also left in a register, and that register should then supply the index m when an XMM register is multiplied by sup(m). As the following segment of the /exp listing shows, when ipe2 = 0 the value of m is in RDI and, when ipe2 /= 0, it is in RCX. Unfortunately, only RCX is used as the index m, even when ipe2 = 0. In this short example, RCX then still contains the value it had at subroutine entry: the address of the only subroutine argument.
00000126(#15,100,29): MOVSX_Q RDI,ExtractedExpression@2
0000012b(#15,101,29): MOVSX_Q RSI,GLOBVARS!IPE2[RBP]
00000132(#33,45,23): ALIGN16
00000140(#33,46,23): N_3:
00000140(#41,47,18): CMP_Q RSI,0
00000147(#41,48,18): JNE N_6
0000014d(#51,49,19): Removed instruction
0000014d(#51,50,19): MOV M,RDI
00000151(#42,51,19): JMP N_7
00000156(#42,52,19): N_6:
00000156(#62,53,21): MOVSX_Q RCX,ExtractedExpression@3
0000015b(#62,54,21): MOV M,RCX
0000015f(#42,55,21): N_7:
...
0000017e(#87,60,24): MOVSX_Q RCX,RCX
00000181(#149,61,24): SUB_Q RCX,(GLOBVARS!SUP:start:1)[RBP]
...
000001a5(#155,67,24): MOV_Q R10,GLOBVARS!SUP[RBP]
000001ac(#155,68,24): MULSS XMM7,[R10+4*RCX]
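At source level, the selection that the listing is supposed to implement can be sketched as follows. This is only an illustration: the names select_sketch, select_m, t2 and t3 are hypothetical, and I am assuming that t2 and t3 correspond to ExtractedExpression@2 and ExtractedExpression@3; the listing itself does not show how those temporaries are computed.

```fortran
program select_sketch
   implicit none
   ! Same constants as in program wbug
   integer, parameter :: nx = 5, nxy = 9, ikk = 2, jkk = 3, kkk = 4
   integer :: k, ipe2
   k = 9   ! last iteration of the loop in kuamod
   do ipe2 = 0, 1
      print '(a,i1,a,i4)', ' ipe2 = ', ipe2, ', m = ', select_m(k, ipe2)
   end do
contains
   integer function select_m(k, ipe2) result(m)
      integer, intent(in) :: k, ipe2
      integer :: t2, t3   ! hypothetical stand-ins for the two temporaries
      t2 = (k-1)*nxy + (jkk-1)*nx + ikk   ! m1
      t3 = (kkk-1)*nxy + (jkk-1)*nx + k   ! m2
      if (ipe2 == 0) then
         m = t2   ! corresponds to MOV M,RDI in the listing
      else
         m = t3   ! corresponds to MOV M,RCX in the listing
      end if
      ! A correct subscript must be taken from m, whichever branch set it
   end function select_m
end program select_sketch
```

The point of the sketch is that the subscript has to come from m after the branches merge at N_7, whereas the listing reads RCX, which is only loaded on the N_6 path.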