A mysterious bug appeared in a large production Fortran code (12 K lines). The program ran fine when compiled with /64 together with /check or /debug, and also when compiled with /64 alone. When compiled with /opt /64, it crashed with an access violation.
Fortunately, the pop-up shown after the crash included the address at which the crash occurred. Rerunning the FTN95-compiled EXE under the Visual Studio debugger revealed that improper CMP instructions had been generated.
The bug appears to occur under these circumstances: an expression involving 4-byte integers is evaluated and compared to zero, and the result of the comparison decides whether another expression is evaluated and its result assigned to a variable. However, I am not yet sure that the bug occurs only when /opt is used; we would have to examine how the compiler treats other large codes to settle that question.
For example, for the source line
if (m1 - incj2 .gt. 0 .and. m1-incj2 .le. nxyz) then
the generated instructions were
000000000040182F: 4D 8B E9 mov r13,r9 ; R9 contains M1 already
0000000000401832: 44 2B AD CC 50 00 00 sub r13d,dword ptr [rbp+00000000000050CCh] ; subtract INCJ2
0000000000401839: 49 81 FD 00 00 00 00 cmp r13,0 ; should have been r13d ; is M1 - INCJ2 > 0 ?
At this point, R13 contained 0000 0000 FFFF FFFC, which is positive when read as a 64-bit signed integer. However, R13D, which should have been used in the CMP instruction (or sign-extended into R13 with MOVSXD first), contains FFFF FFFC, which is negative (-4) when read as a 32-bit signed integer. The test therefore passes when it should fail, and the negative value is then used as an array index, which is likely to cause an access violation and abort. Worse, the program may output incorrect results that are not obviously wrong.
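The two readings of the register can be illustrated in Fortran itself. This is only a sketch, not the reproducer: the values 2 and 6 are made up so that the difference is -4 (bit pattern FFFF FFFC), and the variable names are hypothetical. It relies on the fact that, on x86-64, a 32-bit operation such as SUB r13d,... clears the upper 32 bits of the full register.

```fortran
program width_demo
  use iso_fortran_env, only: int32, int64
  implicit none
  integer(int32) :: diff32
  integer(int64) :: zero_ext, sign_ext

  ! Hypothetical operands chosen so that the difference is -4 (FFFF FFFC)
  diff32 = 2_int32 - 6_int32

  ! What R13 holds after the 32-bit SUB: low 32 bits kept, upper 32 bits zero
  zero_ext = iand(int(diff32, int64), 4294967295_int64)

  ! What R13 would hold if the 32-bit result were sign-extended first
  sign_ext = int(diff32, int64)

  print *, 'as 32-bit     :', diff32,   ' > 0 ? ', diff32   > 0
  print *, 'zero-extended :', zero_ext, ' > 0 ? ', zero_ext > 0
  print *, 'sign-extended :', sign_ext, ' > 0 ? ', sign_ext > 0
end program width_demo
```

Only the zero-extended value tests as positive, which corresponds to the 64-bit CMP r13,0 shown in the listing.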
I was able to create a reproducer (source in the next post). The results from compiling and running with /64 /opt:
i1 m1-incj2 xxn(i1)
1 2 -1 4.0000E+00
2 3 4 6.0000E+00
3 2 2 4.0000E+00
4 3 7 -4.3000E+01
The '-1' in the first line of results is wrong. In fact, that whole line should be absent, as running with /64 (i.e., without /opt) shows:
i1 m1-incj2 xxn(i1)
1 3 4 6.0000E+00
2 2 2 4.0000E+00
3 3 7 -4.3000E+01
The bug does not occur if, instead of
if (m1 - incj2 .gt. 0 .and. m1-incj2 .le. nxyz) then
we have
if (m1 .gt. incj2 .and. m1-incj2 .le. nxyz) then
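As a sanity check on the workaround, the two forms of the condition should be logically equivalent as long as m1 - incj2 does not overflow. The sketch below compares them over a small range; the bound nxyz = 5 and the loop ranges are arbitrary assumptions, not values from the production code.

```fortran
program equiv_check
  implicit none
  integer :: m1, incj2, nxyz
  logical :: a, b

  nxyz = 5   ! hypothetical upper bound, for illustration only

  do m1 = -3, 8
    do incj2 = -3, 8
      ! Original form: difference compared with zero
      a = (m1 - incj2 .gt. 0 .and. m1 - incj2 .le. nxyz)
      ! Workaround form: operands compared with each other
      b = (m1 .gt. incj2 .and. m1 - incj2 .le. nxyz)
      if (a .neqv. b) print *, 'mismatch at m1, incj2 =', m1, incj2
    end do
  end do

  print *, 'check finished'
end program equiv_check
```

With a correct compiler no mismatch lines are expected. I have not tried this under /64 /opt; if mismatches did appear there, that would presumably be the same bug showing up in form a.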
Because of the page size limit of this forum, I have posted the source code of the program in the next post in this thread.