I now have a working demonstration of what I believe is a bug in the bug-checking code that is introduced into an EXE produced by FTN95 with options such as /check, /undef, etc.
SYMPTOMS
One specific symptom is that when the EXE is run it aborts with integer overflow in totally unexpected and inexplicable contexts. Seemingly irrelevant changes (adding a redundant IMPLICIT NONE, adding an otherwise unused variable in the declarations section, including a subprogram that never gets called, or changing the compiler options or input data) may cause the overflow abort to disappear, only for the bug to resurface when some other slight change is made to the program source.
These integer overflows are exhibited even for sources which are error free and run without any problems using other compilers that provide for catching integer overflow (NAG, the old CVF). It is this property -- error stops resulting from running error-free programs -- that differentiates this kind of error from the more common errors such as undefined variables, argument mismatches, array overruns, etc.
The overflow is not related to integer variables in the test program, but apparently originates in subscript calculations and bounds checking, for which the compiler inserts immediate data bytes in the instruction stream or places those bytes in the stack.
BUG IS ELUSIVE
In the past, I tried to post a bug report for this problem, but the test programs were too big, and had bugs that I was trying to hunt down and fix using FTN95. When a test program with suspected bugs aborts with integer overflow, what is the basis for apportioning blame between the compiler and the test program?
REPRODUCER
I have prepared a 640-line program that should demonstrate the problem.
1. Please download the single source file (in a zip file) from
2. Compile and link with /check or /64 /check and run the program. It will abort (at least in my experience, using the 8.92 compiler) with integer overflow on line 553:
      ddn = dvnorm (n, yh(1, L), ewt) / tesco (1, nq)
   (Note that the 64-bit debugger reports line 552, which is off by one.) In the variable pane, you can see yh listed as 'yh(216, invalid)'.
3. Repeat step 2, this time adding /imp as a compiler option. Note that no variables are flagged by the compiler as not being covered by type declarations. This EXE runs to normal completion!
The EXEs differ by one byte in two places. In effect, that one byte difference causes havoc.
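For anyone who wants to confirm the byte difference on their own build, here is a minimal Python sketch of a byte-by-byte comparison. The function names and the EXE file names in the usage note are hypothetical, and the sketch assumes the two EXEs have the same length, so that offsets line up.

```python
def diff_offsets(a: bytes, b: bytes):
    """Return (offset, byte_a, byte_b) for every position where a and b differ."""
    if len(a) != len(b):
        # With unequal lengths the offsets no longer correspond one-to-one.
        raise ValueError("inputs differ in length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def diff_files(path_a, path_b):
    """Read two files in binary mode and return their differing offsets."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return diff_offsets(fa.read(), fb.read())


def format_diffs(diffs):
    """Render each difference as 'offset: old -> new' in hexadecimal."""
    return [f"{off:#010x}: {x:#04x} -> {y:#04x}" for off, x, y in diffs]
```

Usage would be something like print("\n".join(format_diffs(diff_files("test_check.exe", "test_check_imp.exe")))), with the two names replaced by whatever the two builds were called.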
Here is my interpretation of what goes wrong at the machine level. At the point where the integer overflow occurs, these are the instructions:
40C28A  mov   r15, [rsp+0x2f0]
        inc   r15
        jno   ...
        mov   eax, 0x02
        int   0x9
        imul  r15, rbx      # RBX has D8 (= 216)
        jno   ...
The value loaded into R15 should be a subscript extent of array YH:
MOV_Q R15,(YH:size:1)
If you step to the instruction following this one (I noted EIP = 40C28A for the 64-bit program), you find that, when the EXE has been compiled with /64 /check, the value loaded into R15 from the stack is 0x8080808080808080 instead of the correct value, 0x0000000000000017, which is the extent of the second subscript of YH, minus 1. This huge value (so large that it is a negative number in two's-complement notation) is then multiplied by RBX, which contains 0x00000000000000D8 = 216, the extent of the first subscript of YH(:,:), and integer overflow results. When /imp is used in addition to /check, the correct value gets loaded and the whole problem goes away.
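The arithmetic can be checked outside the debugger. The following Python sketch mimics my reading of the inc/imul pair on 64-bit two's-complement values; the helper names are my own, and the overflow test is only a model of what the jno instructions check, not the compiler's actual code.

```python
INT64_MIN = -(1 << 63)
INT64_MAX = (1 << 63) - 1


def to_signed64(bits):
    """Interpret a 64-bit pattern as a two's-complement signed integer."""
    bits &= (1 << 64) - 1
    return bits - (1 << 64) if bits >> 63 else bits


def inc_then_imul(r15_bits, rbx):
    """Model 'inc r15' followed by 'imul r15, rbx'.

    Returns (mathematical result, overflow flag), where the flag is True
    if either step leaves the signed 64-bit range.
    """
    r15 = to_signed64(r15_bits) + 1
    if not INT64_MIN <= r15 <= INT64_MAX:
        return r15, True
    product = r15 * rbx
    return product, not INT64_MIN <= product <= INT64_MAX


# Value seen with /64 /check: the product is far outside the 64-bit range.
bad = inc_then_imul(0x8080808080808080, 216)
# Value expected (second extent minus 1 = 0x17): (23 + 1) * 216 = 5184.
good = inc_then_imul(0x17, 216)
print(bad[1], good)
```

With the bad value the product is roughly -2.0e21, well beyond the signed 64-bit limit of about -9.2e18, so the overflow trap fires; with the expected value the product is a harmless 5184.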
The same sort of thing happens with 32-bit EXEs, with the culminating instruction:
405196 IMUL EDI,EDX
APOLOGY: This is rather long and technical, and is not meant for general reading. The reported details should help in fixing the problem. Some of my guesswork regarding the compiler's internal workings may well be wrong.