Topic: Crash in library function (cannot duplicate) in 64-bit

wahorger

Posts: 1273 Morrison, CO, USA

21 Jan 2021 11:02 #26951

I am getting a crash in the library function LENG8$. I cannot duplicate in a small example.

It would appear that the function finds the trimmed length of a character string, returning the new trimmed length. In the routine in question it is used successfully over a dozen times in my assembly listing before failing here.

The assembler code (my code) is:

j = len_trim(feet_or_meters_scale(i_logmtr))

00003a1a(#20238,2524,693): NOP
00003a1b(#20238,2524,693): MOVSX_Q RBX,I_LOGMTR
00003a22(#35595,2525,693): IMUL_Q RBX,16_4
00003a2a(#20244,2526,693): LEA RCX,FEET_OR_METERS_SCALE:d[RBP+RBX]
00003a32(#20244,2527,693): MOV_Q RDX,16_4
00003a39(#20244,2528,693): CALL LENG8$
00003a3e(#20226,2529,693): MOV J,RAX

The data declarations are:

character*16:: feet_or_meters_scale(0:2)
data feet_or_meters_scale/'UNDEFINED','feet/inch','meters/2.54 cm'/
integer:: i_logmtr
real*8:: log_scale

The error is:

Silverfrost 64-bit exception report on F:\cmasterf95\RELEASE\win64\C-MASTER.exe Thu Jan 21 15:50:24 2021

Access violation (c0000005) at address 7ff9bba20ae3

Within file CLEARWIN64.DLL
In LENG8$ at address 13
Within file C-MASTER.exe
in LOGPLT_WINDOWS in line 693, at address 3a3e
in LOGPLOT_WINDOWS in line 26, at address 143
Within file CLEARWIN64.DLL
In _set_mg_return_value at address 6B72
In CallWindowProcW at address 3BD
Within file USER32.DLL
In DispatchMessageW at address 1F2
In IsDialogMessageW at address 280
In IsDialogMessage at address 7C
In _register_message_interception at address 63E
Within file CLEARWIN64.DLL
In _yield_program_control at address 15A

RAX = 00000000ffffffff   RBX = 0000000202020200
RCX = 00000002058935ff   RDX = 0000000000000010
RBP = 0000000000400000   RSI = 000000000a626720
RDI = 0000000000000004   RSP = 000000000de99c58
R8  = 0000000013fd0e40   R9  = 0000000000000001
R10 = 0000000000008000   R11 = 000000000de99840
R12 = 000000000387360c   R13 = 00000000038734c0
R14 = 00000000038734c4   R15 = 0000000003874da8

7ff9bba20ae3) movzx_b_q RAX,[RCX]

PaulLaidler

Posts: 7972 Salford, UK

22 Jan 2021 8:21 #26952

Is there a check in the code to ensure that i_logmtr is in the range 0 to 2?
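A check of this kind could look roughly like the sketch below. It is only an illustration: the array declaration is copied from the first post, while the program name and the bad value 7 are hypothetical.

```fortran
program index_guard
  implicit none
  character(len=16) :: feet_or_meters_scale(0:2)
  integer :: i_logmtr, j
  data feet_or_meters_scale /'UNDEFINED', 'feet/inch', 'meters/2.54 cm'/

  i_logmtr = 7   ! hypothetical out-of-range value, for illustration only

  ! Compare the index against the declared bounds before using it.
  if (i_logmtr < lbound(feet_or_meters_scale, 1) .or. &
      i_logmtr > ubound(feet_or_meters_scale, 1)) then
    print '(a,i0)', 'i_logmtr out of range: ', i_logmtr
  else
    j = len_trim(feet_or_meters_scale(i_logmtr))
    print '(a,i0)', 'trimmed length = ', j
  end if
end program index_guard
```

Using LBOUND and UBOUND rather than literal 0 and 2 keeps the guard correct if the declaration changes.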

mecej4

Posts: 1911

22 Jan 2021 10:16 (Edited: 23 Jan 2021 9:38) #26954

Given the gaps in the information available, the following is definitely speculative.

I do not know which version of the CLEARWIN64 DLL Bill is using; so I will take the information from the one that I have with FTN95 8.70:

 22.11.17.16, date 11/17/2020

Here are the relevant lines of machine code, starting with the pieces in Bill's traceback, and the code of LENG8$ in the DLL (disassembly generated using Microsoft's DUMPBIN):

00003a1b(#20238,2524,693): MOVSX_Q RBX,I_LOGMTR
00003a22(#35595,2525,693): IMUL_Q RBX,16_4
00003a2a(#20244,2526,693): LEA RCX,FEET_OR_METERS_SCALE:d[RBP+RBX]
00003a32(#20244,2527,693): MOV_Q RDX,16_4
00003a39(#20244,2528,693): CALL LENG8$
00003a3e(#20226,2529,693): MOV J,RAX
...
LENG8$: 
  00000001800CF090: 48 83 FA 01        cmp         rdx,1
  00000001800CF094: 7D 03              jge         00000001800CF099
  00000001800CF096: 33 C0              xor         eax,eax
  00000001800CF098: C3                 ret
  00000001800CF099: 48 8D 4C 11 FF     lea         rcx,[rcx+rdx-1]
  00000001800CF09E: 48 85 D2           test        rdx,rdx
  00000001800CF0A1: 7E 12              jle         00000001800CF0B5
  00000001800CF0A3: 0F B6 01           movzx       eax,byte ptr [rcx]  # CRASH LOCATION
  00000001800CF0A6: 48 FF C9           dec         rcx
  00000001800CF0A9: 3C 20              cmp         al,20h
  00000001800CF0AB: 75 08              jne         00000001800CF0B5
  00000001800CF0AD: 48 FF CA           dec         rdx
  00000001800CF0B0: 48 85 D2           test        rdx,rdx
  00000001800CF0B3: 7F EE              jg          00000001800CF0A3
  00000001800CF0B5: 48 8B C2           mov         rax,rdx
  00000001800CF0B8: C3                 ret

The value of register RBX (0000000202020200) strikes me as suspicious.

At the crash location, LENG8$+0x13, we can deduce from the preceding instructions that register RBX has the same contents as had been set in the 'IMUL_Q RBX,16_4' instruction in Bill's caller. We can thus infer that the I_LOGMTR was equal to 0000000020202020, which is not at all reasonable, as Paul has already pointed out -- it should be in the range 0 to 2.

I suspect that something such as an array overrun caused a character string containing blanks (byte 0x20 repeated) to be written into the integer variable I_LOGMTR.
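The arithmetic behind that inference can be checked with a few lines of standard Fortran. This is a sketch for illustration, not code from the program in question; the kind parameters and the program name are assumptions.

```fortran
program blank_index
  implicit none
  integer, parameter :: i4 = selected_int_kind(9)    ! 32-bit integer kind
  integer, parameter :: i8 = selected_int_kind(18)   ! 64-bit integer kind
  integer(i4) :: i_logmtr
  integer(i8) :: offset

  ! Reinterpret four blank characters (byte 0x20 each) as a 32-bit integer.
  i_logmtr = transfer('    ', i_logmtr)

  ! The caller scales the index by the element length of 16 bytes.
  offset = int(i_logmtr, i8) * 16_i8

  print '(a,z8.8,a,i0)', 'i_logmtr = 0x', i_logmtr, ' = ', i_logmtr
  print '(a,z16.16)',    'offset   = 0x', offset
end program blank_index
```

The integer comes out as 0x20202020 (538976288 in decimal), and the scaled offset as 0x0000000202020200, which matches the RBX value in the exception report.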

A separate point, for Paul's consideration: the machine instructions given by DUMPBIN and FTN95's traceback for the crash location disagree:

Dumpbin gives: 00000001800CF0A3: 0F B6 01 movzx eax,byte ptr [rcx]

whereas FTN95 gave: 7ff9bba20ae3) movzx_b_q RAX,[RCX]

The distinction could be of no significance unless a subsequent instruction used the upper (nameless) half of RAX, but I should prefer to see the correct instruction disassembled.

wahorger

Posts: 1273 Morrison, CO, USA

22 Jan 2021 6:28 #26956

Paul, yes, there is. An 'IF' earlier sets it to 1 or 2. This is the only usage for i_logmtr, locally defined. I should probably turn on /BOUNDS_CHECK just in case, but this is in previously working, untouched code. In 32-bit mode, I do have /BOUNDS_CHECK and it does not error out.

wahorger

Posts: 1273 Morrison, CO, USA

22 Jan 2021 6:34 #26957

Paul, yes, there is. An 'IF' earlier sets it to 1 or 2. This is the only usage for i_logmtr, locally defined. I should probably turn on /BOUNDS_CHECK just in case, but this is in previously working, untouched code. In 32-bit mode, I do have /BOUNDS_CHECK and it does not cause an error to be detected.

mecej4, the address is within a DLL, loaded at some weird address. Unlike the MAP for 32-bit, the eventual loading address is not given in SLINK64. Your example may be perfectly correct, just not the same as mine because our code and DLL references are different.

The code I show is my code that leads to the LENG8$ call which has the error. I have no idea what is before/after the 'offending' instruction.

PaulLaidler

Posts: 7972 Salford, UK

22 Jan 2021 6:47 #26958

Another possibility is that something is over-writing the character array.

Try printing out the array just before the failure occurs.

wahorger

Posts: 1273 Morrison, CO, USA

22 Jan 2021 7:20 #26959

I had thought that /zeroise would zero memory, but no. A secondary variable was not zero on entry, which caused i_logmtr NOT to be initialized appropriately. On to the next error.

Learned something new, and will apply caution in the future when similar problems arise!

Thanks, mecej4 and Paul.

wahorger

Posts: 1273 Morrison, CO, USA

22 Jan 2021 10:54 #26961

So, cautiously, I approached the debugging of the module and found an interesting (unexpected) result: every local variable I looked at contained the same pattern of (hex) 20202020. Specifically, I had initialized each of these variables with a DATA statement, or with a specific assignment when the variable was declared (i.e. INTEGER:: ABCD=0). I printed these locals as the module was entered, before any processing took place.

In addition, a ganged pair of radio buttons is (in 64-bit) no longer paired. They operate as separate buttons. Also, they initialized to (hex) 20202020 before they are used.

A local variable that is set via assignment statement later on in the module to 0.0 on entry is displayed as 6.0135E-154 (also= (hex) 2020202020202020).
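That displayed value is what eight blank bytes look like when read as a double precision number. A small sketch to confirm it (the program name and kind parameter are assumptions, not part of the original code):

```fortran
program blank_real
  implicit none
  integer, parameter :: dp = selected_real_kind(15, 307)   ! 64-bit real kind
  real(dp) :: x

  ! Reinterpret eight blank characters (byte 0x20 each) as a 64-bit real.
  x = transfer('        ', x)

  ! The E3 exponent width leaves room for a three-digit exponent.
  print '(a,es13.4e3)', 'x = ', x
end program blank_real
```

The printed value should agree with the 6.0135E-154 reported above.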

Compiled in 32-bit mode as /CHECKMATE and as /RELEASE with /BOUNDS_CHECK turned on, none of these problems occur.

I am not able to duplicate any of these (the filling of memory with 20202020, the ganged radio button, or the floating point number) in a small sample program.

Some guidance to find what is causing this would be helpful. DBG64?

mecej4

Posts: 1911

22 Jan 2021 11:41 #26962

Do you have one or more character variables declared with len = a large number, inadvertently set equal to blank?

Does your code pass such a variable to a third party library routine which sets the variable to blank?

When you first see a variable with its bytes set to 0x20, is that the first time that execution of the current subprogram occurs, or otherwise?

wahorger

Posts: 1273 Morrison, CO, USA

23 Jan 2021 1:18 #26963

Neither in this routine nor in its caller are any big character variables being initialized.

The real*8 is getting changed back on the second time through, but I think the constant 0.0 is getting hammered, so cannot affect that.

The radio button values are not getting hammered again. But the ganging is still not working.

mecej4

Posts: 1911

23 Jan 2021 9:35 #26964

Bill, if you just tell us the version number of your CLEARWIN64.DLL, and that version was not a private version, we could easily settle any doubts regarding the library function LENG8$, which is called to find len_trim(char var).

I expect the machine code of that function to be identical in your DLL version to the one that I showed. The function is only 16 instructions long, takes two arguments in RCX and RDX, and returns the result in RAX. It does not change any of the other registers. Nothing can go wrong there.

Therefore, the errors must occur earlier in the call chain, in portions of your code that we know nothing about. With such tough errors, I would not completely trust a symbolic debugger.

You mentioned 'radio buttons not getting hammered'. I hope that you appreciate that the phrase has no meaning (to me, anyway) within the context of the information that you have disclosed in this thread.

wahorger

Posts: 1273 Morrison, CO, USA

23 Jan 2021 2:05 #26965

mecej4, thanks for the response.

Yesterday, my post spoke of memory being filled with spaces, written over data for which I had DATA statements. I also uncovered that this included the floating point constant 0.0. So the code in LENG8$ is no longer in question: since it was handed an invalid index into the array, having it crash was not unexpected. The variable was not set properly because the floating point constant 0.0 was no longer exactly zero. This prevented a portion of my initialization code from running, leaving a key variable (i_logmtr) set to all spaces (or 538976288), even though it had also been initialized to zero in a DATA statement. The question remains: 'Why? Why is memory that holds my variables AND constants getting filled with spaces?' The suggestion to look at initialization statements possibly 'gone awry' was good, as was the one about big (wide) character arrays getting initialized (there are none). I don't use the char_fill@() function in this particular program at all, relying on program assignment statements to initialize bulk arrays. So where do I look for the program going 'rogue'?

I first took out of the build any /ZEROISE, just in case that was the culprit. This had no effect.

And, yesterday, I put in a little PRINT statement on a couple of variables to show the state of the local variables on initial entry and immediately before first use. The print showed that these locals had been set to spaces. Integers that should be zero or one (from the DATA statements) are displaying as 538976288. REAL*8's are displayed as 6.0135E-154. These REAL*8's had been initialized by a program statement, setting the value explicitly to 0.0d0.

This morning (23 Jan), I placed in the module a small bit of code. The code would allow me to call this routine BEFORE any of my other code ran, just in case I have something (programmatically) that went bonkers. It can only be run once, it prints, and it immediately returns. Stated another way, before any of my 'standard' code that initializes any other variables, opens files, clears character arrays; before any of that executes, I call the routine and print the values of a few variables that should have pre-defined values based on their data statements.

In a perfect world, the numbers displayed should match their initialization (DATA) statement values. That is not what was observed: the values are set to all spaces, and this is reflected in the numeric values displayed by the print.
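For anyone wanting to try the same kind of probe, a one-shot entry check might be sketched as follows. All names here (probe_locals, abcd, first) are hypothetical stand-ins, not the actual routine, and the hex format assumes a 32-bit default integer.

```fortran
program probe_demo
  implicit none
  call probe_locals()   ! first call prints
  call probe_locals()   ! second call returns immediately
end program probe_demo

subroutine probe_locals()
  implicit none
  integer :: abcd = 0             ! hypothetical local, initialized at declaration
  double precision :: log_scale   ! initialized by the DATA statement below
  logical :: first = .true.       ! initialization implies SAVE
  data log_scale /0.0d0/

  ! Run only once, before any other code has touched these variables.
  if (.not. first) return
  first = .false.

  ! Print decimal and hex so a blank fill (0x20202020) stands out.
  print '(a,i0,a,z8.8)', 'abcd      = ', abcd, '  hex ', abcd
  print '(a,es13.4e3)',  'log_scale = ', log_scale
end subroutine probe_locals
```

In a healthy build this prints zeros. Values such as 538976288 or 6.0135E-154 would indicate that the initialized data was overwritten before the routine was first entered.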

I hope that clears up any confusion that I may have left yesterday.

And, the confusion about why this is happening is still there.

mecej4

Posts: 1911

23 Jan 2021 2:13 #26966

Thanks for the details. The bugs are still present, then?

wahorger

Posts: 1273 Morrison, CO, USA

23 Jan 2021 3:10 #26967

Oh, yeah. It is still there. I thought it might be a BLOCK DATA issue, so I went through all my INS files looking for problems. Can't find any, and not getting any compiler errors about too few or too many initializers. So, flummoxed.

Too bad I cannot come up with a smaller example for Paul, et al. Since 64-bit was a goal for me, not a requirement, this and a few other stumbling blocks have now slowed my adoption.

As an aside, I have been able to find and fix inconsistencies in my code for 64-bit, namely winio@() arguments that were declared as INTEGER(7) incorrectly, or as INTEGER(3) and should have been INTEGER(7).

Even with this particular problem, I was able to make some progress to that ultimate end. I also learned a great deal about DLL's and third party compilers in the process. So, all in all, the time I spent was productive, albeit not to the exact end I had originally envisioned.

Robert

Posts: 450 Manchester

23 Jan 2021 3:57 #26968

Quoted from mecej4

The value of register RBX (0000000202020200) strikes me as suspicious.

At the crash location, LENG8$+0x13, we can deduce from the preceding instructions that register RBX has the same contents as had been set in the 'IMUL_Q RBX,16_4' instruction in Bill's caller. We can thus infer that the I_LOGMTR was equal to 0000000020202020, which is not at all reasonable, as Paul has already pointed out -- it should be in the range 0 to 2.

I suspect that something such as an array overrun caused a character string containing blanks (byte 0x20 repeated) to be written into the integer variable I_LOGMTR.

It isn't byte 0x20 repeated, it is 0x02 repeated.

Quoted from mecej4

A separate point, for Paul's consideration: the machine instructions given by DUMPBIN and FTN95's traceback for the crash location disagree:

Dumpbin gives: 00000001800CF0A3: 0F B6 01 movzx eax,byte ptr [rcx]

whereas FTN95 gave: 7ff9bba20ae3) movzx_b_q RAX,[RCX]

The distinction could be of no significance unless a subsequent instruction used the upper (nameless) half of RAX, but I should prefer to see the correct instruction disassembled.

I suspect they decode to the same thing, moving a byte from [RCX] to al, with the remaining length of string in RDX.

wahorger

Posts: 1273 Morrison, CO, USA

23 Jan 2021 4:07 #26969

Robert, yes, this is a side effect of memory being filled with spaces, incorrectly, wiping out local variables, and constants that would nominally be correct for error-free operation. The reason for this is mystifying especially since I did a test where none of my code was run, yet local variables did not have their set values (via a DATA statement).

I am no longer concerned about LENG8$, it is operating correctly if given correct inputs. In this case, it was not.

mecej4

Posts: 1911

23 Jan 2021 4:08 #26970

Robert wrote: 'It isn't byte 0x20 repeated, it is 0x02 repeated'.

The index variable I_LOGMTR pertains to character*16 variables. The instructions

 MOVSX_Q RBX,I_LOGMTR 
 IMUL_Q RBX,16_4

cause RBX to contain I_LOGMTR shifted left by one nybble, before the call to LENG8$, and the function does not touch RBX.

Robert

Posts: 450 Manchester

23 Jan 2021 4:15 #26971

Ah, I see.

StamK

Posts: 171

26 Jan 2022 8:23 #28692

So what is the conclusion from this thread? Is it an ftn95 bug (zeroing not happening properly) or is it a user error? I am asking because I occasionally get the same error (cannot easily repeat it).

mecej4

Posts: 1911

26 Jan 2022 11:10 #28693

My assessment is that there could be no conclusion, since no reproducer was provided. From the code fragments that were shown, speculations were made.

A reexamination allows me to suspect that it was an error in the user code. Here is a short test program that may help you if your code has the same error as in the test program.

program leng8dolbug
implicit none
character*16:: feet_or_meters_scale(0:2)
data feet_or_meters_scale/'UNDEFINED','feet/inch','meters/2.54 cm'/
call prlen(feet_or_meters_scale)
end program

subroutine prlen(str)
implicit none
character*(*), intent(in), dimension(*) :: str
integer :: i
do i=0,2
   print '(i2,2x,A,2x,i6)',i,trim(str(i)),len_trim(str(i))
end do
return
end subroutine

The output from FTN95 8.83 with /64:

 0  φ$@                   16
 1  UNDEFINED       9
 2  feet/inch       9

The error is that the lower bound of an array of any type is, by default, 1. If a user declares an array with a different lower bound (0 in the example code) and then passes that array to a subroutine that expects an assumed size array, the lower bound in the subroutine will be 1, not 0 as the programmer may have expected. In other words, a non-default lower bound is not provided to the subroutine by the caller.

The same program, compiled for 32-bit, aborts with an access violation.
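One way to remove the mismatch, sketched here as a suggestion rather than as the fix for the original poster's code, is to restate the lower bound in the dummy argument declaration. Only the declaration of str changes from the test program above; the program name is new.

```fortran
program leng8_fixed
  implicit none
  character(len=16) :: feet_or_meters_scale(0:2)
  data feet_or_meters_scale /'UNDEFINED', 'feet/inch', 'meters/2.54 cm'/
  call prlen(feet_or_meters_scale)
end program leng8_fixed

subroutine prlen(str)
  implicit none
  ! Restate the lower bound: an assumed-size dummy does not inherit it.
  character(len=*), intent(in) :: str(0:*)
  integer :: i
  do i = 0, 2
    print '(i2,2x,a,2x,i6)', i, trim(str(i)), len_trim(str(i))
  end do
end subroutine prlen
```

With the bound restated, indices 0 to 2 address 'UNDEFINED', 'feet/inch' and 'meters/2.54 cm', with trimmed lengths 9, 9 and 14.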

I wonder if the OP of this thread (WAHorger) inserted the 'Undefined' value into the array in an attempt to catch exactly the same error -- default LBOUND of dummy array argument not equal to actual LBOUND of corresponding actual array argument.