Topic: Crash in library function (cannot duplicate) in 64-bit

wahorger

Posts: 1273 Morrison, CO, USA

26 Jan 2022 2:03 #28694

mecej4,

I read your post, then re-read the thread. One thing you missed in assuming an error in my code: this is a local variable (range 0:2), and other local variables have been initialized/modified to (hex) 2020... sequences. Even floating-point constants. Bizarre. And the 32-bit version of the code runs just fine.

StamK indicates that this also happens to him occasionally and he cannot easily repeat it either.

mecej4

Posts: 1911

26 Jan 2022 2:52 (Edited: 26 Jan 2022 7:43) #28695

Bill,

You did say that i_logmtr was local and had values in the range 0..2; later, you wrote 'That is - every local variable I looked at contained the same pattern of (hex) 20202020', but I did not know whether that applied to the feet_or_meters_scale array, or whether that array was local.

This new error (all variables set to values of Z'20202020') is, by itself, worthy of analysis. Can you provide a reliable reproducer for this error? That would help.

Noting that your OP was made a year ago, I looked at the code of LENG8$ in the most recent version of CLEARWIN64.DLL and found that it was the same as in the older version (which was listed in an earlier post in this thread).

PaulLaidler

Posts: 7974 Salford, UK

26 Jan 2022 4:58 #28701

I have not followed the discussion on this thread, but LENG8$ is the same as LEN_TRIM except that it returns an INTEGER*8 value.

So it would only be needed for CHARACTER variables whose length LEN(str) exceeds 2,147,483,647 (very long). It scans backwards from the end, looking for the first character that is not a space (0x20).
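A minimal sketch of that backward scan in standard Fortran 2003, for illustration only: the module and function names (trim8_sketch, len_trim8) are hypothetical, and this is not the actual code of LENG8$. It starts from LEN(str), taken as a 64-bit value, and counts down until it meets a non-blank character.

```fortran
module trim8_sketch
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
contains
  ! Hypothetical stand-in for a LEN_TRIM that returns a 64-bit length.
  ! Scans backwards from the last character until it finds one that is
  ! not a blank (ASCII 32, hex 20); returns 0 for an all-blank string.
  pure function len_trim8(str) result(n)
    character(len=*), intent(in) :: str
    integer(int64) :: n

    n = len(str, kind=int64)          ! start at the declared length
    do while (n > 0_int64)
      if (str(n:n) /= ' ') exit       ! first non-blank from the end
      n = n - 1_int64
    end do
  end function len_trim8
end module trim8_sketch

program demo_len_trim8
  use trim8_sketch, only: len_trim8
  implicit none
  character(len=16) :: fms(2) = [character(len=16) :: 'feet/inch', 'meters/2.54 cm']

  print *, len_trim8(fms(1)), len_trim8(fms(2))   ! 9 and 14
  print *, len_trim8('')                          ! 0
end program demo_len_trim8
```

The exit test sits inside the loop because Fortran does not guarantee short-circuit evaluation, so a combined "do while (n > 0 .and. str(n:n) == ' ')" could end up evaluating str(0:0). Note also that the loop trusts the length it is given: if that length is garbage, the very first character it reads is already outside the string.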

I would need to check the /EXPLIST listing to see whether FTN95 is passing it an INTEGER*8 value as the input for its LEN.

PaulLaidler

Posts: 7974 Salford, UK

27 Jan 2022 9:52 #28703

This is a correction to my last post.

FTN95 implements LEN_TRIM(string, kind=k), where k defaults to 3. For 64 bits, regardless of k, it calls LENG8$ and passes a 64-bit value for LEN(str). LENG8$ returns a 64-bit value that is moved (via MOV, MOV_Q, etc.) to the destination. For example, a simple assignment to an INTEGER*4 uses a 32-bit MOV, whilst an assignment to an INTEGER*8 uses a 64-bit MOV_Q.

It is possible that this is not the best strategy and a review may be required. In particular, there may be problems when LEN_TRIM is called within a function or subroutine argument list.

If this turns out to be the case then a temporary fix is to remove the call to LEN_TRIM from the argument list and to provide a simple assignment (to a temporary INTEGER of the required kind) before the call.
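A sketch of that temporary fix in standard Fortran. SHOW_UNITS is a hypothetical routine standing in for whatever procedure receives the trimmed length; the names are illustrative and do not come from Bill's code.

```fortran
module units_sketch
  implicit none
contains
  ! Hypothetical procedure that is passed a trimmed length as an argument.
  subroutine show_units(name, n)
    character(len=*), intent(in) :: name
    integer, intent(in) :: n          ! default INTEGER dummy argument
    print '(a)', name(1:n)
  end subroutine show_units
end module units_sketch

program workaround_demo
  use units_sketch, only: show_units
  implicit none
  character(len=16) :: label = 'meters/2.54 cm'
  integer :: nt                       ! temporary of the kind SHOW_UNITS expects

  ! Form that may be affected: LEN_TRIM evaluated inside the argument list.
  !   call show_units(label, len_trim(label))

  ! Temporary fix: a simple assignment to a temporary first, then pass it.
  nt = len_trim(label)
  call show_units(label, nt)          ! prints: meters/2.54 cm
end program workaround_demo
```

The assignment "nt = len_trim(label)" is exactly the simple-assignment case described above, where the compiler chooses a MOV of the width of NT.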

mecej4

Posts: 1911

27 Jan 2022 6:41 (Edited: 28 Jan 2022 8:38) #28708

Here is a test program that exhibits many of the malfunctions encountered in 64-bit runs by Wahorger, as described earlier in this thread.

program wahbug
implicit none
character*16:: fms(2)
data fms/'feet/inch','meters/2.54 cm'/
integer:: i_logmtr, ls
integer*8 :: l8
!
i_logmtr = 1
ls = 3
fms(ls) = ' ' !deliberately exceed upper bound of array
l8 = len_trim(fms(i_logmtr))
print *,l8
end program

Compile with /64 /debug and run. It crashes with an access violation at the instruction

movzx       eax,byte ptr [rcx]

inside the function LENG8$. What happens is that the integer constant 16, which is the length parameter of the variable FMS, is overwritten by 8 blank characters, changing Z'0000000000000010' to Z'2020202020202020'. This overwriting happens prior to the call to LENG8$. As a result, the string length hidden argument passed to LENG8$ is that huge integer instead of 16.
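To see where the huge length comes from, here is a small standard-Fortran program (names are hypothetical) that reinterprets 8 blank characters as one INTEGER*8, which is effectively what the length constant 16 becomes once the blanks land on top of it. With a "length" of about 2.3E18, LENG8$ presumably begins its backward scan at an address far outside the string, hence the access violation at the MOVZX.

```fortran
program blanks_as_int64
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  character(len=8) :: blanks = ' '    ! 8 blanks; every byte is Z'20'
  integer(int64)   :: clobbered

  ! Reinterpret the 8 bytes of BLANKS as one 64-bit integer, i.e. what an
  ! INTEGER*8 length constant of 16 turns into once blanks overwrite it.
  clobbered = transfer(blanks, 0_int64)

  print '(a, z16.16)', 'clobbered length = Z', clobbered   ! Z2020202020202020
  print '(a, i0)', 'as a decimal length = ', clobbered     ! 2314885530818453536
end program blanks_as_int64
```

Because all 8 bytes are identical, the result does not depend on byte order.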

If something similar happens in Wahorger's large application compiled in 64-bit mode, using /check or /checkmate should help locate the place where the bug arises.
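Where /check cannot be used on the whole program, a manual guard on a suspect subscript can act as a local stand-in. This is only a sketch in standard Fortran, reusing the array from the test program above; the program name and message text are hypothetical.

```fortran
program guarded_store
  implicit none
  character(len=16) :: fms(2)
  integer :: ls

  fms = [character(len=16) :: 'feet/inch', 'meters/2.54 cm']
  ls = 3                              ! the out-of-range subscript from the test case

  ! Manual stand-in for a compiler bounds check: compare the subscript with
  ! the declared bounds before storing, and report instead of clobbering memory.
  if (ls >= lbound(fms, 1) .and. ls <= ubound(fms, 1)) then
    fms(ls) = ' '
  else
    print '(a, i0, a, i0, a, i0, a)', 'Subscript ', ls, ' is outside fms(', &
          lbound(fms, 1), ':', ubound(fms, 1), ')'
  end if

  print '(a)', trim(fms(2))           ! still 'meters/2.54 cm'
end program guarded_store
```

Run as written, it prints "Subscript 3 is outside fms(1:2)" and leaves the neighbouring memory untouched.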

PaulLaidler

Posts: 7974 Salford, UK

28 Jan 2022 8:20 #28710

Thanks. I will take a look at this.

mecej4

Posts: 1911

28 Jan 2022 8:51 #28711

Two properties of the SDBG64 debugger that I found to be handicaps while probing this issue at the assembler level:

1. A bug in the disassembler, as reported at https://forums.silverfrost.com/Forum/Topic/4109
2. The string length parameter, 16, is a constant stored in the data segment. It may be represented by an address such as '[50b0]$' in the disassembly and /EXP listings, and it took several disassembly runs to associate 50b0 with the character length. For instance, the EXP listing shows

IMUL_Q RBX,16_4

but in the disassembly window SDBG64 may show

IMUL_Q RBX, [50b0]$

(I think that the '$' indicates that the address is to be added to the contents of register RBP). Once the address is identified and its association with the string length constant recognised, we face another obstacle, which is that there is no way (?) in SDBG64 to display the contents of that memory location in a watch-variable window.

Robert

Posts: 450 Manchester

28 Jan 2022 1:31 #28714

You are correct about the $. RBP is loaded with the DLL/EXE's load address and then data is referenced from it.

PaulLaidler

Posts: 7974 Salford, UK

29 Jan 2022 8:43 #28720

mecej4

Thank you for your feedback. I can now confirm your analysis.

At the moment I can't think of a way to avoid this failure (apart from correcting the Fortran source code), or of how to provide a meaningful error report when the source code contains this bug.

PaulLaidler

Posts: 7974 Salford, UK

29 Jan 2022 9:11 #28722

mecej4

I have changed LENG8$ so that your sample code will give a runtime failure 'Memory corrupted before call to LEN_TRIM'.

This is not very helpful but better than nothing.

mecej4

Posts: 1911

29 Jan 2022 3:11 (Edited: 29 Jan 2022 3:53) #28727

Paul, thanks for confirming my analysis. I did a similar analysis of the 32-bit EXE for the same test program, and found that it behaves a bit differently. Instead of calling a library function such as LENG8$, it calculates the trimmed length of the string in line. What gets clobbered in the 32-bit EXE is not the string length parameter (the constant 16), but the constant 2, which is later used in a call to WSF1 (Fortran I/O routine ?), and other variables that follow that 2 in memory.

I have seen other compilers put constants such as the 16 or the 2 that we saw in this program into a read-only data segment such as .rdata. This approach would cause an access violation earlier in the code and that would make it easier to catch the bug in user code.

As you said, catching the bug in LENG8$ is better than nothing, but users tend to conclude from the stack trace that the bug is in the library itself and may expect that it is for the compiler/library developers to fix it.

mecej4

Posts: 1911

29 Jan 2022 3:49 #28728

Below are some details on how to probe this error. They may help in debugging Wahorger's program, and other large programs in which memory corruption is suspected but which cannot be compiled with /check or /undef for some good reason.

In an attempt to see how the source code could be modified to avoid subscript errors from clobbering constants, I changed the test program as follows:

! http://forums.silverfrost.com/viewtopic.php?p=32570#32570
program wahbug
implicit none
integer, parameter :: SLEN=16   ! gets clobbered by writing to fms(3)
character(len=SLEN) :: fms(2)
data fms/'feet/inch','meters/2.54 cm'/
integer:: i_logmtr, ls
integer*8 :: l8
!
i_logmtr = 2
ls = 3
fms(ls) = ' ' !deliberately exceed upper bound of array
l8 = len_trim(fms(i_logmtr))
print *,l8
end program

Compile with /64 /full_debug and link. Running the program gives an access violation inside LENG8$. Running the program within SDBG64 causes 'access violation reading address FFFFFFFFFFFFFFFF' inside LENG8$. Neither message helps in understanding what caused the access violation.

To understand what happened, restart the program in SDBG64, set a breakpoint on line 13, 'l8 = len_trim(fms(i_logmtr))', and run up to the breakpoint. Press F11. Open the registers window.

Step forward two machine instructions, stopping at 'imul_q R15,...'

Observe that R15 contains the value 1.

Step forward one instruction. Were it not for the bug, R15 would now contain Z'0000000000000010', i.e., the string length 16. However, that 'constant' has been clobbered, and we see Z'2020202020202020' instead. This is a good hint that the string of 16 blanks has been written over the (supposedly constant) parameter SLEN.