forums.silverfrost.com Forum Index forums.silverfrost.com
Welcome to the Silverfrost forums
 

Crash in library function (cannot duplicate)
wahorger



Joined: 13 Oct 2014
Posts: 1214
Location: Morrison, CO, USA

PostPosted: Fri Jan 22, 2021 12:02 am    Post subject: Crash in library function (cannot duplicate) Reply with quote

I am getting a crash in the library function LENG8$. I cannot duplicate it in a small example.

It would appear that the function finds the trimmed length of a character string, returning the new trimmed length. In the routine in question it is used successfully over a dozen times in my assembly listing before failing here.

The assembler code (my code) is:
Quote:


j = len_trim(feet_or_meters_scale(i_logmtr))

00003a1a(#20238,2524,693): NOP
00003a1b(#20238,2524,693): MOVSX_Q RBX,I_LOGMTR
00003a22(#35595,2525,693): IMUL_Q RBX,16_4
00003a2a(#20244,2526,693): LEA RCX,FEET_OR_METERS_SCALE:d[RBP+RBX]
00003a32(#20244,2527,693): MOV_Q RDX,16_4
00003a39(#20244,2528,693): CALL LENG8$
00003a3e(#20226,2529,693): MOV J,RAX


The data declarations are:
Quote:
character*16:: feet_or_meters_scale(0:2)
data feet_or_meters_scale/'UNDEFINED','feet/inch','meters/2.54 cm'/
integer:: i_logmtr
real*8:: log_scale


The error is:
Quote:

Silverfrost 64-bit exception report on F:\cmasterf95\RELEASE\win64\C-MASTER.exe Thu Jan 21 15:50:24 2021


Access violation (c0000005) at address 7ff9bba20ae3

Within file CLEARWIN64.DLL
In LENG8$ at address 13
Within file C-MASTER.exe
in LOGPLT_WINDOWS in line 693, at address 3a3e
in LOGPLOT_WINDOWS in line 26, at address 143
Within file CLEARWIN64.DLL
In _set_mg_return_value at address 6B72
In CallWindowProcW at address 3BD
Within file USER32.DLL
In DispatchMessageW at address 1F2
In IsDialogMessageW at address 280
In IsDialogMessage at address 7C
In _register_message_interception at address 63E
Within file CLEARWIN64.DLL
In _yield_program_control at address 15A


RAX = 00000000ffffffff RBX = 0000000202020200 RCX = 00000002058935ff RDX = 0000000000000010
RBP = 0000000000400000 RSI = 000000000a626720 RDI = 0000000000000004 RSP = 000000000de99c58
R8 = 0000000013fd0e40 R9 = 0000000000000001 R10 = 0000000000008000 R11 = 000000000de99840
R12 = 000000000387360c R13 = 00000000038734c0 R14 = 00000000038734c4 R15 = 0000000003874da8

7ff9bba20ae3) movzx_b_q RAX,[RCX]

PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 7912
Location: Salford, UK

PostPosted: Fri Jan 22, 2021 9:21 am    Post subject: Reply with quote

Is there a check in the code to ensure that i_logmtr is in the range 0 to 2?
mecej4



Joined: 31 Oct 2006
Posts: 1884

PostPosted: Fri Jan 22, 2021 11:16 am    Post subject: Reply with quote

Given the gaps in the information available, the following is definitely speculative.

I do not know which version of the CLEARWIN64 DLL Bill is using; so I will take the information from the one that I have with FTN95 8.70:

22.11.17.16, date 11/17/2020

Here are the relevant lines of machine code, starting with the pieces in Bill's traceback, and the code of LENG8$ in the DLL (disassembly generated using Microsoft's DUMPBIN):

Code:
00003a1b(#20238,2524,693): MOVSX_Q RBX,I_LOGMTR
00003a22(#35595,2525,693): IMUL_Q RBX,16_4
00003a2a(#20244,2526,693): LEA RCX,FEET_OR_METERS_SCALE:d[RBP+RBX]
00003a32(#20244,2527,693): MOV_Q RDX,16_4
00003a39(#20244,2528,693): CALL LENG8$
00003a3e(#20226,2529,693): MOV J,RAX
...
LENG8$:
  00000001800CF090: 48 83 FA 01        cmp         rdx,1
  00000001800CF094: 7D 03              jge         00000001800CF099
  00000001800CF096: 33 C0              xor         eax,eax
  00000001800CF098: C3                 ret
  00000001800CF099: 48 8D 4C 11 FF     lea         rcx,[rcx+rdx-1]
  00000001800CF09E: 48 85 D2           test        rdx,rdx
  00000001800CF0A1: 7E 12              jle         00000001800CF0B5
  00000001800CF0A3: 0F B6 01           movzx       eax,byte ptr [rcx]  # CRASH LOCATION
  00000001800CF0A6: 48 FF C9           dec         rcx
  00000001800CF0A9: 3C 20              cmp         al,20h
  00000001800CF0AB: 75 08              jne         00000001800CF0B5
  00000001800CF0AD: 48 FF CA           dec         rdx
  00000001800CF0B0: 48 85 D2           test        rdx,rdx
  00000001800CF0B3: 7F EE              jg          00000001800CF0A3
  00000001800CF0B5: 48 8B C2           mov         rax,rdx
  00000001800CF0B8: C3                 ret


The value of register RBX (0000000202020200) strikes me as suspicious.

At the crash location, LENG8$+0x13, we can deduce from the preceding instructions that register RBX has the same contents as had been set in the "IMUL_Q RBX,16_4" instruction in Bill's caller. We can thus infer that the I_LOGMTR was equal to 0000000020202020, which is not at all reasonable, as Paul has already pointed out -- it should be in the range 0 to 2.

I suspect that something such as an array overrun caused a character string containing blanks (byte 0x20 repeated) to be written into the integer variable I_LOGMTR.
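
The arithmetic behind that inference can be checked directly. The sketch below uses Python rather than Fortran purely for convenience; the constant names are illustrative, and the register value is the one printed in the exception report.

```python
# Values taken from the exception report and the caller's listing.
RBX_AT_CRASH = 0x0000000202020200   # RBX from the register dump
ELEMENT_SIZE = 16                   # character*16, the operand of IMUL_Q RBX,16_4

# Undo the multiply to recover the index that was loaded from I_LOGMTR.
i_logmtr, remainder = divmod(RBX_AT_CRASH, ELEMENT_SIZE)

print(hex(i_logmtr))                    # 0x20202020
print(i_logmtr.to_bytes(4, "little"))   # b'    ' (four ASCII blanks)
```

Multiplying by 16 shifts the value one hex digit to the left, which is why the dump appears to show 02 bytes in RBX although the variable itself would hold 20 20 20 20.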

---
A separate point, for Paul's consideration: the machine instructions given by DUMPBIN and FTN95's traceback for the crash location disagree:

Dumpbin gives: 00000001800CF0A3: 0F B6 01 movzx eax,byte ptr [rcx]

whereas FTN95 gave: 7ff9bba20ae3) movzx_b_q RAX,[RCX]

The distinction could be of no significance unless a subsequent instruction used the upper (nameless) half of RAX, but I should prefer to see the correct instruction disassembled.


Last edited by mecej4 on Sat Jan 23, 2021 10:38 am; edited 2 times in total
wahorger



Joined: 13 Oct 2014
Posts: 1214
Location: Morrison, CO, USA

PostPosted: Fri Jan 22, 2021 7:28 pm    Post subject: Reply with quote

Paul, yes, there is. An "IF" earlier sets it to 1 or 2. This is the only usage for i_logmtr, locally defined. I should probably turn on /BOUNDS_CHECK just in case, but this is in previously working, untouched code. In 32-bit mode, I do have /BOUNDS_CHECK and it does not error out.
wahorger



Joined: 13 Oct 2014
Posts: 1214
Location: Morrison, CO, USA

PostPosted: Fri Jan 22, 2021 7:34 pm    Post subject: Reply with quote

Paul, yes, there is. An "IF" earlier sets it to 1 or 2. This is the only usage for i_logmtr, locally defined. I should probably turn on /BOUNDS_CHECK just in case, but this is in previously working, untouched code. In 32-bit mode, I do have /BOUNDS_CHECK and it does not cause an error to be detected.

mecej4, the address is within a DLL, loaded at some weird address. Unlike the MAP for 32-bit, the eventual loading address is not given in SLINK64. Your example may be perfectly correct, just not the same as mine because our code and DLL references are different.

The code I show is my code that leads to the LENG8$ call which has the error. I have no idea what is before/after the "offending" instruction.
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 7912
Location: Salford, UK

PostPosted: Fri Jan 22, 2021 7:47 pm    Post subject: Reply with quote

Another possibility is that something is over-writing the character array.

Try printing out the array just before the failure occurs.
wahorger



Joined: 13 Oct 2014
Posts: 1214
Location: Morrison, CO, USA

PostPosted: Fri Jan 22, 2021 8:20 pm    Post subject: Reply with quote

I had thought that /ZEROISE would zero memory, but no. A secondary variable was not zero on entry, which caused i_logmtr NOT to be initialized appropriately. On to the next error.

Learned something new, and will apply caution in the future when similar problems arise!

Thanks, mecej4 and Paul.
wahorger



Joined: 13 Oct 2014
Posts: 1214
Location: Morrison, CO, USA

PostPosted: Fri Jan 22, 2021 11:54 pm    Post subject: Reply with quote

So, cautiously, I approached the debugging of the module and found an interesting (unexpected) result: every local variable I looked at contained the same pattern of (hex) 20202020. I had initialized each of these variables with a DATA statement, or with a specific assignment where the variable was declared (e.g. INTEGER:: ABCD=0). I printed these locals as the module was entered, before any processing took place.

In addition, a ganged pair of radio buttons is (in 64-bit) no longer paired; they operate as separate buttons. Their values are also set to (hex) 20202020 before they are used.

A local variable that is set to 0.0 by an assignment statement later in the module is displayed on entry as 6.0135E-154 (that is, (hex) 2020202020202020).

Compiled in 32-bit mode as /CHECKMATE and as /RELEASE with /BOUNDS_CHECK turned on, none of these problems occur.

I am not able to duplicate any of these (the filling of memory with 20202020, the ganged radio button problem, or the floating point number) in a small sample program.

Some guidance to find what is causing this would be helpful. DBG64?
mecej4



Joined: 31 Oct 2006
Posts: 1884

PostPosted: Sat Jan 23, 2021 12:41 am    Post subject: Reply with quote

Do you have one or more character variables declared with len = a large number, inadvertently set equal to blank?

Does your code pass such a variable to a third party library routine which sets the variable to blank?

When you first see a variable with its bytes set to 0x20, is that the first time that execution of the current subprogram occurs, or otherwise?
wahorger



Joined: 13 Oct 2014
Posts: 1214
Location: Morrison, CO, USA

PostPosted: Sat Jan 23, 2021 2:18 am    Post subject: Reply with quote

Neither this routine nor its caller initializes any big character variables.

The real*8 is getting changed back on the second time through, but I think the constant 0.0 is getting hammered, so cannot affect that.

The radio button values are not getting hammered again. But the ganging is still not working.
mecej4



Joined: 31 Oct 2006
Posts: 1884

PostPosted: Sat Jan 23, 2021 10:35 am    Post subject: Reply with quote

Bill, if you just tell us the version number of your CLEARWIN64.DLL, and that version was not a private version, we could easily settle any doubts regarding the library function LENG8$, which is called to find len_trim(char var).

I expect the machine code of that function to be identical in your DLL version to the one that I showed. The function is only 16 instructions long, takes two arguments in RCX and RDX, and returns the result in RAX. It does not change any of the other registers. Nothing can go wrong there.
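
As a reading aid, the disassembly shown earlier can be modelled line by line. This is a sketch in Python, not Silverfrost's source: the name leng8_model is made up, and the comments map each statement to the DUMPBIN instructions from the earlier listing.

```python
def leng8_model(buf: bytes, length: int) -> int:
    """Model of the disassembled loop: length of buf[:length] without trailing blanks."""
    if length < 1:              # cmp rdx,1 / jge / xor eax,eax / ret
        return 0
    pos = length - 1            # lea rcx,[rcx+rdx-1]: point at the last character
    while length > 0:           # test rdx,rdx / jle (and jg at the loop bottom)
        ch = buf[pos]           # movzx eax,byte ptr [rcx]  <- the read that faulted
        pos -= 1                # dec rcx
        if ch != 0x20:          # cmp al,20h / jne
            break
        length -= 1             # dec rdx
    return length               # mov rax,rdx / ret


# The three character*16 elements from the DATA statement, blank padded.
scale = [s.ljust(16).encode("ascii")
         for s in ("UNDEFINED", "feet/inch", "meters/2.54 cm")]
print([leng8_model(s, 16) for s in scale])   # [9, 9, 14]
```

The routine only reads the bytes the caller points it at, so a valid index cannot fault. An index of 0x20202020 puts the first read roughly 8 GB past the start of the array, which is consistent with the access violation on the very first load.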

Therefore, the errors must occur earlier in the call chain, in portions of your code that we know nothing about. With such tough errors, I would not completely trust a symbolic debugger.

You mentioned "radio buttons not getting hammered". I hope that you appreciate that the phrase has no meaning (to me, anyway) within the context of the information that you have disclosed in this thread.
wahorger



Joined: 13 Oct 2014
Posts: 1214
Location: Morrison, CO, USA

PostPosted: Sat Jan 23, 2021 3:05 pm    Post subject: Reply with quote

mecej4, thanks for the response.

Yesterday, my post spoke of memory being filled with spaces, overwriting data for which I had DATA statements. I also found that this included the floating point constant 0.0. So the code in LENG8$ is no longer in question: since an invalid number was loaded as the index to the array, having it crash was not unexpected. The variable was not set properly because the floating point constant 0.0 was no longer exactly zero. This prevented a portion of my initialization code from running, leaving a key variable (i_logmtr) set to all spaces (or 538976288), even though it had also been initialized to zero in a DATA statement. The question remains: why is memory that holds my variables AND constants getting filled with spaces? The suggestion to look at initialization statements possibly "gone awry" was good, as was the one about big (wide) character arrays getting initialized (there are none). I don't use the char_fill@() function in this particular program at all, relying on assignment statements to initialize bulk arrays. So where should I look for the program going "rogue"?

I first took out of the build any /ZEROISE, just in case that was the culprit. This had no effect.

And, yesterday, I put in a little PRINT statement on a couple of variables to show the state of the local variables on initial entry and immediately before first use. The print showed that these locals had been set to spaces. Integers that should be zero or one (from the DATA statements) are displaying as 538976288. REAL*8's are displayed as 6.0135E-154. These real*8's had been initialized by a program statement, setting the value explicitly to 0.0d0.
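
Both printed values are what memory filled with ASCII blanks looks like when read as numbers. A quick check, sketched in Python and assuming the little-endian 4-byte integer and 8-byte IEEE 754 layouts used on x86-64 (the variable names are illustrative):

```python
import struct

blanks = b" " * 8   # eight ASCII blanks, byte 0x20

as_int32 = struct.unpack("<i", blanks[:4])[0]   # 4-byte integer view
as_real8 = struct.unpack("<d", blanks)[0]       # 8-byte IEEE 754 double view

print(as_int32)            # 538976288
print(f"{as_real8:.4E}")   # 6.0135E-154
```

Because every byte is the same, the result does not depend on byte order; the pattern to look for is simply a run of 0x20 bytes over the variables.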

This morning (23 Jan), I placed in the module a small bit of code. The code would allow me to call this routine BEFORE any of my other code ran, just in case I have something (programmatically) that went bonkers. It can only be run once, it prints, and it immediately returns. Stated another way, before any of my "standard" code that initializes any other variables, opens files, clears character arrays; before any of that executes, I call the routine and print the values of a few variables that should have pre-defined values based on their data statements.

In a perfect world, the numbers displayed should match their initialization (DATA) statement values. That is not what was observed: the values are set to all spaces, and this is reflected in the numeric values displayed by the print.

I hope that clears up any confusion that I may have left yesterday.

And, the confusion about why this is happening is still there.
mecej4



Joined: 31 Oct 2006
Posts: 1884

PostPosted: Sat Jan 23, 2021 3:13 pm    Post subject: Reply with quote

Thanks for the details. The bugs are still present, then?
wahorger



Joined: 13 Oct 2014
Posts: 1214
Location: Morrison, CO, USA

PostPosted: Sat Jan 23, 2021 4:10 pm    Post subject: Reply with quote

Oh, yeah. It is still there. I thought it might be a BLOCK DATA issue, so I went through all my INS files looking for problems. Can't find any, and not getting any compiler errors about too few or too many initializers. So, flummoxed.

Too bad I cannot come up with a smaller example for Paul et al. Since 64-bit was a goal for me, not a requirement, this and a few other stumbling blocks have now slowed my adoption.

As an aside, I have been able to find and fix inconsistencies in my code for 64-bit, namely winio@() arguments that were declared as INTEGER(7) incorrectly, or as INTEGER(3) and should have been INTEGER(7).

Even with this particular problem, I was able to make some progress to that ultimate end. I also learned a great deal about DLL's and third party compilers in the process. So, all in all, the time I spent was productive, albeit not to the exact end I had originally envisioned.
Robert



Joined: 29 Nov 2006
Posts: 444
Location: Manchester

PostPosted: Sat Jan 23, 2021 4:57 pm    Post subject: Re: Reply with quote

mecej4 wrote:


The value of register RBX (0000000202020200) strikes me as suspicious.

At the crash location, LENG8$+0x13, we can deduce from the preceding instructions that register RBX has the same contents as had been set in the "IMUL_Q RBX,16_4" instruction in Bill's caller. We can thus infer that the I_LOGMTR was equal to 0000000020202020, which is not at all reasonable, as Paul has already pointed out -- it should be in the range 0 to 2.

I suspect that something such as an array overrun caused a character string containing blanks (byte 0x20 repeated) to be written into the integer variable I_LOGMTR.


It isn't byte 0x20 repeated, it is 0x02 repeated.

mecej4 wrote:

A separate point, for Paul's consideration: the machine instructions given by DUMPBIN and FTN95's traceback for the crash location disagree:

Dumpbin gives: 00000001800CF0A3: 0F B6 01 movzx eax,byte ptr [rcx]

whereas FTN95 gave: 7ff9bba20ae3) movzx_b_q RAX,[RCX]

The distinction could be of no significance unless a subsequent instruction used the upper (nameless) half of RAX, but I should prefer to see the correct instruction disassembled.


I suspect they decode to the same thing, moving a byte from [RCX] to al, with the remaining length of string in RDX.
All times are GMT + 1 Hour
Page 1 of 3