Silverfrost Forums

When a crash is not a crash

12 Sep 2023 1:33 #30554

I have had this happen to me a few times, but I cannot duplicate it at will. I'm sharing my experience in the hope that, if others have had this happen or are seeing it now, a solution can be found.

My code is stable, meaning that 99+% of the code has had no changes in more than a year. When I do change the code, I'll run it, then the other closely related functions until I am satisfied the answers are correct and there are no faults in the code (as much as possible). Then I'll go through the remainder of the code looking for secondary effects of code changes, especially in support routines used throughout my product. So what happens is a mystery.

I was running through a change using the 'Common Dialog' that Paul has provided. In order to test the implementation, I would have to run through all the functions that use this file dialog.

From my main window after several minutes of running through other functions, I started the next function, and the program closed. It did not throw an error code. It just stopped. But it apparently left a bit of code running. I had to use the task manager to stop this. If I do not, I've found, I cannot update the EXE file (makes sense).

I then restarted (without recompiling) the code and ran through a similar sequence (I can't say the same, since I was at least 10 minutes into testing by the time it stopped). I could not get the program to stop again. I tried this from a 'cold' start 3 more times without an issue.

I have seen this in the past. It is not consistent with any particular function being started, nor any particular sequence of file open/close, nor anything else I can deduce.

And it is not common! That is a blessing, actually. And a curse, because if I can get a crash or stoppage consistently, I've been able to track down a root cause. This kind of problem (could be my code, I'll admit) is insidious and frustrating.

So, if you've had this kind of thing happen, and have been able to identify or isolate an issue, I'd love to hear of it.

13 Sep 2023 5:42 #30555

This sounds like a stack corruption, but it is very difficult to locate. We did have something similar a 'few' years ago with a file open dialogue, which was overcome when a newer windows interface routine was adopted by FTN95. From memory the stack was corrupted by the older Winapp routine, so very difficult to identify.

At the moment I am getting intermittent problems with /64 /checkmate, where my FTN95 .exe is being flagged as malicious software (Ver 8.97.2). There are two sorts: one where the .exe is flagged and delayed, and a worse one where the .exe is deleted. If I change the array sizes in a module, it can sometimes go away. It is a code I am developing. The last few weeks the problem has disappeared. I am actually developing and running the test on a 128GB USB stick.

13 Sep 2023 12:01 #30558

John, thanks for the information. It has been a long time since I had a stack related issue. Nevertheless, it could be something like this.

I'll have to do a bit of digging and debugging to see what I can find.

I appreciate the clue!

Good luck with your 64-bit project! Bill

13 Sep 2023 5:31 #30559

John, been thinking about stack issues.

I've (usually) had stack issues when something was supposed to be (or was inadvertently) on the stack, and the called/caller would get messed up. With function calls, it would usually be trapped as an overflow or something like that.

I wonder if it would be worthwhile to have, for debug purposes, a compiler-generated 'stack check' so that upon return from any function/subroutine call (be it FTN95, or C or a DLL), the stack pointer is verified to 'line up' with its value from before the call.

I know that right now, I have bounds checking turned on, and it has helped me find problems buried deep with a set of calls, and/or identified problems with my code where I didn't handle the edge conditions.

Maybe this would help.

Just a thought. Bill

13 Sep 2023 6:29 #30560

You can use /EXPLIST with /DEBUG on the command line.

The list can be viewed when stepping through the code using either the debugger SDBG/SDBG64 or Plato.

You can even single step whilst viewing the assembly list.

I am not sure you need /EXPLIST. You will probably get it anyway.

13 Sep 2023 8:49 #30561

Paul, thanks for weighing in on this.

I'm not sure using the debugger is as efficient as having an embedded software check. Since the problem arises only sporadically, it is quite possible that if the stack is corrupted consistently, sometimes it contains something that is not damaging to operations, yet at some point may cause the exit. Having a tool that one could use to check stack issues automatically, then turn it off for production, would be what I would look for, if at all possible.

Having (literally) hundreds of functions/subroutines that can be called in a myriad of ways, an automated checking process makes sense.

Bill

14 Sep 2023 6:44 #30562

Bill

Is it Win32 or x64, and are you using /CHECKMATE when testing?

14 Sep 2023 10:52 #30565

Paul,

I'm strictly using WIN32.

I perform a 'dual' build, building a /RELEASE and /DEBUG set of code. For the /RELEASE, I also include a /BOUNDS_CHECK (sometimes I get sloppy on boundary conditions).

Typically, my testing involves only /RELEASE, and this is the build that is released to my customers. If I have a particular failure that I can repeat and/or the traceback is confusing, I'll use /CHECKMATE to assist in isolation.

This issue has the unique signature that it very seldom occurs. I've never had one of my customers complain that it has occurred to them, and I have specifically asked this question. To be fair, with the frequency that it does occur, I might forget if it did in the past. It might also be that it only occurs in my environment. If that is the case, great for my customers! Still, it is troubling.

I don't know if that adds any clarity to the situation or not. That's where I am with it; occurs infrequently, difficult to impossible to replicate.

Bill

14 Sep 2023 10:59 #30566

Bill

It might be worth considering a change to x64. In some cases the change is quite easy.

14 Sep 2023 7:45 #30567

Paul, that would be nice, but I have a large amount of code ('C' and 'C++') that supports the Fortran main body that isn't 'ready to play'.

That said, I have modified my FTN95 code so that address pointers and window handles are declared appropriately as (7).

As a follow-on question: Does SCC support 64-bit?

I use SCC for some of the 'C' code that generates PDF's, and for the coordinate transformations. I found I had to use MINGW for the 'C++' code (font conversions and DXF file generation).

Bill

15 Sep 2023 5:57 #30568

Bill

Most Win32 C++ code will compile with SCC /64, but it is not 'supported' in the sense that we will promise to fix any failures. Having said that, we are willing to accept proposed bug fixes for 64-bit SCC.

15 Sep 2023 1:32 #30569

Paul,

Thanks for the explanations. I look forward in the coming months (prognosticators say it will likely be a hard, snowy winter here in Colorado) to experimenting with SCC /64 and integrating it. Gaining the address/data space will make a lot of things much easier and more robust!

Bill

15 Sep 2023 1:46 #30570

John,

One thing I had found (kind of by accident): If you send messages (i.e. window_update@()) to an open window too quickly, you can get the window to close prematurely. I had this happen with a 'status' window that showed the progression of a process (used %lw). I was processing over a thousand items per second, and updating the counter at each one. This was too much! The window would lock up, close prematurely, not close when commanded, etc. I reworked the status to update the count about once per second, and that 'solved' the problem. It is possible that my standard routine for opening a new window (positioning, options, etc.) is/was sending messages to the window too quickly, and on a rare occasion, causing a fault. I have attempted to minimize all the API calls during the window creation/positioning so this is less likely to happen. We shall see.
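For anyone wanting to try the same rate limiting, here is a minimal sketch in C of the "update about once per second" idea. The names (update_throttle, throttle_should_update) are hypothetical and not part of ClearWin+; the caller supplies the current time in seconds from whatever clock is convenient.

```c
#include <stdbool.h>

/* Hypothetical helper: tracks when a status window was last refreshed. */
typedef struct {
    double min_interval_s;  /* minimum seconds between refreshes */
    double last_update_s;   /* time of the last refresh */
    bool   has_updated;     /* false until the first refresh */
} update_throttle;

void throttle_init(update_throttle *t, double min_interval_s)
{
    t->min_interval_s = min_interval_s;
    t->last_update_s = 0.0;
    t->has_updated = false;
}

/* Returns true when enough time has passed that the caller should
   refresh the window now; records the refresh time when it does. */
bool throttle_should_update(update_throttle *t, double now_s)
{
    if (!t->has_updated || now_s - t->last_update_s >= t->min_interval_s) {
        t->last_update_s = now_s;
        t->has_updated = true;
        return true;
    }
    return false;
}
```

The processing loop would still count every item, but only issue the window refresh when throttle_should_update returns true, plus one final refresh after the loop so the last count is shown.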

And, as I have reported here in the past, using browse_for_folder1@() is problematic (although I still need it). If the user elects to add a new folder (as it is designed to do) and (possibly) renames it (as it should), the program will prematurely end on my development machine. Not crash, just end. I've never been able to track down what is actually happening.

Other FTN95 users don't seem to have this problem. Indeed, on some other computers I use, I cannot get this to occur. It is a mystery, as other software products appear to use this as well (visually they look the same) and they do not abruptly end.

So, I'll continue to look for ways to track faults, trapped or not, and make headway!

Bill

16 Sep 2023 2:45 #30575

Just an FYI. I had this happen just this morning. I'd been working on a support function, started the program to check it out, did some data selections, exited to the main window, came back in, and before the function could be invoked, or even get the selection window displayed, the program exited without a trappable error.

It was fortunate that I was not deep into the program, and had the logging file there to make sure I did the same thing (and my memory to help with that). I tried the same sequence again, and no issues. I tried another half dozen times, again with no premature exiting.

As I said, it is not frequent that this occurs, and is essentially non-repeatable.

20 Sep 2023 4:54 #30585

I was doing some more testing after embedding some trace statements and the stack checker. And I got the signal that a stack discrepancy was discovered.

I have a couple of 'device drivers'. The one signaling the fault is called 'SEND_FILE_TO_LOG'. The purpose is to send data to one (or two) destinations depending upon a logging threshold. Above the threshold, the data goes to the standard output device (=0) and to the logging file (opened in 'C'). Below the threshold, the data only goes to the logging file. N.B.: Since you cannot perform any Fortran I/O operations inside of a driver, this presents some challenges! Using 'C' to open a file and perform output gets around this issue.
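To make the routing concrete, here is a rough sketch of the threshold logic in C. It is not my actual driver: the names (log_destinations, send_text_to_log) are made up for illustration, and treating a level equal to the threshold the same as "above" is an assumption.

```c
#include <stdio.h>

enum { DEST_LOG_FILE = 1, DEST_CONSOLE = 2 };

/* Decide where a message goes. Messages at or above the threshold go to
   both destinations (the "at" case is an assumption); the rest go only
   to the logging file. */
int log_destinations(int level, int threshold)
{
    if (level >= threshold)
        return DEST_LOG_FILE | DEST_CONSOLE;
    return DEST_LOG_FILE;
}

/* Write text to the chosen destinations using C stdio only, so no
   Fortran I/O is needed inside the driver.
   Returns the number of destinations actually written. */
int send_text_to_log(FILE *log_file, int level, int threshold, const char *text)
{
    int dest = log_destinations(level, threshold);
    int written = 0;

    if ((dest & DEST_LOG_FILE) && log_file != NULL) {
        fputs(text, log_file);
        fflush(log_file);  /* keep the log usable if the program ends abruptly */
        written++;
    }
    if (dest & DEST_CONSOLE) {
        fputs(text, stdout);
        written++;
    }
    return written;
}
```

Flushing after each write costs some speed, but it means the tail of the log survives when the program exits without a trappable error.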

It would initially appear that when a callback was attempting to use the device driver, another call to the driver was initiated. This 'messes up' the stack check. I turned off the specific logging that was occurring, reran the sequence, and no errors occurred.

I turned the offending output back on and did some deeper debugging. Turns out, this routine was started at least two times while the first operation was pending. This all occurred during a %sc callback from a new window creation process. The %sc routine called is defined as a recursive function because I found this is the only way to make it work properly (%sc callback is/can-be called recursively).
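One way to protect a driver like this is a simple re-entrancy guard, sketched below in C. This is only an illustration under the assumption that the nested calls arrive on the same thread (as with window callbacks); it is not thread-safe, and all names (guarded_driver_call, driver_busy, rejected_calls) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef void (*driver_body_fn)(const char *text, void *ctx);

bool driver_busy = false;  /* true while a driver call is in progress */
int  rejected_calls = 0;   /* nested calls that were turned away */

/* Runs body unless the driver is already active.
   Returns false (and counts the attempt) when the call was nested. */
bool guarded_driver_call(driver_body_fn body, const char *text, void *ctx)
{
    if (driver_busy) {
        rejected_calls++;
        return false;
    }
    driver_busy = true;
    body(text, ctx);
    driver_busy = false;
    return true;
}

/* Demonstration body that does nothing. */
void quiet_body(const char *text, void *ctx)
{
    (void)text;
    (void)ctx;
}

/* Demonstration body that simulates a callback firing while the driver
   is still busy: it tries to enter the driver again and records whether
   the inner call was allowed to run. */
void body_that_reenters(const char *text, void *ctx)
{
    bool *inner_ran = ctx;
    *inner_ran = guarded_driver_call(quiet_body, text, NULL);
}
```

Rejecting the nested call loses that message; an alternative would be to queue it and write it once the outer call finishes. Either way, the rejected_calls counter gives a cheap indication of how often the callback is re-entering.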

I'm not sure that defining the driver as a recursive subroutine will help, but I'm going to take a look at it. I must be missing something; I'm not seeing any description of what the use of 'recursive' does for a function or subroutine (internally). Does it negate a global /SAVE and mean that any local variables are assumed to NOT be permanent (i.e. on the stack) unless declared with the SAVE attribute?

Since I like learning things, here's what I've learned:

  1. Putting debugging outputs into windows callbacks might not work like you think.
  2. Your windows callback functions might need protection-against/handling-with recursion.

Bill
