forums.silverfrost.com

JohnCampbell · Joined: 16 Feb 2006 Posts: 2556 Location: Sydney

I was posting an additional comment about stack_size, but got a bit carried away.
So, I thought I should start a new thread.

Two main points follow:
1) stack_size in slink64 has reverted to hex numbers
2) stack overflow errors need a rethink, but I don't know who should fix them. In 2018, they should not occur.

stack_size <hex_number>
I note that stack_size requires a hex number. Could we get back to decimal? (SLINK appears to use decimal.)

Also, could we use something like "stack_size 100_MB"?

Others may be more fluent in hex, but I struggle with a 1 followed by a string of zeros. We need something a bit clearer.

Is the default "stack_size 0x1000000" or "stack_size 1000000"?
I would find "stack_size 16_mb" or "stack_size 16mb" much easier to read.
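As a sketch of the kind of input being asked for, the hypothetical parser below accepts hex ("0x1000000"), plain decimal, and suffixed forms such as "16mb" or "100_MB". It is not how SLINK or SLINK64 parse stack_size; the function names are made up, and treating MB as 1024*1024 bytes is an assumption. It only shows that the forms are easy to tell apart, and how far apart "0x1000000" and "1000000" really are.

```python
import re

# Hypothetical helper, not the SLINK/SLINK64 syntax.
# Assumption: suffixes are binary units (1 KB = 1024 bytes).
_UNITS = {"": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3}

def parse_stack_size(text: str) -> int:
    """Return a size in bytes from '0x...' hex, plain decimal, or '<n>[_]kb/mb/gb'."""
    s = text.strip().lower()
    if s.startswith("0x"):
        return int(s, 16)                      # explicit hex
    m = re.fullmatch(r"(\d+)_?(kb|mb|gb)?", s)
    if m is None:
        raise ValueError(f"unrecognised stack size: {text!r}")
    return int(m.group(1)) * _UNITS[m.group(2) or ""]

def describe(nbytes: int) -> str:
    """Show a byte count in decimal, hex and (binary) MB side by side."""
    return f"{nbytes:,} bytes = {nbytes:#x} = {nbytes / 1024**2:g} MB"

if __name__ == "__main__":
    for arg in ("0x1000000", "1000000", "16mb", "100_MB"):
        print(f"{arg:>10} -> {describe(parse_stack_size(arg))}")
```

With these rules "0x1000000" is 16,777,216 bytes (16 MB), while "1000000" read as decimal is just under 1 MB, which is exactly the ambiguity the question is about.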

We also need the selected stack size to be reported.
Does "map <file>" report the selected stack size?
Is it possible to get this reported in /32 or /64?
Getting a stack overflow can be annoying, but then struggling with how to specify a larger stack can tip me over the edge.
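On the reporting question: Windows executables record the stack reserve in the PE optional-header field SizeOfStackReserve, and Microsoft's "dumpbin /headers" prints it as "size of stack reserve". Assuming SLINK/SLINK64 write the stack_size value into that standard field (I have not verified this), a short script can report it for either a 32-bit (PE32) or 64-bit (PE32+) EXE. The function name is hypothetical; the header offsets are those of the PE format.

```python
import struct
import sys

def stack_reserve_from_pe(data: bytes) -> int:
    """Return SizeOfStackReserve from the bytes of a PE (Windows EXE/DLL) file."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ/PE file")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)   # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    opt = pe_offset + 4 + 20          # skip the signature and the 20-byte COFF header
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic == 0x10B:                # PE32: 32-bit image, 4-byte field
        (reserve,) = struct.unpack_from("<I", data, opt + 72)
    elif magic == 0x20B:              # PE32+: 64-bit image, 8-byte field
        (reserve,) = struct.unpack_from("<Q", data, opt + 72)
    else:
        raise ValueError(f"unknown optional-header magic {magic:#x}")
    return reserve

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        size = stack_reserve_from_pe(f.read())
    print(f"stack reserve: {size:,} bytes ({size:#x})")
```

Usage would be something like "python stackinfo.py myprog.exe", where the script name is arbitrary.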

Stack Overflow Errors

These belong in the last century / last millennium !!
Stack overflow is an annoying error report. It is not my (the programmer's) fault, but the fault of the "stack manager", which is too lazy to fix the problem it has caused by failing to find another stack location. There are gigabytes of unused memory available. Even under /32, addresses above 2 GB are typically unused.
I am serious when I say it is the fault of the lazy "stack manager". If you think about what is happening, this is where the problem should be solved.
Who controls the "stack manager"? Is it the Microsoft O/S or SLINK?
We need a stack overflow to trigger an extension of the stack, and in 64-bit that should not be a difficult problem.
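For context on what the "stack manager" does today: on Windows each thread's stack is one contiguous block of address space reserved up front (the stack reserve size), of which only the top pages are committed. Touching the guard page below the committed region makes the OS commit more, until the reservation is used up; only then is a stack overflow raised. The toy model below is a simplification (class and method names are hypothetical, 4 KB pages assumed, guard-page slack ignored). It shows that growth within the reserve is already automatic, and that the hard limit is the contiguous reservation itself.

```python
PAGE = 4096  # assumed page size (x86-64 Windows uses 4 KB pages)

class StackOverflow(Exception):
    pass

class ToyStack:
    """Toy model of a reserve/commit stack that grows one page at a time."""

    def __init__(self, reserve: int, initial_commit: int = PAGE):
        self.reserve = reserve            # fixed size of the contiguous reservation
        self.committed = initial_commit   # bytes actually backed by memory so far
        self.used = 0                     # current depth in bytes

    def push(self, nbytes: int) -> None:
        """Push a frame; commit pages on demand, fail past the reservation."""
        new_used = self.used + nbytes
        if new_used > self.reserve:
            # The addresses beyond the reservation may belong to something
            # else, so the stack cannot simply carry on there.
            raise StackOverflow(f"need {new_used:,} bytes, reserve is {self.reserve:,}")
        while self.committed < new_used:  # "guard page" touched: commit one more page
            self.committed = min(self.committed + PAGE, self.reserve)
        self.used = new_used

    def pop(self, nbytes: int) -> None:
        self.used -= nbytes               # in this model committed pages are not released
```

Extending past the reserve would mean either moving the whole stack (invalidating every pointer into it) or chaining a second, non-contiguous segment, which is the part the current scheme does not attempt.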

Who agrees ?

John

PaulLaidler · Posted: Wed Apr 25, 2018 9:22 am Post subject:

John

Thank you for your feedback. I will make a note of your request as far as SLINK64 is concerned.

JohnCampbell · Joined: 16 Feb 2006 Posts: 2556 Location: Sydney

Paul,

My apologies: I have read the SLINK help in more detail, and it appears that SLINK also uses hex values. I had first read "size in bytes", although the examples are in hex. An alternative of decimal, or KB, MB or GB, would be easier.

DanRRight · Posted: Wed Apr 25, 2018 3:32 pm Post subject:

JohnC's post should be an example to everyone not to be afraid to discuss issues and suggest fixes. Remember that even with Windows' billion users, few of them report problems or send suggestions, and when they do, it takes a lot of similar requests to move things in the right direction. With the smaller user base of Fortran compilers, we will lose weeks on the hidden bugs and inconveniences still present in any software. Please post any suggestions you have.

We have discussed the stack many times; some things were fixed, some not. Showing hex for anything in the compiler was also discussed, and I thought it was fixed, but somehow it has resurfaced. Of course I support John's suggestion with both hands. Even more, I will stress and repeat that every time developers show hex to regular users, they lose potential purchasers. Similarly, the debugger (look at its menu and GUI) should never mention hex windows or switch to a binary window unless an experienced user explicitly asks for that in the settings.

LitusSaxonicum · Posted: Wed Apr 25, 2018 3:38 pm Post subject:

Is there any way of determining, in advance, the likely stack requirements? I suspect not, because if it were easy then it would be done.

There are, however, things one does as a programmer that make various size demands on the stack. It would be helpful to have a brief list of them in a post that we could return to from time to time.

My feeling is that passing lots of big arrays as arguments to subroutine and function calls imposes demands on the stack, which means that a traditional programmer like myself who prefers COMMON will encounter a stack overflow less often than someone with a different style.

Eddie

mecej4 · Joined: 31 Oct 2006 Posts: 1891

Each time that a subroutine is called or a function is invoked, some stack is used up: (i) for the arguments, saved registers and return address, and (ii) for local variables, especially local arrays. That stack space is freed only when the subroutine returns. Arguments are usually passed by address (or, sometimes, by descriptor), so the size of the argument has no effect: passing a big array takes up 4 or 8 bytes, as does passing a scalar.

Here is why calculating the stack requirement of a program can rarely provide more than an estimate: the call depth depends on the program logic and input data. The size of local arrays may depend on dummy arguments, and this size may be passed through COMMON. The deeper the nesting of calls and the larger the local arrays in the chain of subprograms, the greater the stack consumption. The stack requirement is dynamic, data dependent, and unpredictable.
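A back-of-the-envelope version of this estimate: a dummy array argument costs one 8-byte address however large the array is, while a local array costs its full size in every active call, so the total depends on how many calls are active at once. The per-call overhead below is an assumed round number, not a measured FTN95 figure, and the routine names are hypothetical.

```python
from dataclasses import dataclass

POINTER_BYTES = 8     # one address in 64-bit code
FRAME_OVERHEAD = 64   # assumed: return address, saved registers, alignment padding

@dataclass
class Routine:
    name: str
    n_args: int       # each argument is passed by address, whatever its size
    local_bytes: int  # local scalars and arrays that live on the stack

    def frame_bytes(self) -> int:
        return FRAME_OVERHEAD + self.n_args * POINTER_BYTES + self.local_bytes

def chain_bytes(chain: list[Routine]) -> int:
    """Stack in use when every routine in the chain is active at the same time."""
    return sum(r.frame_bytes() for r in chain)

if __name__ == "__main__":
    n = 1000
    # Hypothetical SOLVE with a local REAL*8 work array W(n,n) and 3 arguments.
    solve = Routine("SOLVE", n_args=3, local_bytes=8 * n * n)
    helper = Routine("HELPER", n_args=2, local_bytes=0)
    print(chain_bytes([helper, solve]))       # one active SOLVE
    print(chain_bytes([helper] + [solve] * 3))  # three nested active SOLVEs
```

With W(1000,1000) in REAL*8, one active call already needs about 8 MB, so under a 16 MB (0x1000000) stack two nested active calls still fit but a third would not. Whether that third call happens depends on the data, which is the point above.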

Allocating variables in COMMON instead of on the stack takes care of the stack overflow problem, but brings other problems that can be much worse. When you find that a variable in COMMON changes when you did not expect it to, how do you go about finding where it got changed? Which subprogram is supposed to initialise variables in COMMON? Is it possible to check for uninitialised variables that are in COMMON?

DanRRight · Posted: Wed Apr 25, 2018 8:50 pm Post subject: Re:

mecej4 · Joined: 31 Oct 2006 Posts: 1891

What if that breakpoint is hit 1000 times over the run, and the first 999 hits show the value changing but nothing wrong? At this point, having taken 999 sips of bubbly, you would not notice the bug that surfaces on the 1000th hit.

I am currently fighting with an old code in which a couple of DO loop indices are in COMMON, and are also visible in subroutines that are called in the loop and also have access to the same blocks. The blocks have dozens of variables, and I would not have spotted the risky nature of the code using a debugger. Rather, it was static analysis that showed that up.
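The kind of static check described here can be sketched as a cross-reference: for a DO loop whose index lives in COMMON, follow every routine called inside the loop (and whatever those call in turn) and flag any that may assign the same COMMON variable. The call graph and write sets below are hypothetical hand-written inputs; a real analysis tool would extract them from the source.

```python
# Hypothetical cross-reference data, as a static analyser might extract it.
CALLS = {                  # routine -> routines it calls
    "MAIN": ["STEP", "REPORT"],
    "STEP": ["UPDATE"],
    "UPDATE": [],
    "REPORT": [],
}
WRITES = {                 # routine -> COMMON variables it may assign
    "MAIN": {"I"},
    "STEP": set(),
    "UPDATE": {"I", "X"},
    "REPORT": set(),
}

def reachable(routine: str) -> set[str]:
    """Routines reachable from `routine` through one or more calls."""
    seen, todo = set(), list(CALLS.get(routine, []))
    while todo:
        r = todo.pop()
        if r not in seen:
            seen.add(r)
            todo.extend(CALLS.get(r, []))
    return seen

def clobbered_loop_index(index: str, called_in_loop: list[str]) -> list[str]:
    """Routines called (directly or indirectly) in the loop body that may write `index`."""
    suspects = set()
    for r in called_in_loop:
        for callee in {r} | reachable(r):
            if index in WRITES.get(callee, set()):
                suspects.add(callee)
    return sorted(suspects)

if __name__ == "__main__":
    # In MAIN: DO I = 1, N ... CALL STEP ... CALL REPORT ... END DO, with I in COMMON.
    print(clobbered_loop_index("I", ["STEP", "REPORT"]))
```

Here the check reports UPDATE, reached only indirectly through STEP, which is the sort of case a debugger session is unlikely to reveal.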

JohnCampbell · Joined: 16 Feb 2006 Posts: 2556 Location: Sydney

mecej4,

There are two possible solutions:
1) create a second (another) stack when the first runs out
2) place "large" arrays somewhere else, i.e. by default place large arrays on the heap using the ALLOCATE mechanism.

I don't understand why the 1) mechanism is not available. It should be.
There have been suggestions that approach 2) affects performance, although I wonder about these claims and what test was used.

Also, the multi-thread stack is a problem. Each thread should have its own stack. I would suggest this would be more efficient, certainly if each thread's stack were on a different memory "page".

The whole idea that the stack can not have a "reserve" and be extended is just lazy.
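Option 1 has a common manual workaround at the thread level: run the stack-hungry work on a new thread created with a bigger stack. In Python that is threading.stack_size(), which sets the stack size for threads created afterwards; the Win32 CreateThread call has a dwStackSize parameter for the same purpose. This is a sketch of that workaround, not the automatic extension being asked for; run_with_stack is a hypothetical helper and the sizes used are arbitrary.

```python
import threading

def run_with_stack(func, stack_bytes: int, *args):
    """Run func(*args) on a fresh thread whose stack is stack_bytes large."""
    result, error = [], []

    def target():
        try:
            result.append(func(*args))
        except BaseException as exc:      # re-raised in the calling thread below
            error.append(exc)

    old = threading.stack_size(stack_bytes)   # applies to threads created from now on
    try:
        t = threading.Thread(target=target)
        t.start()
    finally:
        threading.stack_size(old)             # restore the previous setting
    t.join()
    if error:
        raise error[0]
    return result[0]

def depth(n: int) -> int:
    """Deliberately recursive, so each level uses a stack frame."""
    return 0 if n == 0 else 1 + depth(n - 1)

if __name__ == "__main__":
    print(run_with_stack(depth, 64 * 1024 * 1024, 500))
```

Note that each such thread gets its own separately reserved stack, so the extra space costs address space only for as long as the thread exists.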

mecej4 · Joined: 31 Oct 2006 Posts: 1891

John, to obtain an understanding of why we have limits on stacks, heaps and so forth, we have to look at how addressing is done in the microprocessor. It may be convenient to think of all addresses as 64-bits, since we receive a 64-bit result from LOC(<variable>), but that is an oversimplification at both the logical and the physical levels.

Most X64 instructions that access global data use RIP-relative addressing, which is only a 32-bit offset from the next instruction. The same holds for most jumps and calls. Then there are short jumps, in which the relative offset is a single byte. In general, these short offsets of code and data (w.r.t. the instruction pointer, stack pointer, frame pointer and, in the 16-bit era, segment registers) keep the code compact and easy to read (for those who read assembler listings), and reduce the latency associated with bringing instructions and data from RAM into the processor.

Thus, if you have a local variable, say, an integer array IV(nn), the address of IV may be just an 8-bit offset from RBP or RSP, and the index into IV can be held in a register such as RBX or RSI (a 32-bit write to EBX or ESI zero-extends into the full 64-bit register). A compiler will determine that an expression such as [RSP+1CH+4*RSI] will match IV(j), and put out an instruction that may be 3, 4, 5 or 6 bytes; if absolute addresses had been used, the corresponding instruction would be 10 bytes long.

Similar shortening is possible for arithmetic instructions. A register can be zeroed with a two-byte instruction, whereas loading an immediate zero into a 64-bit register using the full 64-bit immediate form takes ten bytes.
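These byte counts can be checked by hand-encoding a few x86-64 instructions. The sketch below covers only the forms used here (registers RAX..RDI, no REX prefix except REX.W for the 64-bit immediate) and is not a general assembler; the function names are hypothetical. The array load uses RSI as the index because in 64-bit addressing the base and index registers must be the same width.

```python
REG64 = {"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3, "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7}
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3, "esp": 4, "ebp": 5, "esi": 6, "edi": 7}
SCALE = {1: 0, 2: 1, 4: 2, 8: 3}

def mov_load_sib_disp8(dst: str, base: str, index: str, scale: int, disp: int) -> bytes:
    """mov dst32, [base + index*scale + disp8]: opcode 8B /r, then ModRM, SIB, disp8."""
    if index == "rsp":
        raise ValueError("rsp cannot be used as an index register")
    if not -128 <= disp <= 127:
        raise ValueError("displacement does not fit in 8 bits")
    modrm = (0b01 << 6) | (REG32[dst] << 3) | 0b100   # mod=01: disp8; rm=100: SIB follows
    sib = (SCALE[scale] << 6) | (REG64[index] << 3) | REG64[base]
    return bytes([0x8B, modrm, sib, disp & 0xFF])

def xor_zero(reg: str) -> bytes:
    """xor r32, r32 (opcode 31 /r); also clears the upper half of the 64-bit register."""
    r = REG32[reg]
    return bytes([0x31, 0xC0 | (r << 3) | r])

def mov_imm64(reg: str, value: int) -> bytes:
    """mov r64, imm64 (REX.W + B8+r): the long form carrying a full 8-byte constant."""
    return bytes([0x48, 0xB8 + REG64[reg]]) + value.to_bytes(8, "little")

if __name__ == "__main__":
    for label, code in [
        ("mov eax, [rsp+1Ch+rsi*4]", mov_load_sib_disp8("eax", "rsp", "rsi", 4, 0x1C)),
        ("xor eax, eax", xor_zero("eax")),
        ("mov rax, 0 (imm64 form)", mov_imm64("rax", 0)),
    ]:
        print(f"{label:28} {code.hex(' ')}  ({len(code)} bytes)")
```

For these inputs the stack-relative array load encodes as 8B 44 B4 1C (4 bytes), zeroing EAX as 31 C0 (2 bytes), and the 64-bit immediate form takes 10 bytes, in line with the figures above.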

Now we can see why stack overflow is a fact of life. The compiler chose a three-byte instruction, but our program may reach a point where we need a different way of addressing or we have to redo the stack layout. This happens while my EXE is running, and I cannot complain to the compiler. In fact, there may be no compiler at all on the machine on which the EXE is running. Therefore, the RTL issues a stack fault notice, and sometimes a traceback.

In effect, all compiled code represents a compromise between efficiency and safety, with a bias towards efficiency because that is our preference.

Permit me an analogy. When I write "John", it is clear from the context who I am writing to. A 4-character address suffices. If John Silver threw his hat into this ring, that would no longer work. I would need to edit the post, use a 13-character address and resolve ambiguities in the text of my post. Stack overflow is good, if you consider the alternatives!

Ref.: https://eli.thegreenplace.net/2012/01/03/understanding-the-x64-code-models

davidb · Joined: 17 Jul 2009 Posts: 560 Location: UK

When a local variable is declared it is usually put on the current stack frame.

You can use a SAVE statement to override this, or use the SAVE attribute for specific variables. If you don't want to change the source code, you can use the /SAVE option when you compile. I am assuming here that this still works with /64.

In a program that uses multiple threads there should be a separate stack for each thread. This is my experience with OpenMP in any case.
_________________
Programmer in: Fortran 77/95/2003/2008, C, C++ (& OpenMP), java, Python, Perl

DanRRight · Posted: Thu Apr 26, 2018 4:14 pm Post subject: Re:

JohnCampbell · Joined: 16 Feb 2006 Posts: 2556 Location: Sydney

Davidb,
/SAVE is not an option that should be encouraged. Also, could you provide a reference indicating that separate threads have a separate stack allocated? Do these individual stacks start on separate memory pages to reduce memory coherence problems? I am very interested in this issue.
(At present I use ALLOCATE for large arrays, but if I could guarantee that the thread stack was big enough and that these private arrays were on separate memory pages, this should improve performance. My ALLOCATE approach probably means that threads share the same memory page.)
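Whether two threads' private arrays can interfere comes down to which fixed-size blocks their byte ranges touch. On x86-64 Windows the page size is 4 KB (VirtualAlloc reservations are aligned to 64 KB), while cache-coherence traffic works on 64-byte cache lines, so "false sharing" between threads is about cache lines rather than pages. The hypothetical helpers below just do that arithmetic; the base address is made up.

```python
PAGE = 4096        # x86-64 Windows page size
CACHE_LINE = 64    # typical x86-64 cache-line size

def blocks(start: int, length: int, block: int) -> set[int]:
    """Indices of the block-sized units touched by bytes [start, start + length)."""
    if length <= 0:
        return set()
    return set(range(start // block, (start + length - 1) // block + 1))

def shares(a: tuple[int, int], b: tuple[int, int], block: int) -> bool:
    """True if two (start, length) byte ranges touch a common block."""
    return bool(blocks(*a, block) & blocks(*b, block))

def padded(length: int, block: int) -> int:
    """Round a size up to a whole number of blocks, e.g. to keep arrays apart."""
    return -(-length // block) * block

if __name__ == "__main__":
    base = 0x10000                 # hypothetical address of the first array
    a = (base, 8000)               # 1000 REAL*8 elements for thread 1
    b = (base + 8000, 8000)        # the next 1000 elements, for thread 2
    print(shares(a, b, PAGE), shares(a, b, CACHE_LINE))
    print(padded(8000, PAGE))
```

In this example the two back-to-back arrays share a page but no cache line (8000 is a multiple of 64); padding each allocation to a page multiple (8192 bytes) would also separate them at the page level.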

Mecej4,
While I understand there can be local addressing, I am not sure how general that can be. I would expect that any array that is an argument to a routine cannot be locally addressed, as it could potentially span a memory page boundary (is a page 64 KB?) or be larger than a single page. I wonder how many array addresses can be 16-bit or 32-bit in /64 code. It must be a small set, where the array's memory location and size allow the compiler to know this for certain. Most of my calculation-intensive code involves large arrays (multi-page arrays).
Using local copies of big array sections could be a way to allow the compiler to speed up addressing. I am looking at a cache-smart variation of MATMUL where this may explain the improved performance.

I do think stack management needs a rework. Where is this managed?

mecej4 · Joined: 31 Oct 2006 Posts: 1891

See https://msdn.microsoft.com/en-us/library/windows/desktop/ms686774(v=vs.85).aspx regarding threads and stacks. Note the comments in the second paragraph regarding keeping stacks modest in size.

If you look at the assembler code listings for some of the subroutines that you compile, you will find that most of the variable references are through an R/M expression, with RBP and RSP being used more than any others. It is only for global variables that 64-bit addresses would be needed. Even for those, there are other strategies.

For dummy arguments, their addresses are in registers or on the stack when the call is made. If on stack, the 8-byte address is usually loaded into a register before anything is done with it. Once the address is in a register, that register can be represented in 3 or 4 bits in instructions, so the naked 8-byte address is hardly to be seen in any of the instructions that use the argument.

See http://bottomupcs.sourceforge.net/csbu/x3824.htm about the Global Offset Table.

davidb · Joined: 17 Jul 2009 Posts: 560 Location: UK

A well-written program that conforms fully to the standard should be correct whether local variables are placed on the stack or in static memory. In fact, the Fortran standard does not say anything about where variables are placed. For big arrays I occasionally use the SAVE attribute. For very big arrays it is better to use a different algorithm and/or data structure.