Topic: Stack Size in 64-bit

JohnCampbell

Posts: 2526 Sydney

25 Apr 2018 7:54 #21964

I was posting an additional comment about stack_size, but got a bit carried away. So, I thought I should start a new thread.

Two main points follow:

stack_size in slink64 has reverted to hex numbers
stack overflow errors need a rethink, but I don't know who should fix them. In 2018, they should not occur.

stack_size <hex_number>

I note that stack_size requires a hex number. Could we get back to decimal ? (SLINK appears to use decimal)

Also, could we use something like 'stack_size 100_MB'.

Others may be more fluent in hex, but I struggle with a 1 followed by a string of zeros. We need something a bit clearer.

Is the default 'stack_size 0x1000000' or 'stack_size 1000000' ? I would find 'stack_size 16_mb' or 'stack_size 16mb' much easier to read.
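
To make the difference between the two readings concrete, here is a minimal Python sketch. It is not SLINK64 code and the helper name is hypothetical. It interprets the same digits first as hex and then as decimal, and reports the size in units of 1024*1024 bytes:

```python
def describe_stack_size(text, base):
    """Interpret `text` as a byte count in the given base.

    Returns (bytes, size in units of 1024*1024 bytes).
    Hypothetical helper for illustration only.
    """
    size_bytes = int(text, base)
    return size_bytes, size_bytes / (1024 * 1024)

# The same digits read as hex give exactly 16 MB ...
print(describe_stack_size("1000000", 16))   # (16777216, 16.0)
# ... but read as decimal they give a little under 1 MB.
print(describe_stack_size("1000000", 10))   # (1000000, 0.95367431640625)
```

The two readings differ by a factor of nearly 17, which is why a unit suffix or a decimal megabyte count is easier to get right.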

We also need the selected stack size to be reported. Does 'map <file>' report the stack size selected ? Is it possible to get this reported in /32 or /64 ? Getting a stack overflow can be annoying, but then struggling with how to specify a larger stack can tip me over the edge.

Stack Overflow Errors

These belong in the last century / last millennium!! Stack overflow is an annoying error report. It is not my (the programmer's) fault, but the fault of the 'stack manager', which is too lazy to fix a problem it has caused by failing to find another stack location. There are gigabytes of unused memory available to be used. Even /32 typically has addresses above 2gb unused. I am serious when I say it is the fault of the lazy 'stack manager'. If you think about what is happening, this is where the problem should be solved. Who controls the 'stack manager'? Is it the Microsoft O/S or SLINK? We need a stack overflow address to extend the stack, and in 64-bit that should not be a difficult problem.

Who agrees ?

John

PaulLaidler

Posts: 7975 Salford, UK

25 Apr 2018 8:22 #21966

John

Thank you for your feedback. I will make a note of your request as far as SLINK64 is concerned.

JohnCampbell

Posts: 2526 Sydney

25 Apr 2018 12:06 #21974

Paul,

My apologies, I read the SLINK help in more detail and it appears that it also uses hex values. I first read 'size in bytes', although the examples are hex. An alternative in decimal, or in kb, mb or gb, would be easier.

DanRRight

Posts: 2877 South Pole, Antarctica

25 Apr 2018 2:32 #21978

JohnC's post should be an example for everyone not to be afraid to discuss issues and suggest fixes. Remember that even with Windows' billion users, few report problems and send suggestions, and when they do it takes a lot of similar requests to move things in the right direction. With the smaller user base of Fortran compilers, we will lose weeks on hidden bugs and inconveniences still present in any software. Please post any suggestion you have.

We have discussed the stack many times; some things were fixed, some not. Showing hex in the compiler is something we also discussed; I thought it was fixed, but somehow it has surfaced again. Of course I support John's suggestion with both hands. Even more, I will stress and repeat that every time developers use hex for regular users, they lose potential purchasers. Similarly, the debugger (look at its menu and GUI) should never mention hex windows and never switch to a binary window unless an experienced user explicitly asks for that in settings.

LitusSaxonicum

Posts: 2284 Yateley, Hants, UK

25 Apr 2018 2:38 #21979

Is there any way of determining, in advance, the likely stack requirements? I suspect not, because if it were easy then it would be done.

There are, however, things one does as a programmer that make various size demands on the stack. It would be helpful to have a brief list of them in a post that we could return to from time to time.

My feeling is that passing lots of big arrays as arguments to subroutine and function calls imposes demands on the stack, which means that a traditional programmer like myself who prefers COMMON will encounter a stack overflow less often than someone with a different style.

Eddie

mecej4

Posts: 1911

25 Apr 2018 6:34 #21985

Each time that a subroutine is called or a function is invoked, some stack is used up: (i) for the arguments, saved registers and return address (ii) local variables, especially local arrays. Only when the present subroutine returns will that stack space be freed up. Arguments are usually passed by address (or, sometimes, descriptor), so the size of the argument has no effect. Passing a big array takes up 4 or 8 bytes, as does passing a scalar.

Here is why calculating the stack requirement of a program can rarely provide more than an estimate: the call depth depends on the program logic and input data. The size of local arrays may depend on dummy arguments and this size may be passed through COMMON. The deeper the nesting of calls and the larger the local arrays in the chain of subprograms, the more is the stack consumption. The stack requirement is dynamic, data dependent, and unpredictable.
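
A rough way to picture this is the Python sketch below. All figures in it are illustrative assumptions, not FTN95 measurements: a made-up overhead of 64 bytes per call (arguments, saved registers, return address) and one local array of 8-byte reals per call. The function names are hypothetical.

```python
def peak_stack_bytes(call_chain):
    """Sum the frame sizes of all calls that are active at the deepest point.

    call_chain: list of (overhead_bytes, local_bytes), one tuple per active call.
    """
    return sum(overhead + local for overhead, local in call_chain)

def chain(depth, n, overhead=64):
    """Hypothetical chain: `depth` nested calls, each with a local array
    of n 8-byte reals and an assumed fixed overhead per call."""
    return [(overhead, 8 * n)] * depth

# Same program logic, different input data: the array size n comes from the data.
print(peak_stack_bytes(chain(depth=10, n=1000)))       # 80640
print(peak_stack_bytes(chain(depth=10, n=1000000)))    # 80000640
```

Because both the depth and n are only known at run time, a figure like this is an estimate at best.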

Allocating variables in COMMON instead of on the stack takes care of the stack overflow problem, but brings other problems that can be much worse. When you find that a variable in COMMON changes when you did not expect it to, how do you go about finding where it got changed? Which subprogram is supposed to initialise variables in COMMON? Is it possible to check for uninitialised variables that are in COMMON?

DanRRight

Posts: 2877 South Pole, Antarctica

25 Apr 2018 7:50 #21987

Quoted from mecej4

Allocating variables in COMMON instead of on the stack takes care of the stack overflow problem, but brings other problems that can be much worse. When you find that a variable in COMMON changes when you did not expect it to, how do you go about finding where it got changed? Which subprogram is supposed to initialise variables in COMMON? Is it possible to check for uninitialised variables that are in COMMON?

I never had any problems with COMMON in FTN77/95. All of that is exactly what was perfectly solved in the 80s, in the days of the first FTN77, while the whole Fortran community is still discussing it today. Put a Use or Write interruption on the variable in SDBG and go drink champagne; the compiler will find where your variable was changed for you.

mecej4

Posts: 1911

26 Apr 2018 3:14 #21988

What if that breakpoint is hit 1000 times over the run, and the first 999 hits show the value changed but nothing was wrong? At this point, having taken 999 sips of bubbly, you would not notice the bug that surfaces on the 1000th hit.

I am currently fighting with an old code in which a couple of DO loop indices are in COMMON, and are also visible in subroutines that are called in the loop and also have access to the same blocks. The blocks have dozens of variables, and I would not have spotted the risky nature of the code using a debugger. Rather, it was static analysis that showed that up.

JohnCampbell

Posts: 2526 Sydney

26 Apr 2018 8:04 #21990

mecej4,

There are two possible solutions:

1) create a second (another) stack when the first runs out
2) place 'large' arrays somewhere else, i.e. by default place large arrays using the allocate mechanism.

I don't understand why the 1) mechanism is not available. It should be. There have been suggestions that approach 2) affects performance, although I wonder about these claims and what test was used.

Also, the multi-thread stack is a problem. Each thread should have its own stack. I would suggest this would be more efficient, certainly if each thread stack was on a different memory 'page'.

The whole idea that the stack can not have a 'reserve' and be extended is just lazy.

mecej4

Posts: 1911

26 Apr 2018 10:30 #21992

John, to obtain an understanding of why we have limits on stacks, heaps and so forth, we have to look at how addressing is done in the microprocessor. It may be convenient to think of all addresses as 64-bits, since we receive a 64-bit result from LOC(<variable>), but that is an oversimplification at both the logical and the physical levels.

Most X64 instructions use IP-relative addressing, which is only a 32 bit offset from the next instruction. The same holds for most jumps and calls. Then there are short jumps, in which the relative offset is a single byte. In general, these short offsets of code and data (w.r.t. the instruction pointer, stack pointer, frame pointer and, in the 16-bit era, segment registers) keep the code compact and easy to read (for those who read assembler listings), and reduce the latency associated with bringing in instructions and data from RAM to the processor.

Thus, if you have a local variable, say, an integer array IV(nn), the address of IV may be just an 8-bit offset from RBP or RSP, and the index into IV can be in a 32-bit register such as EBX. A compiler will determine that an expression such as [RSP+1CH+4*ESI] will match IV(j), and put out an instruction that may be 3, 4, 5 or 6 bytes; if absolute addresses had been used, the corresponding instruction would be 10 bytes long.

Similar shortening is possible for arithmetic instructions. A register can be zeroed with a two-byte instruction, whereas loading an immediate zero may take ten bytes.

Now we can see why stack overflow is a fact of life. The compiler chose a three byte instruction, but our program may reach a point where we need a different way of addressing or we have to redo the stack layout. This happens while my EXE is running, and I cannot complain to the compiler. In fact, there may be no compiler at all on the machine on which the EXE is running. Therefore, the RTL issues a stack fault notice, and sometimes a traceback.

In effect, all compiled code represents a compromise between efficiency and safety, with a bias towards efficiency because that is our preference.

Permit me an analogy. When I write 'John', it is clear who I am writing to, from the context. A 4-character address suffices. If John Silver threw his hat into this ring, that would no longer work. I would need to edit the post, use a 13-character address and resolve ambiguities in the text of my post. Stack overflow is good, if you consider the alternatives!

Ref.: https://eli.thegreenplace.net/2012/01/03/understanding-the-x64-code-models .

davidb

Posts: 555 UK

26 Apr 2018 10:52 #21993

When a local variable is declared it is usually put on the current stack frame.

You can use SAVE to override this, or use the SAVE attribute for specific variables. If you don't want to change the source code you can use the option /SAVE when you compile. I am assuming here that this still works with /64.

In a program that uses multiple threads there should be a separate stack for each thread. This is my experience with OpenMP in any case.

DanRRight

Posts: 2877 South Pole, Antarctica

26 Apr 2018 3:14 #21996

Quoted from mecej4

What if that breakpoint is hit 1000 times over the run, and the first 999 hits show the value changed but nothing was wrong? At this point, having taken 999 sips of bubbly, you would not notice the bug that surfaces on the 1000th hit.

I am currently fighting with an old code in which a couple of DO loop indices are in COMMON, and are also visible in subroutines that are called in the loop and also have access to the same blocks. The blocks have dozens of variables, and I would not have spotted the risky nature of the code using a debugger. Rather, it was static analysis that showed that up.

You can probably imagine other, even harder situations too, but as for me, does anyone doubt that if I had needed this I would have requested it be implemented during 30 years of using SDBG? 😃 Especially after 999 sips...

I encourage you to do the same if you think this will improve debugging. I immediately see several easy-to-implement improvements to this powerful facility. For example, instead of stopping, it might write the offending lines of code into a separate window and continue. Or the debugger could be set to ignore, next time, the interruptions you found not relevant, etc.

As to stack problems, I'm curious how this topic appeared in the 64-bit area. What about our hopes that stack limitations would be essentially dead with 64 bits? Has anyone hit the 64-bit stack size limit and got the stack warning? I sometimes get problems allocating arrays in 64 bits, which surprises me a bit, as there is a swap file of almost infinite size compared to RAM. I then close some programs to free RAM and all works fine, but it would still be interesting to find the reason.

JohnCampbell

Posts: 2526 Sydney

27 Apr 2018 1:43 #22001

Davidb, /SAVE is not an option that should be encouraged. Also, could you provide a reference that indicates separate threads have a separate stack allocated? Do these individual stacks start on separate memory pages to reduce memory coherence problems? I am very interested in this issue. (At present I use ALLOCATE for large arrays, but if I could guarantee the thread stack was big enough and these private arrays were on separate memory pages, this should improve performance. My ALLOCATE approach probably guarantees that threads will share the same memory page.)

Mecej4, while I understand there can be local addressing, I am not sure how general that can be. I would expect that any array that is an argument to a routine cannot be locally addressed, as there is the potential that it can span a memory page boundary (is a page 64kb?) or be larger than a single page. I wonder how many array addresses can be 16-bit or 32-bit in /64 code. It must be a small set where the array memory location and size allow the compiler to definitely know this. Most of my calculation-intensive code involves large arrays (multi-page arrays). Using local array copies of big array sections could be a way to allow the compiler to speed up addressing. I am looking at a cache-smart variation of MATMUL where this may explain the improved performance.

I do think stack management needs a re-work. Where is this managed ?

mecej4

Posts: 1911

27 Apr 2018 2:15 #22002

See https://msdn.microsoft.com/en-us/library/windows/desktop/ms686774(v=vs.85).aspx regarding threads and stacks. Note the comments in the second paragraph regarding keeping stacks modest in size.

If you look at the assembler code listings for some of the subroutines that you compile, you will find that most of the variable references are through a R/M expression, with RBP and RSP being used more than any others. It is only for global variables that 64-bit addresses would be needed. Even for those, there are other strategies.

For dummy arguments, their addresses are in registers or on the stack when the call is made. If on stack, the 8-byte address is usually loaded into a register before anything is done with it. Once the address is in a register, that register can be represented in 3 or 4 bits in instructions, so the naked 8-byte address is hardly to be seen in any of the instructions that use the argument.

See http://bottomupcs.sourceforge.net/csbu/x3824.htm about the Global Offset Table.

davidb

Posts: 555 UK

27 Apr 2018 9:39 #22007

A well written program that conforms fully to the standard should be correct whether local variables are placed on the stack or in code space. In fact the Fortran standard does not say anything about where variables are placed. For big arrays I occasionally use the SAVE attribute. For very big arrays it is better to use a different algorithm and/or data structure.

davidb

Posts: 555 UK

27 Apr 2018 9:48 #22008

Quoted from mecej4

See https://msdn.microsoft.com/en-us/library/windows/desktop/ms686774(v=vs.85).aspx regarding threads and stacks. Note the comments in the second paragraph regarding keeping stacks modest in size.

Yes, thread-specific stacks are used to hold thread-private variables. Management of the stack size comes down to careful selection of algorithm and data structures. As a simple example, in quicksort you don't need to use recursion to sort both sub-lists. Recursion should only be used on the shortest sub-list; iteration should be used on the longest sub-list. This strategy keeps the stack size down.
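
The quicksort strategy can be sketched as follows. This is a minimal Python illustration, not code from any poster: the function names and the choice of a Lomuto partition with the last element as pivot are assumptions made for the example. Recursion is used only for the shorter sub-list, and the longer one is handled by the loop, so the recursion depth stays at roughly log2(n) even for badly ordered input.

```python
def quicksort(a):
    """Sort list `a` in place, recursing only into the shorter sub-list."""
    _sort(a, 0, len(a) - 1)

def _sort(a, lo, hi):
    while lo < hi:
        p = _partition(a, lo, hi)
        if p - lo < hi - p:
            _sort(a, lo, p - 1)   # left part is shorter: recurse on it
            lo = p + 1            # then iterate on the longer right part
        else:
            _sort(a, p + 1, hi)   # right part is shorter (or equal): recurse on it
            hi = p - 1            # then iterate on the longer left part

def _partition(a, lo, hi):
    """Lomuto partition using a[hi] as pivot; returns the pivot's final index."""
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i

data = [5, 2, 9, 1, 5, 6]
quicksort(data)
print(data)   # [1, 2, 5, 5, 6, 9]
```

With this arrangement an already sorted list, which is the worst case for this pivot choice, still costs quadratic time but no longer needs a stack frame per element.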

JohnCampbell

Posts: 2526 Sydney

25 May 2018 10:38 #22167

davidb & mecej4,

I have now reread your posts on this thread. To me, what you have written is just excuses for what is a very poor/lazy system design. If the stack is to overflow, there should be the ability to provide a secondary stack. In 64-bit, this should be easy, as I think all (virtual) memory above 4gb is dynamically allocated at run time, so stack extensions should not be a problem. The only problem with this approach is the claimed relative efficiency of using the local stack vs secondary stack above 4gb, although I expect this can be overcome with the mapping of virtual memory to physical memory pages.

Unfortunately, any change in this area is probably unlikely to happen as the stack management is an O/S and not a FTN95 domain ?

John

PaulLaidler

Posts: 7975 Salford, UK

14 Feb 2019 10:40 #23244

Some development has been carried out with respect to the permitted size of the FTN95 stack.

SLINK64 allows you to set the maximum stack size, which has a certain default value. It has been suggested above that FTN95/SLINK64 should know how much stack is required, but this is not the case. On the other hand, SLINK64 could use a very large default size, but that would not be sensible. So we are left with the status quo except...

FTN95 and SLINK64 have been developed to allow the user to extend the stack size above the 4GB level. So if a user wants to have enormous local arrays then it will now be possible provided SLINK64 is instructed accordingly.

Regarding the irritation of having to provide a hex value for the stack size, the only way to fix this is for us to provide an alternative command (a hex value might not contain x or A to F characters in which case it could be hex or decimal). An alternative command could be added if there is a demand.

LitusSaxonicum

Posts: 2284 Yateley, Hants, UK

14 Feb 2019 12:54 #23245

Paul,

Isn't it sensible to make numbers that humans have to communicate to a computer always decimal or at least base 10 (like thousands or millions)? Frankly, I'd suspect that some multiple of megabytes is best, as I defy anyone to accurately predict the needed stack extent down to individual bytes, or some small multiple thereof.

I can't quite see why FTN95 / SLINK can't work out the required stack size, but I will take it on trust. There is one simple rule. If you have numerous very large local arrays passed between subprograms you will need a big stack, and if you don't, you won't. Us COMMONers simply don't need the huge stack, which is probably why COMMON was invented in the first place.

Eddie

PaulLaidler

Posts: 7975 Salford, UK

14 Feb 2019 3:04 #23246

I have added a new SLINK64 command 'stack' that takes a value which is the number of (decimal) megabytes.
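
For anyone still using the hex form, a small Python sketch of the conversion between the two notations may help. It assumes that one megabyte means 1024*1024 bytes (the post does not say whether SLINK64 uses that or 10^6 bytes), and the helper name is hypothetical:

```python
BYTES_PER_MB = 1024 * 1024   # assumption: 1 MB = 2**20 bytes

def mb_to_hex_bytes(megabytes):
    """Return the byte count for `megabytes` as a hex string (hypothetical helper)."""
    return hex(megabytes * BYTES_PER_MB)

for mb in (16, 100, 1024):
    print(mb, "MB ->", mb_to_hex_bytes(mb))
# 16 MB -> 0x1000000
# 100 MB -> 0x6400000
# 1024 MB -> 0x40000000
```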