Silverfrost Forums

Stack problem

20 Jul 2018 8:39 #22381

Funny, and unimaginable, but I have somehow reached some kind of stack limit in 64-bit. It could be a bug too... How come I started getting 'stack overflow' with 64-bit, and the program does not even start? Wasn't the stack made virtually infinite with 64-bit (I assume RAM size + paging file size)?

Or was the stack not eliminated in the 64-bit case (better to say eradicated, exterminated, @#$%^^&&, so much did we all hate it with 32-bit), and there is some default stack size (larger than with 32-bit) which we can extend on demand, the same way we did with 32-bit?

21 Jul 2018 7:01 #22384

Dan

Like you, I can't imagine what is wrong.

23 Jul 2018 9:30 #22389

Come on, Dan, give the man a better clue as to what's happened.

Eddie

23 Jul 2018 12:01 #22390

I can understand Dan's frustrations, but I think that he has allowed himself to be oversold on what '64-bit' signifies. Many (most?) objects produced by an x64 compiler use mostly register-relative addressing, in particular RIP-relative addressing. See, for example,

http://www.nynaeve.net/?p=192

The offset that is added to the register (RIP or another general purpose register or two) is restricted to 32 bits for most instructions. That is a compromise that AMD selected when designing x64. As with most compromises, this one has good and bad consequences. The Intel manual https://software.intel.com/sites/default/files/managed/a4/60/253665-sdm-vol-1.pdf says in its section 3.3.7:

Generally, displacements and immediates in 64-bit mode are not extended to 64 bits. They are still limited to 32 bits and sign-extended during effective-address calculations. In 64-bit mode, however, support is provided for 64-bit displacement and immediate forms of the MOV instruction.

Here is a small test program to demonstrate this limitation.

program big_stack
implicit none
integer, parameter :: i32=selected_int_kind(8), i64=selected_int_kind(16)
!
integer(i32) :: j32
integer(i64) :: j64
!
j64=2*65536
j64=j64*j64 ! 2^34, which is larger than HUGE_INT32 = 2^31-1
print *,huge(j32),huge(j64)
call sub(j64)

contains
subroutine sub(asiz)
implicit none
integer(i64),intent(in) :: asiz
real :: stk(asiz) ! automatic array of 2^34 default reals
integer(i64) :: i
do i=1,asiz
   stk(i) = i*2048_i64 ! 2048 = Z'0800'; a bare BOZ constant in an expression is not standard
end do
print *,stk(asiz/2)
return
end subroutine
end program big_stack

Dan, please try this test program with FTN95 and with other compilers that you have access to.

25 Jul 2018 11:31 #22398

Dan,

All Windows programs that use a stack have a stack size limit, whether 32-bit or 64-bit. Other operating systems allow a stack extension, but the Windows I know doesn't. That Windows will not provide a stack extension is a significant failing of the O/S; it is not compatible with the performance required of a 64-bit O/S.

I have written a program to test the stack size, by iterating and creating a larger automatic array, but to do the test you have to ensure that large local arrays are not placed on the heap.
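A stack-size test of this kind can be sketched as below. This is only a minimal sketch and not the program referred to in this thread: the module name stack_probe and the routines touch_automatic and probe are made up for illustration, and it assumes 4-byte default integers and 4 KB pages. Whether the automatic array really lands on the stack depends on the compiler and its options, which is the caveat already mentioned.

```fortran
module stack_probe
  implicit none
  integer, parameter :: i64 = selected_int_kind(16)
contains

  ! Create an automatic array of n default integers and write to one
  ! element per 1024 (one per 4 KB page, assuming 4-byte integers).
  ! Returns the number of elements written, so the array is really used.
  function touch_automatic(n) result(total)
    integer(i64), intent(in) :: n
    integer(i64) :: total
    integer :: buf(n)            ! automatic array: size known only at run time
    integer(i64) :: i
    total = 0
    do i = 1, n, 1024
      buf(i) = 1
      total = total + buf(i)
    end do
  end function touch_automatic

  ! Double the array size from 1 MB up to max_mb, reporting each size
  ! that worked. The last line printed is a lower bound on what fits.
  subroutine probe(max_mb)
    integer, intent(in) :: max_mb
    integer :: mb
    integer(i64) :: pages
    mb = 1
    do while (mb <= max_mb)
      pages = touch_automatic(int(mb, i64) * 262144_i64)   ! 262144 integers = 1 MB
      print '(a,i0,a,i0,a)', 'automatic array of ', mb, ' MB ok (', pages, ' pages touched)'
      mb = mb * 2
    end do
  end subroutine probe

end module stack_probe
```

Calling probe(64) from a main program would try 1, 2, 4, ... 64 MB in turn. If the arrays are on the stack, the run should fail somewhere near the stack limit; if every size succeeds well past the expected limit, the arrays are probably being placed elsewhere.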

There is also a system routine GetCurrentThreadStackLimits, which returns the start and end memory address of the stack.

I am not sure of the default stack size for FTN95 or FTN95 /64, but for /64 the default probably should be much larger than is presently provided. The stack size is provided by the linker. (I did a test that indicated 50 MB, but I would not guarantee this is correct.)

I would guess that any stack size could be limited to below 4 GB, but again I don't know.

The safe approach is to always place large arrays on the heap, using ALLOCATE. The 'heap' must be extendable for this to work.

25 Jul 2018 1:53 #22399

If the linker assigns a stack size, then a stack overflow means that the linker's stack allocation algorithm did not correctly determine the size of stack required. Whether this is the result of an error, or simply that the linker cannot resolve an ambiguity (e.g. the stack size required may vary considerably from run to run), Dan’s problem will recur from time to time.

From the information provided in a typical case, I doubt that anyone is able to determine what precisely the underlying cause is, just that symptoms are displayed. It strikes me that while the stack allocation algorithm remains a commercial secret, no user can resolve the underlying cause for themselves. One answer might be to publish the algorithm.

Another answer is to have a user defined stack, which one can progressively enlarge until the problem goes away, although it would be helpful to know what stack is assigned automatically to start from there.

My copy of FTN95.CHM tells me the following regarding stack_size as an SLINK64 directive:

stack_size <hex number>
Specifies the stack size. The default value is 0x1000000 (16 MB).

(John Campbell to note the above)

And also, for the command line:

Also /stack can be included followed by the stack size as a number of megabytes. /map can also be used in this context.
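The hex number for stack_size is simply the size in bytes, so 16 MB is 16 x 1048576 = 16777216 = 0x1000000. A small helper for that conversion is sketched below. The module and function names are invented here for illustration; only the 'stack_size <hex number>' form quoted from the help file is taken from the source.

```fortran
module stack_size_util
  implicit none
  integer, parameter :: i64 = selected_int_kind(16)
contains

  ! Convert a stack size in megabytes to a hex string such as 0x1000000.
  ! The result is blank-padded on the right to 18 characters.
  function stack_size_hex(megabytes) result(hex)
    integer, intent(in) :: megabytes
    character(len=18) :: hex
    character(len=16) :: buf
    ! 64-bit arithmetic, so 4096 MB (4 GB) does not overflow
    write (buf, '(Z0)') int(megabytes, i64) * 1048576_i64
    hex = '0x' // trim(buf)
  end function stack_size_hex

end module stack_size_util
```

For example, trim(stack_size_hex(64)) gives the value to try for a 64 MB stack. With /stack on the command line the number is given in megabytes, so no conversion is needed there.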

Does /map tell you the stack size the linker has given you? I'm too lazy to see for myself.

Or, you just go very old fashioned and make sure the stack isn’t used much.

I would have thought that Dan’s problem is resolvable by making the stack very much bigger and ignoring the default or anything the linker determines.

Eddie

27 Jul 2018 3:15 (Edited: 28 Jul 2018 10:37) #22401

Eddie,

Thanks for your comments. I have reviewed the ftn95.chm. My understanding is that 'the linker stack allocation algorithm' does not exist. It just allocates a fixed size, which apparently defaults to 16 MB for /64.

For 32-bit, I am very confused, as it defines a reserve (max size) of 50 MB and a commit of only 16 KB. I am not sure how this 'commit' works, but presumably the reserve is the main size limit.

So, why is /64 documented as only 16 MB?

Paul,

Could you comment on the following questions.

I tried to write a program that tests the available stack by allocating larger and larger arrays, either using ALLOCATE or calling a subroutine that uses an automatic (local) array. My stack size test is failing, as my large local arrays 'appear' to be going on another heap. Is there a way (compile or link option) to control where these local arrays go (heap or stack)?

Does /map tell you the stack size the linker has given you? It would be useful to know what it is, i.e. to see when the 'algorithm' changes the size.

My test program now confirms a 50 MB stack for 32-bit, but does not work effectively for /64 stack testing. For 64-bit, large local arrays are in a different address area from both the heap arrays and the local stack variables. (I shall post a link when I resolve the problem.)

Edit: Paul, for /64 I only want to know how to send large local arrays to the stack, as a compile option (/stack_arrays ?). I am not recommending this as a good approach, as sending them to a larger memory area is a much better default, like other 64-bit compilers.

27 Jul 2018 6:46 #22402

The link below is for those who want to see the stack test program as is. Running the program reports the memory addresses of two arrays plus id, which is a local variable on the stack. The memory addresses change for automatic/local or allocate/heap arrays. With 32-bit, this is as expected and the program terminates at 51 MB. With /64, it does not terminate, but the automatic arrays (kk=2) have a different address from both the heap (kk=1) and the stack.

https://www.dropbox.com/s/5kfaugf5efvtm1t/eval_stream.f90?dl=0

I have adapted this test from another test to identify local thread stack usage for multi-thread code, where I was looking at the performance of stack and heap arrays. The result so far is some improvement for small arrays on the stack, but no difference between heap and stack for large arrays. My conclusion is that heap arrays (allocate) offer a much more robust solution, although small stack arrays (cached) can have a benefit in some cases. No clear winner in the heap vs stack debate!

27 Jul 2018 12:08 #22403

Yes John,

/map does tell you the stack size. For a trivial little program, in 32 bit mode, the map report is long and tells you the stack size. With /64, the report is more cryptic, shorter, and doesn't.

Eddie

1 Aug 2018 5:26 #22413

It looks like this error happened as a result of arrays hitting the RAM + paging file size limit. Mecej4's example also reports a stack fault.

I need 128 or even 256 GB of RAM. And now is the worst time to buy memory; sales and middlemen have pushed prices beyond any shame. Prices per GB ought to fall, but instead they only grow and grow. Same with processors: for example, the 4.3-billion-transistor Apple A11 chip, made with the latest process technology, costs $25, while an Intel Xeon with a similar transistor count, made with an ancient process, costs 50x more. It's an example of ultimate monopoly in its ugly shape.

1 Aug 2018 6:05 #22414

Dan,

These are big numbers, even in 2018. I would be checking that the O/S and processor support the memory install you are considering. My latest (and greatest) i7-8700K supports only 64 GB of physical memory. I recall we needed Windows Server ?? to install 96 GB of memory on an old and very slow Xeon (not my software). I have 32 GB installed and don't use more than 25 GB.

If you are going to use virtual memory then the paging file must support the memory image, so you would need a 256 GB SSD. I expect the performance would not be too good, although it can be quicker than developing a disk-based solution.

1 Aug 2018 7:28 (Edited: 6 Aug 2018 10:07) #22415

Good points John.

Your 8700K supports 2 memory channels and hence has 4 slots for RAM. The 32 GB RAM sticks were not yet available on the consumer market at the time the 8700K went on sale. The motherboard manufacturers also did not test such RAM.

The AMD processors have 4 channels, and even 6 channels if I am not mistaken; Intel Xeons have at least 4. So you can install 2-3 times more RAM there immediately. Additionally, they are typically less restrictive on the RAM limit from the start, and future upgrades of the motherboard BIOS will definitely allow larger memory sticks.

As to Windows 10, it supports 2 TB of RAM (Home edition 128 GB).

I am curious whether the number of memory channels is critical for the particular case of linear algebra, as its speed is memory-bandwidth bound for large matrix sizes. Tests show that there is not much difference for all other tasks. If you or anyone here follows AnandTech, try to help me convince Ian Cutress to check this.

3 Aug 2018 9:14 #22435

John

I hope to be able to respond to your questions early next week.

6 Aug 2018 9:18 #22445

Here is a link to an updated page of the FTN95 help file. It provides further information on the current state of FTN95/SLINK64 memory allocation.

https://www.dropbox.com/s/m4hoy6szmj0sxkl/info.htm?dl=0

6 Aug 2018 10:56 #22447

John

In answer to your specific questions:

  1. Local arrays can only go on the stack. Its default size is 32MB and in theory this can be increased to 4GB via the SLINK64 option 'stack_size'.

  2. Automatic arrays use a different 'stack' as described in the above download.

  3. The SLINK64 /map does not appear to give the stack size but this, in any case, would just be the default value or the value you supply for 'stack_size'.

7 Aug 2018 12:59 #22449

Paul,

Thanks for the advice. The updated info does describe what is happening for 64-bit.

7 Aug 2018 11:09 #22450

Why on earth does anyone need to know about, and keep worrying about, the existence of the stack in 64-bit? It is a total rudiment and needs to be pesticided from Fortran. Why did the stack exist even in 32-bit? Who the hell introduced it in the first place, and for what purpose? Has anyone gained anything from the existence of this annoyance?

I'd excuse the stack's existence if it guaranteed that some specific part of the code would always be kept in the level 3, level 2 or even level 1 processor cache, but this is clearly not the case; the dumb blind stack has no such options. So make it RIP, Silverfrost.

8 Aug 2018 1:34 #22452

Without a stack, recursion would be near impossible. I remember writing a 're-entrant' routine years ago as part of my CS degree. This predated stack oriented processors, so we had to handle the stack in the assembly code. PITA. While one can write re-entrant or recursive routines without a hardware stack, the problem of where to return control when complete can get really sticky. Or, one writes a recursive routine as simple code, handling the entry/exit within the program module as if it were being called. It can be done but doing so will obfuscate the code.
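To make the point about handling the stack by hand concrete, here is a small sketch that does the same divide-and-conquer sum twice: once with ordinary recursion, and once with the pending work kept in explicit arrays that are pushed and popped in the code. The module and routine names are invented for illustration, and the fixed depth of 64 is an assumption that is ample for any array indexed by a default integer.

```fortran
module manual_stack
  implicit none
contains

  ! Recursive version: each call gets its own lo, hi and mid
  ! (typically held on the hardware stack).
  recursive function sum_rec(a, lo, hi) result(s)
    integer, intent(in) :: a(:), lo, hi
    integer :: s, mid
    if (lo > hi) then
      s = 0
    else if (lo == hi) then
      s = a(lo)
    else
      mid = lo + (hi - lo)/2
      s = sum_rec(a, lo, mid) + sum_rec(a, mid + 1, hi)
    end if
  end function sum_rec

  ! Same work without recursion: pending (lo,hi) ranges are kept in
  ! ordinary arrays and pushed/popped by hand.
  function sum_manual(a) result(s)
    integer, intent(in) :: a(:)
    integer :: s, sp, lo, hi, mid
    integer :: stack_lo(64), stack_hi(64)   ! needs about log2(size(a)) + 1 entries
    s = 0
    sp = 1
    stack_lo(1) = 1
    stack_hi(1) = size(a)
    do while (sp > 0)
      lo = stack_lo(sp)                     ! pop the most recent range
      hi = stack_hi(sp)
      sp = sp - 1
      if (lo > hi) cycle                    ! empty range, nothing to add
      if (lo == hi) then
        s = s + a(lo)
      else
        mid = lo + (hi - lo)/2
        sp = sp + 1                         ! push left half
        stack_lo(sp) = lo
        stack_hi(sp) = mid
        sp = sp + 1                         ! push right half
        stack_lo(sp) = mid + 1
        stack_hi(sp) = hi
      end if
    end do
  end function sum_manual

end module manual_stack
```

Both functions return the same total, but the second one has to carry its own bookkeeping, which is the obfuscation referred to above.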

All the GUI interfaces that you and I use AND OpenGL REQUIRE a stack architecture because they are mostly written in 'C' AND the processor architecture is stack-oriented so they take advantage of it. Like temporaries on the stack. Blazingly fast allocation because there really isn't any (one just adjusts the stack pointer and uses it as the addressing), and recursion/re-entrancy are built-in.

If you wish to RIP the stack, you also RIP using any SF software. Don't think that's what you are after.....

8 Aug 2018 6:31 #22455

In theory it might be possible for FTN95 to give users the option of putting local and automatic arrays on the heap rather than the stack. If it is then it would not be a short/simple task so we are back to the question of priorities and general interest.

In the meantime very large local and automatic arrays must be converted in your code to take the form of dynamic arrays (using ALLOCATE).
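As an illustration of that conversion, here is the automatic array from the test program earlier in this thread rewritten with ALLOCATE. It is a sketch under assumptions: the names heap_arrays and sub_heap and the extra midval and ok arguments are invented here, and it has not been tried with FTN95.

```fortran
module heap_arrays
  implicit none
  integer, parameter :: i64 = selected_int_kind(16)
contains

  ! Fill an array of asiz reals on the heap and return its middle element.
  ! ok is .false. if asiz is not positive or the allocation fails.
  subroutine sub_heap(asiz, midval, ok)
    integer(i64), intent(in) :: asiz
    real, intent(out) :: midval
    logical, intent(out) :: ok
    real, allocatable :: stk(:)      ! was: real :: stk(asiz)  (automatic array)
    integer(i64) :: i
    integer :: ierr

    midval = 0.0
    ok = .false.
    if (asiz < 1) return

    allocate (stk(asiz), stat=ierr)  ! stat= lets us report failure instead of crashing
    if (ierr /= 0) return

    do i = 1, asiz
      stk(i) = real(i) * 2048.0
    end do
    midval = stk(max(asiz/2, 1_i64))
    deallocate (stk)
    ok = .true.
  end subroutine sub_heap

end module heap_arrays
```

With stat= present, a request that cannot be satisfied should come back as ok = .false. rather than ending the run, so the caller can reduce the problem size or report the failure.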

It is a mistake to assume that you can simply take old code, convert it to 64 bits and then increase the array sizes without constraint. There will be limitations from your hardware, from the operating system, and to some extent from the compiler.
