Silverfrost Forums


changing the stack at runtime?

25 May 2018 7:47 #22162

I think it's true to say that, in an application that uses DLLs, it is only necessary to change the stack value in the link script for the .exe to overcome a stack overflow.

Our application has two processes with conflicting stack needs. One needs a small stack (5 MB) to avoid grabbing too much memory: with the default value of 50 MB, this process runs out of memory at about 2 GB, whereas with the stack set to 5 MB it runs up to 3.5 GB. The other process requires at least a 20 MB stack or it gets a stack overflow.

So the question is: can I override the stack at runtime using a switch, or do I have to provide different .exe files and ask the users to run the 'other' one if/when they get an error?

K

25 May 2018 9:25 #22163

Kenny

The stack size can only be set when linking.
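For example, using the /stack: switch (the value is in bytes; this follows the syntax shown in a later post in this thread, and the object file name here is just a placeholder):

slink myprog.obj /stack:20000000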

25 May 2018 4:12 #22164

So, this is a 64-bit application, right? Just curious...

25 May 2018 4:38 #22165

No, 32-bit at the moment.

64-bit to come later.

K

25 May 2018 10:16 #22166

Can we have a new page in ftn95.chm, something like 'Stack size for dummies' or, better, 'How to manage the stack'?

We need actual examples of:

  • how to reset the stack size when using Slink or Slink64
  • how to find out what stack size has been set
  • how to interrogate a .exe to find its stack size
  • how to read a .map file to see what the stack size is

(Could these features be added to Slink and Slink64 where they are not already available?)

Using the stack is probably the most uncertain aspect of program development for all but a few FTN95 users. My approach has always been to try not to use the stack, by avoiding local or automatic arrays; I use ALLOCATE for any large array, as in the sketch below.
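As a minimal sketch of that approach (the names here are hypothetical): an automatic array is carved out of the stack, whereas an allocatable array comes from the heap.

      subroutine work(n)
      implicit none
      integer, intent(in) :: n
!     real*8 :: a(n)                ! automatic array - taken from the stack
      real*8, allocatable :: a(:)   ! allocatable array - taken from the heap
      integer :: ierr
      allocate (a(n), stat=ierr)
      if (ierr /= 0) stop 'allocation failed'
      a = 0.0d0
      ! ... work with a ...
      deallocate (a)
      end subroutine work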

ps: If this works out, we could then consider another helpful page on 'how to use common and modules with DLLs'

26 May 2018 6:39 #22168

Thanks John. I will make a note of your suggestion.

2 Apr 2019 2:36 #23417

I just hit a stack size problem (running 32-bit). An allocatable array of size 35,000,000 (REAL*8) blows the stack. The error message says I need to re-link, but I don't have the SLINK documentation handy (I'm at a project site in deepest rural Portugal). Can I just re-link (with what command?), or is there some other solution?

2 Apr 2019 4:03 #23418

There is an online link here...

https://silverfrost.com/ftn95-help/slink/interactive_mode.aspx

2 Apr 2019 9:14 #23419

Many thanks!

3 Apr 2019 4:48 #23420

Quoted from silicondale: I just hit a stack size problem (running 32-bit). An allocatable array of size 35,000,000 (REAL*8) blows the stack.

Allocatable arrays do not go on the stack (they go on the heap), so I am not sure of your diagnosis of the problem.

Your allocatable arrays are 280 MB, which should not be a major issue. If you are running on a 64-bit OS with high 32-bit memory usage, these could be allocated above the 2 GB address. If you do have a stack overflow, check for local or automatic arrays, or possible temporary arrays, say due to array sections. This problem can be overcome by making these arrays allocatable.
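To illustrate the array-section case (a hypothetical sketch, not code from this thread): passing a strided section to an old-style explicit-shape dummy argument forces the compiler to build a contiguous temporary copy, which may be placed on the stack.

      subroutine demo(n)
      implicit none
      integer, intent(in) :: n
      real*8, allocatable :: big(:), tmp(:)
      allocate (big(n))
      call random_number(big)
      ! a strided section passed to an explicit-shape dummy needs a
      ! contiguous temporary, which the compiler may put on the stack:
      call process(big(1:n:2), (n+1)/2)
      ! workaround: make the copy yourself in an allocatable (heap) array:
      allocate (tmp((n+1)/2))
      tmp = big(1:n:2)
      call process(tmp, (n+1)/2)
      end subroutine demo

      subroutine process(a, m)      ! F77-style explicit-shape dummy
      implicit none
      integer, intent(in) :: m
      real*8 :: a(m)
      a = 2.0d0*a
      end subroutine process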

5 Apr 2019 5:34 #23448

Thanks John - I think you're right, and the problem had nothing to do with the allocatable arrays. I found a workaround for that one and will come back to it later.

I now have another problem: an allocatable REAL*8 array, dimension 85 million, and I'm getting a message that the allocation failed. This is 680 megabytes - is that too big? It could well be too big anyway, as I'm trying to do a sort and I want it to finish within my lifetime! It's a set of timestamps from a robot that is returning huge volumes of data - and some of the timestamps are coming back out of sequence. I think the practical solution is probably to filter the data and work on a small subset for now.

5 Apr 2019 11:39 #23450

John's suggestion will give you almost 1 GB more with /3GB. But before making any changes to the code, first try playing with the stack allocation number, as in my case, 950000000 (here the stack is almost 1 GB):

slink subs0.obj subs1.obj subs2.lib subs3.dll /stack:950000000 /3gb >linkReport__

Also think about moving to /64 while keeping /32 alive for independent debugging hints.

6 Apr 2019 2:56 #23451

Quoted from silicondale: I now have another problem: an allocatable REAL*8 array, dimension 85 million, and I'm getting a message that the allocation failed. This is 680 megabytes - is that too big? It could well be too big anyway, as I'm trying to do a sort and I want it to finish within my lifetime! It's a set of timestamps from a robot that is returning huge volumes of data - and some of the timestamps are coming back out of sequence. I think the practical solution is probably to filter the data and work on a small subset for now.

'Some of the timestamps are out of sequence' -- that is before sorting, I presume? If not, the sort algorithm/implementation has a bug. If only a small number of timestamps in the input are out of sequence, the first thing to try is to divert those into a second buffer or file, sort those, and merge the result with the bulk of the data that is already in sequence.
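A rough sketch of that merge step (hypothetical names; it assumes both inputs are already sorted ascending):

      subroutine merge_sorted(a, na, b, nb, c)
      implicit none
      integer, intent(in) :: na, nb
      real*8, intent(in)  :: a(na), b(nb)   ! two ascending inputs
      real*8, intent(out) :: c(na+nb)       ! merged ascending output
      integer :: i, j, k
      i = 1
      j = 1
      do k = 1, na + nb
         if (j > nb) then
            c(k) = a(i); i = i + 1
         else if (i > na) then
            c(k) = b(j); j = j + 1
         else if (a(i) <= b(j)) then
            c(k) = a(i); i = i + 1
         else
            c(k) = b(j); j = j + 1
         end if
      end do
      end subroutine merge_sorted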

If you provide some details and examples of the data, perhaps we could help.

6 Apr 2019 6:35 #23452

Indeed so - before sorting. That is the reason the sort is necessary. And yes indeed, one option is to divert the out-of-order timestamps into a separate file, then sort that and merge it back. That's effectively what a merge sort would do, and it is the workaround that I could adopt, at the cost of writing another (fairly simple) program. It would certainly avoid the need for such a huge allocatable array! Good thinking, mecej4!

6 Apr 2019 7:25 #23453

I have done some testing of 85 million values on a 2.8 GHz i5.

If the numbers are only slightly out of sequence:

      do i = 1, n
         call random_number(real_array(i))
         real_array(i) = real_array(i) + real(i)*0.125   ! ascending trend plus small jitter
      end do

This results in a quick sort time of 4 seconds, while the bubble sort takes 2 seconds (shell sort also takes 4 seconds).

If the values are random (call random_number(real_array)), my quick sort takes about 50 seconds. My bubble and shell sorts are still running after 2 hours 50 minutes !!

For large n, an O(n^2) sort is much slower than an O(n log n) sort: with n = 85 million, n^2 is about 7 x 10^15 while n*log2(n) is only about 2 x 10^9, which shows how much better a well-suited algorithm can be.

John

I will post a link to the test program when the shell sort finishes or I give up !!

6 Apr 2019 8:01 #23454

Many thanks for taking the time to test this, John. I never got the 85 million sorted within FTN95 because of the array allocation problem. The workaround was to export the file to an SQLite database (which I'm using as the reference database in any case) and then sort using an SQL command with ORDER BY to produce a view, which can then be re-imported into the FTN95 environment.

Of course the SQL sort algorithm is itself a black box - I've no idea what method it uses, but it took several minutes to sort, so it is almost certainly not the most efficient! I think that only a relatively small proportion of the data are out of sequence, so the suggestion from mecej4 may be the quickest solution - just throw the out-of-sequence records into a separate, much smaller file, sort that (fast) and then interleave it back into the full 85-million data set. Effectively a customised merge sort.

However, as it is likely that I'm going to have some more gigantic files to deal with, that need sorting on axes other than time, I think I need to check out your DSORT algorithm as a replacement for my tired old shell-sort. Next challenge is going to be in a new project where I need to supply (and retrieve) time-sorted data for a simulation exercise that uses GPSS/H. That's a language and style of computing that is going to be a learning experience in itself. Still, it keeps the brain active - hopefully staves off the dementia for a while longer!

6 Apr 2019 12:34 #23457

Steve,

The 85 million random-order sort has now been running for 7.5 hours. The delay is the bubble sort, which I have now stopped.

The following link is to the latest test program to demonstrate possible performance of sorting a large set of numbers.

The program allocates a large REAL*8 array and an INTEGER*4 index array (all the sorts tested are indexed sorts). Four sorts are provided, to test their suitability for random or partially random lists of numbers:

  • dsort, a quick sort, which is probably the most suitable
  • iheap_sort, a heap sort
  • a shell sort
  • a bubble sort

(dsort@ fails as the list is too big; the split list is probably too small)

Options are provided via variables in the program:

  • list_size = the size of the array to sort
  • near_ordered = false for a random set of numbers, or true if the numbers are conditioned
  • random_band = the nominal bandwidth of the disorder, which controls the combination of ordered and random values, i.e. how far out of order the time values would be

To select a different option, the variables in the program must be changed and the program rebuilt.

ftn95_test.bat builds and runs the program, appending the results to time_test.log

You can download the .zip file from my dropbox: https://www.dropbox.com/s/ch85efojh9ysax7/time_test_4.zip?dl=0

The results show that dsort (quick sort) is by far the best: it takes about 50 seconds for a random list of 85 million values and about 5 seconds for a fairly ordered set. Then come heap and shell, but unless the numbers are extremely ordered they are not worth considering. Bubble sort is the worst for random order (it takes hours), but if the numbers are extremely ordered, say out of place by only a few positions, then it can be the best.

Anyway, this program demonstrates:

  • allocating the arrays
  • populating the arrays (random values, and initialising the index)
  • sorting the array
  • testing the sort (and timing the test)

It can run as 32-bit or with /64, and it doesn't use the stack.

(Oddly enough) I would recommend that you read the 85 million values into memory and do a quick sort. It would take about 50 seconds, which is probably not too slow given the time it takes to read the values from the file and convert the text to REAL*8 values (see recent posts on file reading speeds). This should be much simpler than splitting out an un-ordered list, sorting it, and then re-merging.
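For anyone who wants to experiment, a minimal in-place quick sort might look like the sketch below (illustrative only; this is not John's dsort, and for real data you would read the values into an allocatable array first and then call qsort(values, 1, n)):

      recursive subroutine qsort(a, lo, hi)
      implicit none
      real*8 :: a(*)
      integer, intent(in) :: lo, hi
      integer :: i, j
      real*8 :: pivot, t
      if (lo >= hi) return
      pivot = a((lo+hi)/2)            ! middle element as pivot
      i = lo
      j = hi
      do                              ! Hoare-style partition
         do while (a(i) < pivot)
            i = i + 1
         end do
         do while (a(j) > pivot)
            j = j - 1
         end do
         if (i > j) exit
         t = a(i); a(i) = a(j); a(j) = t
         i = i + 1
         j = j - 1
      end do
      if (lo < j) call qsort(a, lo, j)
      if (i < hi) call qsort(a, i, hi)
      end subroutine qsort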

Hopefully it provides some ideas.

John

6 Apr 2019 6:38 #23460

Many thanks for all this, John! In the past I haven't paid much attention to sorting, just used shell sort for smaller data sets, and merge sort for larger. Avoided bubble sort like the plague. And otherwise tried to avoid the necessity.

7 Apr 2019 2:31 #23461

... read the 85 million values into memory and do a quick sort. It would take about 50 seconds, which is probably not too slow ...

We can probably decrease the time by one order of magnitude. Arne Maus provides several parallel sorting algorithms at http://heim.ifi.uio.no/~arnem/sorting/ . Using his ParaMergeTest.java, for instance, I found that sorting 80 million integers on a laptop with an i7-2720 CPU (4 cores, 8 threads, 8 years old) took 2.5 seconds. The standard Java routine Arrays.sort took about 10 seconds; JohnC's Dsort took about 6 seconds when run with Ifort 19.

If Steve's timestamps are 32-bit integers, as Unix timestamps are, this time estimate applies directly; the value may need to be multiplied by an appropriate factor if longer timestamps are used and/or associated indexes need to be sorted.

7 Apr 2019 6:43 #23463

Hi mecej4 - my timestamps are Unix plus. For some reason our robot engineers decided they needed nanosecond precision, so we have horrendous 19-digit timestamps. I have reduced these by extracting microseconds within the current day, which are then much smaller integers. Definitely worth trying the Arne Maus algorithm as well. A new project is starting up in June, which will probably need some fast sorts!
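A small sketch of that reduction (the sample value is hypothetical). One caveat: microseconds within a day can reach 8.64 x 10^10, which overflows a signed 32-bit integer, so the result below is kept in 64 bits; milliseconds within the day would fit in 32 bits.

      program ts_reduce
      implicit none
      integer, parameter :: i8 = selected_int_kind(18)
      integer(i8), parameter :: ns_per_day = 86400_i8 * 1000000000_i8
      integer(i8) :: ts_ns, us_in_day
      ts_ns = 1554640000123456789_i8   ! hypothetical 19-digit nanosecond stamp
      ! microseconds elapsed since the start of the current (UTC) day
      us_in_day = mod(ts_ns, ns_per_day) / 1000_i8
      print *, 'microseconds within day =', us_in_day
      end program ts_reduce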
