forums.silverfrost.com

silicondale · Posted: Sat Apr 06, 2019 9:01 am Post subject:

Many thanks for taking the time to test this, John. I never got the 85 million sorted within FTN95 because of the array allocation problem. The workaround was to export the file to an SQLite database (which I'm using as the reference database in any case) and then sorting using an SQL command with ORDER BY to produce a view which can then be re-imported to the FTN95 environment.

Of course the SQL sort algorithm is itself a black box - I've no idea what method it uses, but it took several minutes to sort, so almost certainly not the most efficient! I think that only a relatively small proportion of the data are out of sequence, so the suggestion from mecej4 may be the quickest solution - just throw the out-of sequence records into a separate much smaller file, sort that (fast) and then ninterleave it back into the full 85-million data set. Effectively a customised merge-sort.

However, as it is likely that I'm going to have some more gigantic files to deal with, that need sorting on axes other than time, I think I need to check out your DSORT algorithm as a replacement for my tired old shell-sort. Next challenge is going to be in a new project where I need to supply (and retrieve) time-sorted data for a simulation exercise that uses GPSS/H. That's a language and style of computing that is going to be a learning experience in itself. Still, it keeps the brain active - hopefully staves off the dementia for a while longer!
_________________
(Steve Henley)

JohnCampbell · Joined: 16 Feb 2006 Posts: 2618 Location: Sydney

Steve,

The 85 million random order sort has now been running for 7.5 hours. The delay is the bubble sort performance; which I have now stopped.

The following link is to the latest test program to demonstrate possible performance of sorting a large set of numbers.

The program :
allocates a large real*8 array and an integer*4 index array ( as all sorts tested are indexed sorts)
4 possible sorts are provided to test their suitability for random or partially random lists of numbers:

dsort is a quick sort, which is probably the most suitable sort
iheap_sort
shell sort,
bubble sort
(dsort@ fails as the list is too big; the split list is probably too small)

Options are provided via variables in the program:
list_size = size of array to sort
near_ordered = false for a random set of numbers, or true if the numbers are conditioned.
random_band = the nominal bandwidth of the bubble sort, which controls a combination of ordered and random values, ie to represent how much the time variables would be out of order.

To select a different option, the variables in the program must be changed and the program rebuilt.

ftn95_test.bat builds and runs the program, appending the results to time_test.log

You can download the .zip file from my dropbox:
https://www.dropbox.com/s/ch85efojh9ysax7/time_test_4.zip?dl=0

The results show:
dsort (quick sort) is by far the best it takes about 50 seconds for a random list of 85 mil values and about 5 seconds for a fairly ordered set.
Then comes heap and shell, but unless the numbers are extremely ordered, they are not worth considering.
Bubble sort is the worst for random order; it takes hours, but if the numbers are extremely ordered, say out of place by a few positions, then it can be the best.

Anyway, this program demonstrates:
Allocation of arrays
populating the arrays (random values and initialise the index)
sorting the array
testing the sort
(timing the test)

It can run as 32-bit or /64 and doesn't use the stack

(Oddly enough) I would recommend to read the 85 mil values into memory and do a quick sort. It would take about 50 seconds, which is probably not very slow, given the time it takes to read the values from file and convert the text to real*8 values (see recent posts on file reading speeds ?)
This should be much simpler than splitting out an un-ordered list, sorting this list and then re-merging.

Hopefully it provides some ideas.

John

silicondale · Posted: Sat Apr 06, 2019 7:38 pm Post subject:

Many thanks for all this, John! In the past I haven't paid much attention to sorting, just used shell sort for smaller data sets, and merge sort for larger. Avoided bubble sort like the plague. And otherwise tried to avoid the necessity.
_________________
(Steve Henley)

mecej4 · Joined: 31 Oct 2006 Posts: 1899

silicondale · Posted: Sun Apr 07, 2019 7:43 am Post subject:

hi mecej - my timestamps are Unix plus. For some reason our robot engineers decided they needed nanosecond precision, so we have horrendous 19-digit timestamps. I have reduced these by extracting microseconds within the current day, which are then 32-bit integers. Definitely worth trying the Arne Maus algorithm as well. New project starting up in June, which will probably need some fast sorts!
_________________
(Steve Henley)

JohnCampbell · Joined: 16 Feb 2006 Posts: 2618 Location: Sydney

Steve,

Before looking for a faster sort, I would be interested to know:
1) how long it takes to read the 85m timestamps and convert them to "microseconds within the current day".
2) Then you can sort them.
3) Then what do you do with them ?

My interpretation of the sort problem description shows that for the timestamps, how close they are to ordered could influence which sort to choose, although quick_sort is usually a safe choice.

This is an interesting complimentary problem to the earlier forum post about file reading speeds, as it is important to know what the total processing time is and where to target effort for improved performance.

Look forward to hearing how the project evolves.

silicondale · Posted: Tue Apr 16, 2019 2:54 pm Post subject:

Hi John - sorry for much delayed reply. The Portugal robot trials finished. Some big problems with the robots which the engineers are hurriedly fixing before next trials in the uk during May. But I tracked down my own problem which was nothing to do with memory but was a basic array allocation error. I found that one of the allocatable arrays was getting a zero dimension passed to it. Instead of failing at the time of allocation it was blowing up when I tried to use it.

Reading and converting the 85m 19-digit timestamps is actually very fast. I didn't time it, but it was only a few seconds. The timestamps on inut were held as a string, and data was extracted from them using a simple formatted read. Not just the microseconds within the day but also the conventional date and time (as strings). These data are points extracted (within the robot) from laser stripes produced by a structured-light system, and almost but not entirely in timestamp order.

They they need to be joined with navigation data (which thankfully are in timestamp order) which can be used to convert the points from robot coordinates to absolute 'world' coordinates, using a standard quaternion transformation.
_________________
(Steve Henley)

JohnCampbell · Joined: 16 Feb 2006 Posts: 2618 Location: Sydney

Hi Steve,

I have improved my sort testing and reporting, producing some charts of speed vs sample size for degree of randomness (based on max out of position bandwidth) These results basically show that adapted bubble (2.1 sec) and insert (1.2 sec) sorts can work very well for data that is near ordered (max bandwidth of < 16) while quick is 2.6 sec for 85m. For random data, quick sort is the most reliable (18 sec for 85m) while bubble is 1,000 sec for only 850k and insert is 250 sec for 850k sample size.
However, I find that quick sort is by far the best. There have been reports that quick sort does not work well for some initial ordered states (ordered or reverse ordered) but the sort I have does not show this problem.
The use of multi-threading for sorting is an interesting approach, but might only be necessary for projects requiring unusually fast times.

What would be good to see is some comparative time of the stages of the processing (assuming it is staged), as there has been a lot of comment in this forum on the speed of some stages, especially disk reading and writing of large files. I have always pushed the idea that this performance should be seen in the context of all stages of the processing.
I could suggest you may have the following stages:
* Obtain data from the recording device and store on local disk
* Read data into memory
* obtain a time stamp for each record
* generate a sorted index for the data records
* process these records
* write processed data for future review

Whatever the stages, it would be good to know if reading data at 200 megabytes per second (rather than say 1+ gigabytes/second) would have a significant impact on the total analysis task.

For the marine surveys I have analysed in the past (data sets about a gigabyte), I have found that step 1, combined with step 1a: review/filter/correct the data for errors and estimate a range of sensible values is by far the most time consuming stage. (this could take hours > days > weeks, such as when the survey datum surface had to redefined; mean sea level is not a flat surface !)

Then again, I could imagine that when processing a continuous feed of digital data that the slowest stage is going to be critical, although I have not worked in a field where high speed data supply occurs (radio astronomy?)

It would be good to see some real performance measures and understand the relative performance requirements for sorting, reading and processing.

John

(link to test results shortly)
Sorry for delay in posting.
The linked .zip file includes the sort test .f90 code, batch file to run, .xls file of imported results and a .doc file which has charts and some description of results.
To comment on mecej4's comments below:
The latest tests includes two types of insertion sorts; binary search and reverse linear search which is good for very ordered data. Neither are good for random data.
The quick sort coded includes a bubble or a linear insert sort when sort packet size < 18. This improves speed by about 10% for the small increase in complexity.
I hope these results give some context to the comments.

https://www.dropbox.com/s/oae3tcpq29tdic4/sort_test2.zip?dl=0

silicondale · Posted: Wed Apr 17, 2019 8:51 am Post subject:

Hi John -
We are finding that by far the slowest part of our processing is the data acquisition - transfer from the robot or from a central server. We are now using USB3.1 and a 2-terabyte SSD which is a lot faster than the external hard drives that are our main archival storage medium, and using a hard-wired ethernet link to the submersible instead of the wifi we had used earlier. Data volumes per bag-file are typically in the tens of gigabytes. The next slowest part of the operation is extracting data from the ROS bag files. Rather than relying on the robot engineers to do this for us, we now use VirtualBox and our own python scripts to extract CSV files and PNG images. It's only when we get to see the data in an intelligible format that the Fortran processing comes into play - and is relatively fast. Following your comments I might well replace our shell-sort by quick-sort: will do some time tests, but it's already apparent that this is far from the slowest part of the process.

The Fortran step includes import of pixel-by-pixel data from multispectral images. I then pass the data on to our 3d modelling whiz-kid who projects the multispectral data on to the point cloud (he does everything in C++ of course because they don't teach Fortran in universities any more). We haven't yet got to do that stage though, because of a number of practical (engineering) problems in getting a reliable point cloud.
_________________
(Steve Henley)

mecej4 · Joined: 31 Oct 2006 Posts: 1899