changing the stack at runtime?
silicondale
Posted: Sat Apr 06, 2019 9:01 am

Many thanks for taking the time to test this, John. I never got the 85 million sorted within FTN95 because of the array allocation problem. The workaround was to export the file to an SQLite database (which I'm using as the reference database in any case) and then sorting using an SQL command with ORDER BY to produce a view which can then be re-imported to the FTN95 environment.

Of course the SQL sort algorithm is itself a black box - I've no idea what method it uses, but it took several minutes to sort, so almost certainly not the most efficient! I think that only a relatively small proportion of the data are out of sequence, so the suggestion from mecej4 may be the quickest solution - just throw the out-of-sequence records into a separate, much smaller file, sort that (fast) and then interleave it back into the full 85-million data set. Effectively a customised merge-sort.

However, as it is likely that I'm going to have some more gigantic files to deal with, that need sorting on axes other than time, I think I need to check out your DSORT algorithm as a replacement for my tired old shell-sort. Next challenge is going to be in a new project where I need to supply (and retrieve) time-sorted data for a simulation exercise that uses GPSS/H. That's a language and style of computing that is going to be a learning experience in itself. Still, it keeps the brain active - hopefully staves off the dementia for a while longer!
_________________
(Steve Henley)
JohnCampbell
Posted: Sat Apr 06, 2019 1:34 pm

Steve,

The 85 million random-order sort had been running for 7.5 hours. The delay is the bubble sort's performance, so I have now stopped it.

The following link is to the latest test program to demonstrate possible performance of sorting a large set of numbers.

The program:
* allocates a large real*8 array and an integer*4 index array (all the sorts tested are indexed sorts)
* provides 4 sorts, to test their suitability for random or partially random lists of numbers:
   - dsort, a quick sort, which is probably the most suitable sort
   - iheap_sort
   - shell sort
   - bubble sort
(dsort@ fails as the list is too big; the split list is probably too small)

Options are provided via variables in the program:
list_size = size of the array to sort
near_ordered = false for a random set of numbers, or true if the numbers are conditioned
random_band = the nominal disorder bandwidth, which controls the mix of ordered and random values, i.e. how far out of order the time values can be

To select a different option, change these variables in the program and rebuild it.
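To illustrate what the conditioning does, something like the following generates a list that is ordered apart from local disorder of roughly random_band positions (a simplified sketch only, not the code in the .zip):

Code:

program make_test_list
  implicit none
  integer, parameter :: list_size    = 1000000    ! size of the array to sort (much larger in the real tests)
  logical, parameter :: near_ordered = .true.     ! condition the numbers?
  integer, parameter :: random_band  = 16         ! nominal disorder bandwidth
  real*8,  allocatable :: values(:)
  integer, allocatable :: idx(:)
  integer :: i

  allocate ( values(list_size), idx(list_size) )
  call random_number ( values )                   ! random values in [0,1)
  if ( near_ordered ) then
     do i = 1, list_size
        ! add an increasing trend so that a value can be out of place
        ! by no more than about random_band positions
        values(i) = values(i) + dble(i) / dble(random_band)
     end do
  end if
  idx = (/ ( i, i = 1, list_size ) /)             ! identity index for an indexed sort
  print *, 'generated ', list_size, ' test values'
end program make_test_list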

ftn95_test.bat builds and runs the program, appending the results to time_test.log

You can download the .zip file from my dropbox:
https://www.dropbox.com/s/ch85efojh9ysax7/time_test_4.zip?dl=0

The results show:
dsort (quick sort) is by far the best: it takes about 50 seconds for a random list of 85 million values and about 5 seconds for a fairly ordered set.
Then come heap and shell but, unless the numbers are extremely ordered, they are not worth considering.
Bubble sort is the worst for random order: it takes hours. But if the numbers are extremely ordered, say out of place by only a few positions, then it can be the best.

Anyway, this program demonstrates:
Allocation of arrays
populating the arrays (random values and initialise the index)
sorting the array
testing the sort
(timing the test)

It can be run as 32-bit or with /64, and doesn't use the stack.

(Oddly enough) I would recommend reading the 85 million values into memory and doing a quick sort. It would take about 50 seconds, which is probably not very slow, given the time it takes to read the values from file and convert the text to real*8 values (see the recent posts on file reading speeds).
This should be much simpler than splitting out an un-ordered list, sorting it and then re-merging.
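In outline it would look something like this (a sketch only, not the test program above; the file name, the one-value-per-line layout and the dsort argument list are all assumed):

Code:

program read_and_sort
  implicit none
  integer :: n, i, ios
  real*8,  allocatable :: t(:)          ! timestamp values
  integer, allocatable :: idx(:)        ! sort index

  ! first pass: count the records (file name and layout assumed)
  open (unit=10, file='timestamps.txt', status='old', action='read')
  n = 0
  do
     read (10, *, iostat=ios)
     if ( ios /= 0 ) exit
     n = n + 1
  end do

  ! second pass: read the values (the list-directed read converts text to real*8)
  allocate ( t(n), idx(n) )
  rewind (10)
  do i = 1, n
     read (10, *) t(i)
  end do
  close (10)

  idx = (/ ( i, i = 1, n ) /)           ! identity index
  ! ... then call the indexed sort of choice, e.g. dsort from the .zip
  ! (argument list assumed):  call dsort ( t, idx, n )
end program read_and_sort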

Hopefully it provides some ideas.

John
silicondale
Posted: Sat Apr 06, 2019 7:38 pm

Many thanks for all this, John! In the past I haven't paid much attention to sorting, just used shell sort for smaller data sets, and merge sort for larger. Avoided bubble sort like the plague. And otherwise tried to avoid the necessity.
_________________
(Steve Henley)
mecej4
Posted: Sun Apr 07, 2019 3:31 am

Quote:
... read the 85 mil values into memory and do a quick sort. It would take about 50 seconds, which is probably not very slow ...


We can probably decrease the time by one order of magnitude. Arne Maus provides several parallel sorting algorithms at http://heim.ifi.uio.no/~arnem/sorting/ . Using his ParaMergeTest.java, for instance, I found that sorting 80 million integers on a laptop with an i7-2720 CPU (4 cores, 8 threads, 8 years old) took 2.5 seconds. The standard Java routine Arrays.sort took about 10 seconds; JohnC's Dsort took about 6 seconds when run with Ifort 19.

If Steve's timestamps are 32-bit integers, as Unix timestamps are, this time estimate applies directly; the value may need to be multiplied by an appropriate factor if longer timestamps are used and/or associated indexes need to be sorted.
silicondale
Posted: Sun Apr 07, 2019 7:43 am

Hi mecej4 - my timestamps are Unix-based, and then some. For some reason our robot engineers decided they needed nanosecond precision, so we have horrendous 19-digit timestamps. I have reduced these by extracting the microseconds within the current day, which are then 32-bit integers. Definitely worth trying the Arne Maus algorithm as well. A new project is starting up in June, which will probably need some fast sorts!
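In outline the reduction is something like this (a sketch, not the production code; the 19 digits are assumed to be nanoseconds since the Unix epoch, and the intermediate values are kept in integer*8):

Code:

program reduce_timestamp
  implicit none
  integer*8 :: ts_ns, ns_per_day, ns_of_day, us_of_day
  character (len=19) :: stamp

  stamp = '1554629543123456789'          ! hypothetical 19-digit nanosecond timestamp
  read (stamp, '(I19)') ts_ns            ! convert text to integer*8

  ns_per_day = 86400                     ! seconds per day ...
  ns_per_day = ns_per_day * 1000000000   ! ... times 1e9 nanoseconds per second
  ns_of_day  = mod ( ts_ns, ns_per_day ) ! nanoseconds since midnight (UTC assumed)
  us_of_day  = ns_of_day / 1000          ! microseconds since midnight

  ! note: a full day spans 86 400 000 000 microseconds, beyond the
  ! integer*4 range, so the value is kept in integer*8 here
  print *, 'microseconds within the day = ', us_of_day
end program reduce_timestamp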
_________________
(Steve Henley)
JohnCampbell
Posted: Sun Apr 07, 2019 10:10 am

Steve,

Before looking for a faster sort, I would be interested to know:
1) how long it takes to read the 85m timestamps and convert them to "microseconds within the current day";
2) then you can sort them;
3) then, what do you do with them?

My reading of the sort problem description is that, for the timestamps, how close they are to being ordered could influence which sort to choose, although quick_sort is usually a safe choice.

This is an interesting complementary problem to the earlier forum post about file reading speeds, as it is important to know what the total processing time is and where to target effort for improved performance.

Look forward to hearing how the project evolves.
silicondale
Posted: Tue Apr 16, 2019 2:54 pm

Hi John - sorry for the much delayed reply. The Portugal robot trials have finished. There were some big problems with the robots, which the engineers are hurriedly fixing before the next trials in the UK during May. But I tracked down my own problem, which had nothing to do with memory but was a basic array allocation error. I found that one of the allocatable arrays was getting a zero dimension passed to it. Instead of failing at the time of allocation, it was blowing up when I tried to use it.

Reading and converting the 85m 19-digit timestamps is actually very fast. I didn't time it, but it was only a few seconds. The timestamps on input were held as strings, and the data were extracted from them using a simple formatted read - not just the microseconds within the day but also the conventional date and time (as strings). These data are points extracted (within the robot) from laser stripes produced by a structured-light system, and they are almost, but not entirely, in timestamp order.

They then need to be joined with navigation data (which thankfully are in timestamp order), which can be used to convert the points from robot coordinates to absolute 'world' coordinates using a standard quaternion transformation.
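The transformation itself is just the standard quaternion rotate-and-translate, something like the following (a sketch, not our production code; the quaternion is assumed to be a unit quaternion stored as (w, x, y, z), with t the robot position in world coordinates):

Code:

subroutine robot_to_world ( p, q, t, pw )
  implicit none
  real*8, intent(in)  :: p(3)    ! point in robot coordinates
  real*8, intent(in)  :: q(4)    ! unit quaternion (w, x, y, z): robot-to-world rotation
  real*8, intent(in)  :: t(3)    ! robot position in world coordinates
  real*8, intent(out) :: pw(3)   ! point in world coordinates
  real*8 :: u(3), c1(3), c2(3)

  u  = q(2:4)                                   ! vector part of the quaternion
  ! rotate p by q using  p' = p + 2*w*(u x p) + 2*(u x (u x p))
  c1 = cross ( u, p )
  c2 = cross ( u, c1 )
  pw = p + 2.0d0 * q(1) * c1 + 2.0d0 * c2 + t   ! rotate, then translate
contains
  function cross ( a, b ) result ( c )
    real*8, intent(in) :: a(3), b(3)
    real*8 :: c(3)
    c(1) = a(2)*b(3) - a(3)*b(2)
    c(2) = a(3)*b(1) - a(1)*b(3)
    c(3) = a(1)*b(2) - a(2)*b(1)
  end function cross
end subroutine robot_to_world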
_________________
(Steve Henley)
JohnCampbell
Posted: Wed Apr 17, 2019 2:18 am

Hi Steve,

I have improved my sort testing and reporting, producing some charts of speed vs sample size for different degrees of randomness (based on the maximum out-of-position bandwidth). These results basically show that the adapted bubble (2.1 sec) and insert (1.2 sec) sorts can work very well for data that is nearly ordered (max bandwidth < 16), while quick sort takes 2.6 sec for 85m. For random data, quick sort is the most reliable (18 sec for 85m), while bubble sort takes 1,000 sec for only 850k and insert sort takes 250 sec for an 850k sample size.
Overall, I find that quick sort is by far the best. There have been reports that quick sort does not work well for some initial orderings (ordered or reverse ordered), but the sort I have does not show this problem.
The use of multi-threading for sorting is an interesting approach, but might only be necessary for projects requiring unusually fast times.

What would be good to see is some comparative timing of the stages of the processing (assuming it is staged), as there has been a lot of comment in this forum on the speed of some stages, especially disk reading and writing of large files. I have always pushed the idea that this performance should be seen in the context of all the stages of the processing.
I would suggest you may have the following stages:
* Obtain data from the recording device and store on local disk
* Read data into memory
* obtain a time stamp for each record
* generate a sorted index for the data records
* process these records
* write processed data for future review

Whatever the stages, it would be good to know if reading data at 200 megabytes per second (rather than say 1+ gigabytes/second) would have a significant impact on the total analysis task.
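As a rough illustration (assuming, say, a 10 gigabyte raw file): reading it at 200 MB/s takes about 50 seconds, while at 1 GB/s it takes about 10 seconds, so the difference only matters if the other stages do not already take much longer.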

For the marine surveys I have analysed in the past (data sets of about a gigabyte), I have found that step 1, combined with step 1a (review/filter/correct the data for errors and estimate a range of sensible values), is by far the most time-consuming stage. (This could take hours, days or even weeks, such as when the survey datum surface had to be redefined; mean sea level is not a flat surface!)

Then again, I can imagine that when processing a continuous feed of digital data, the slowest stage is going to be critical, although I have not worked in a field where high-speed data supply occurs (radio astronomy?).

It would be good to see some real performance measures and understand the relative performance requirements for sorting, reading and processing.

John

(link to test results shortly)
Sorry for delay in posting.
The linked .zip file includes the sort test .f90 code, the batch file to run it, a .xls file of imported results and a .doc file with charts and some description of the results.
To comment on mecej4's comments below:
The latest tests include two types of insertion sort: binary search, and reverse linear search, which is good for very ordered data. Neither is good for random data.
The quick sort as coded switches to a bubble or linear insert sort when the sort packet size is < 18. This improves speed by about 10% for a small increase in complexity.
I hope these results give some context to the comments.

https://www.dropbox.com/s/oae3tcpq29tdic4/sort_test2.zip?dl=0
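For anyone who does not want to download the .zip, the general technique looks something like this (a simplified sketch, not the dsort code itself; the cutoff of 18 is as described above):

Code:

recursive subroutine qsort_indexed ( a, idx, lo, hi )
!
! Sort idx(lo:hi) so that a(idx(lo:hi)) is in ascending order.
! Partitions smaller than "cutoff" are finished with an insertion sort.
!
  implicit none
  real*8,  intent(in)    :: a(*)
  integer, intent(inout) :: idx(*)
  integer, intent(in)    :: lo, hi
  integer, parameter     :: cutoff = 18   ! switch to insertion sort below this size
  integer :: i, j, itmp
  real*8  :: pivot

  if ( hi - lo + 1 <= cutoff ) then
     do i = lo+1, hi                      ! insertion sort of the small partition
        itmp = idx(i)
        j = i - 1
        do while ( j >= lo )
           if ( a(idx(j)) <= a(itmp) ) exit
           idx(j+1) = idx(j)
           j = j - 1
        end do
        idx(j+1) = itmp
     end do
     return
  end if

  pivot = a(idx((lo+hi)/2))               ! middle element as pivot
  i = lo
  j = hi
  do                                      ! Hoare partition of idx(lo:hi)
     do while ( a(idx(i)) < pivot )
        i = i + 1
     end do
     do while ( a(idx(j)) > pivot )
        j = j - 1
     end do
     if ( i >= j ) exit
     itmp = idx(i) ; idx(i) = idx(j) ; idx(j) = itmp
     i = i + 1
     j = j - 1
  end do
  call qsort_indexed ( a, idx, lo, j )    ! recurse on the two partitions
  call qsort_indexed ( a, idx, j+1, hi )
end subroutine qsort_indexed

It would be called as call qsort_indexed ( a, idx, 1, n ) after initialising idx to 1..n; the data array itself is never moved, only the index.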


Last edited by JohnCampbell on Sun Apr 28, 2019 11:48 am; edited 1 time in total
silicondale
Posted: Wed Apr 17, 2019 8:51 am

Hi John -
We are finding that by far the slowest part of our processing is the data acquisition - transfer from the robot or from a central server. We are now using USB 3.1 and a 2-terabyte SSD, which is a lot faster than the external hard drives that are our main archival storage medium, and a hard-wired Ethernet link to the submersible instead of the wifi we had used earlier. Data volumes per bag file are typically in the tens of gigabytes. The next slowest part of the operation is extracting data from the ROS bag files. Rather than relying on the robot engineers to do this for us, we now use VirtualBox and our own Python scripts to extract CSV files and PNG images. It's only when we get to see the data in an intelligible format that the Fortran processing comes into play - and it is relatively fast. Following your comments I might well replace our shell sort by quick sort: I will do some time tests, but it's already apparent that this is far from the slowest part of the process.

The Fortran step includes import of pixel-by-pixel data from multispectral images. I then pass the data on to our 3D modelling whiz-kid, who projects the multispectral data onto the point cloud (he does everything in C++, of course, because they don't teach Fortran in universities any more). We haven't yet got to that stage, though, because of a number of practical (engineering) problems in getting a reliable point cloud.
_________________
(Steve Henley)
mecej4
Posted: Mon Apr 22, 2019 12:10 pm

JohnCampbell wrote:
I have done some testing of 85 million values on a 2.8 GHz i5.

If the numbers are only slightly out of sequence:
do i = 1,n
call random_number(real_array(i))
real_array(i) = real_array(i) + real(i)*0.125
end do
This results in a quick sort time of 4 seconds, while bubble sort takes 2 seconds (shell also takes 4 seconds).

For such inputs with mild disorder - if the data set consists of a sequence of blocks, with disorder only within blocks but such that every element of block k+1 is larger than every element of block k - insertion sort is a good choice. Oddly, you did not include this algorithm in the repertoire of your test program. Try the following program, please.
Code:

program tinssrt
! test insertion sort
implicit none
integer :: i, N, ktr
double precision, allocatable :: A(:)
integer, allocatable :: idx(:)
real :: t1,t2
!
N=10000000               ! start with N = 10 million
do ktr = 1,4
   allocate(A(N),idx(N))
   call random_number(A)
   idx=(/ (i,i=1,N) /)
   A=A+idx*0.125d0       ! (random real between 0 and 1) + (1:N)*0.125
   if (N <= 100) print '(" Unsorted:",/,(10F8.4))',A
   call cpu_time(t1)
   call inssort(N,idx,A)
   call cpu_time(t2)
   if (N <= 100) then
      print '(" Sorted:  ",/,(10F8.4))',A(idx)   ! print via the sorted index (A itself is not reordered)
      print '(20I4)',idx
   end if
   deallocate(A,idx)
   print 10,N,t2-t1
10 format(1x,i10,2x,F7.3)
   N = N*2                ! double N
end do   
end program

subroutine inssort(N,idx,A)
implicit none
integer, intent(in) :: N
integer, intent(in out) :: idx(N)
double precision, intent(in) :: A(N)
integer :: i,j,itmp
!
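! Reorder idx so that A(idx(1:N)) is in ascending order; A itself is not modified.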
do i = 2, N
   j = i-1
   do while (j > 0)
      if (A(idx(j)) < A(idx(j+1))) exit
      itmp = idx(j+1)
      idx(j+1) = idx(j)
      idx(j)   = itmp
      j = j - 1
   end do
end do
return
end subroutine inssort

On a laptop with an i7-2720QM CPU, this program took about 1 second for N = 80 million.

Quote:
If the values are random ( call random_number(real_array) ) my quick sort takes about 50 seconds. My bubble and shell sorts are still running after 2 hours 50 minutes !!

Many algorithms (Quicksort, Mergesort, etc.) perform better when they are augmented by incorporating a switch to insertion sort for small partitions.

The sort routines of the standard libraries of C++, Java, C# and Python use such hybrid algorithms; for example, search for "standard sort algorithm in .net".