|
forums.silverfrost.com Welcome to the Silverfrost forums
|
View previous topic :: View next topic |
Author |
Message |
DanRRight
Joined: 10 Mar 2008 Posts: 2816 Location: South Pole, Antarctica
|
Posted: Mon Feb 02, 2015 5:48 am Post subject: Lexical Sort |
|
|
Can anyone make this damn subroutine of unknown author work ? Spent periodically tons of time on this work of devilry over two decades, every time i write new code using this @@$% it does not work.
http://jblevins.org/mirror/amiller/sortchar.f90 |
|
Back to top |
|
|
davidb
Joined: 17 Jul 2009 Posts: 560 Location: UK
|
Posted: Mon Feb 02, 2015 8:44 am Post subject: |
|
|
I have been testing this and so far it seems to work.
Can you give a short example of a list of strings where it doesn't work?
There are a number of things in the subroutine that can be tidied up:
- There is no need for a copy of the input character array
- There is no need for a copy of the input index array
- Final copy to put array in correct order in sort can be done on one line, i.e. StringArray = StringArray(indexarray(:))
- If you make all subroutines and functions internal to sort there is no need to pass the character array around.
- CaseSensitive can be local to sort rather than global
- The private declarations won't be needed as these subroutines are automatically private (being contained in sort).
- Ideally would change quicksort to do replace the tail recursion by iteration.
You can simplify things to:
Code: |
subroutine sort (StringArray, IndexArray, CaseInsensitive)
character (len = *), dimension (:), intent (in out) :: StringArray
integer, dimension (:), intent (out) :: IndexArray
logical, intent (in), optional :: CaseInsensitive
! Local variables
integer :: low, high, k
logical :: CaseSensitive
if (present(CaseInsensitive)) then
CaseSensitive = .not. CaseInsensitive
else
CaseSensitive = .true.
end if
low = 1
high = size(StringArray)
indexarray = (/ (k, k = low, high) /)
call quicksort(low, high)
! Move elements into correct order
StringArray = StringArray(indexarray(:))
contains
recursive subroutine quicksort(low, high)
integer, intent (in) :: low, high
integer :: pivotlocation
if (low < high) then
call partition(low, high, pivotlocation)
call quicksort(low, pivotlocation - 1)
call quicksort(pivotlocation + 1, high)
end if
return
end subroutine quicksort
! ... etc
|
_________________ Programmer in: Fortran 77/95/2003/2008, C, C++ (& OpenMP), java, Python, Perl |
|
Back to top |
|
|
mecej4
Joined: 31 Oct 2006 Posts: 1886
|
Posted: Mon Feb 02, 2015 4:19 pm Post subject: |
|
|
Here is a test program that I used to test the string sort routine that Dan referred to. It worked correctly on a 38 MB text file with 640,000 lines, 58 characters per line. Therefore, Dan, I think that you may have not interpreted the use of the index array correctly. After the subroutine returns, the original string array will have been sorted.
You can use the routine provided by FTN95, chsort@, instead, if you do not need a case-insensitive sort. Unlike the other lexical sort routine, chsort@ does not modify the input string array, and you must use the index array to access the strings in sort order. See the comments in the program to understand how to use chsort@ instead.
Whichever sort routine is used, the output files are identical and both runs took about one second. Chsort@ is a bit faster than the other routine.
Code: | program sortstr
use LexicalSort ! not needed to use chsort@
implicit none
integer, parameter :: LLEN = 60, MAXN = 700000
character(len=LLEN) :: lines(MAXN)
integer :: idx(MAXN)
integer :: inp = 11,out=12,nlin,i,j
!
open(unit=inp,file='xyz.txt',status='old')
nlin=0
do
if(nlin >= MAXN)stop 'MAXN not large enough'
read(inp,'(A)',end=100)lines(nlin+1)
nlin=nlin+1
end do
100 write(*,*)nlin,' lines read'
close(inp)
!call chsort@(idx,lines,nlin) ! FTN95 routine
call sort(lines(1:nlin),idx,.false.) ! from http://jblevins.org/mirror/amiller/sortchar.f90
open(unit=out,file='xyz.srt',status='replace')
write(out,'(A)')lines(1:nlin)
!write(out,'(A)')lines(idx(1:nlin)) ! use with chsort@
close(out)
end program |
|
|
Back to top |
|
|
DanRRight
Joined: 10 Mar 2008 Posts: 2816 Location: South Pole, Antarctica
|
Posted: Mon Feb 02, 2015 8:03 pm Post subject: |
|
|
Thanks Davidb, mecej4, will look in detail at your suggestions. To answer your questions i have to go back from my numerous modifications to the original text which did not work with FTN95 at all initially until allocation problems were fixed. Last time i checked recently it still complained with something in sub PARTITION (attempting to alter actual argument that is a constant, an expression, or INTENT (in) argument or DO variable) when in /check /full_debug mode.
Meantime please try to run the code i mentioned in the full_debug mode. Does it work? David, please post your test code, may be i made some mess in mine
mecej4: I tried your example which works with debug. heck knows why mine does not, but to understand this it will need me to extract it from the large code. Will do that later
The big complaint of these sorting routines is that they treat all empty lines as records and place them in front of final array
As to CHSORT@ the Silverfrost can easily add "upper/lover case sensitivity" key and call it CHSORT1@
Last edited by DanRRight on Mon Feb 02, 2015 9:32 pm; edited 6 times in total |
|
Back to top |
|
|
mecej4
Joined: 31 Oct 2006 Posts: 1886
|
Posted: Mon Feb 02, 2015 8:32 pm Post subject: |
|
|
The example code that I posted above works fine with the unmodified sortchar.f90 from Bevins' page when compiled using /full_debug with the 7.1 compiler.
Code: |
s:\sortchar>ftn95 sortchar.f90 /full_debug
[FTN95/Win32 Ver. 7.10.0 Copyright (c) Silverfrost Ltd 1993-2014]
PROCESSING MODULE [<LEXICALSORT> FTN95/Win32 v7.10.0]
NO ERRORS [<SORT> FTN95/Win32 v7.10.0]
NO ERRORS [<QUICKSORT> FTN95/Win32 v7.10.0]
NO ERRORS [<PARTITION> FTN95/Win32 v7.10.0]
NO ERRORS [<SWAP> FTN95/Win32 v7.10.0]
NO ERRORS [<STRINGCOMP> FTN95/Win32 v7.10.0]
NO ERRORS [<UPPERCASE> FTN95/Win32 v7.10.0]
NO ERRORS [<LEXICALSORT> FTN95/Win32 v7.10.0]
s:\sortchar>ftn95 grdsortu.f90 /full_debug
[FTN95/Win32 Ver. 7.10.0 Copyright (c) Silverfrost Ltd 1993-2014]
NO ERRORS [<SORTSTR> FTN95/Win32 v7.10.0]
s:\sortchar>slink /debug grdsortu.obj sortchar.obj
Creating executable: s:\sortchar\grdsortu.exe
s:\sortchar>grdsortu
640352 lines read
|
When I used the /check /full_debug compiler options, I had to specify a larger stack size in the linking step, but this is because I have large array sizes in the program to process my large data set. The program runs about four times slower with these checking options, but there are no errors that I can see.
There are no instances of INTENT(IN) arguments being changed in any of the subroutines in the sort package. |
|
Back to top |
|
|
DanRRight
Joined: 10 Mar 2008 Posts: 2816 Location: South Pole, Antarctica
|
Posted: Mon Feb 02, 2015 9:11 pm Post subject: |
|
|
mecej4: yes, this really surprising to me that it works on your example.
As to stack size i am flirting long ago with the stacks approaching 1GB, but so far this sorting sub is the only major fault. Sorting works in /nocheck mode
Both suggestions - davidb's and yours - were very good: to avoid copy of array (that stopped me on 5 Million lines character*128 arrays while sometimes 10M were needed) and using chsort@ which indeed is also usable but needs mentioned above modifications to be case insensitive and not to treat empty lines as records. |
|
Back to top |
|
|
mecej4
Joined: 31 Oct 2006 Posts: 1886
|
Posted: Mon Feb 02, 2015 9:38 pm Post subject: |
|
|
Dan, w.r.t. doing case-insensitive sorts, consider the following. You can prune away "empty lines" from your input, make an all-uppercase copy of the strings, and call chsort@ with this copy. After chsort@ is done, discard the copy, and use the index array to output (or otherwise access in sorted order) the original array of mixed-case strings.
This is more memory intensive, but much faster, for the following reason. Quicksort takes a multiple of N lg N comparisons to sort N items. In the sortchar routine, the sorting is going to change string case twice that many times. The method that I suggested in the preceding paragraph, on the other hand, makes only N conversions to uppercase.
If you have 5M lines, then we are comparing 5 M with 220 M case conversions. Big difference! |
|
Back to top |
|
|
davidb
Joined: 17 Jul 2009 Posts: 560 Location: UK
|
Posted: Mon Feb 02, 2015 10:44 pm Post subject: |
|
|
With quicksort it is a good idea to switch to an insertion sort when the list is small. This can speed things up considerably. The following replacement code does this when sublists are 10 elements or smaller. You need to adjust this parameter to get the best performance.
It also eliminates recursion for one of the sublists (iteration is always used for the larger list). This guarantees n*log(n) on average.
Notice I have removed the array from the argument list. Put it back in if you don't want this to be contained inside sort.
Code: |
recursive subroutine quicksort(low, high)
integer, intent (in) :: low, high
integer :: pivotlocation
integer :: l, u, i, j, ixTemp
integer, parameter :: size_magic = 10
l = low
u = high
do while (u - l >= size_magic)
call partition(l, u, pivotlocation)
if (u - pivotlocation > pivotlocation - l) then
! Lower partition is smaller, sort using recursion
call quicksort(l, pivotlocation - 1)
! Sort upper partition using iteration
l = pivotlocation + 1
else
! Upper partition is smaller, sort using recursion
call quicksort(pivotlocation + 1, u)
! Sort lower partition using iteration
u = pivotlocation - 1
end if
end do
! Use an insertion sort for small subranges
do i=l+1, u
ixTemp = indexarray(i)
do j=i-1, l, -1
if (stringComp(StringArray(indexarray(j)), StringArray(ixTemp))) exit
indexarray(j+1) = indexarray(j)
end do
indexarray(j+1) = ixTemp
end do
return
end subroutine quicksort
|
_________________ Programmer in: Fortran 77/95/2003/2008, C, C++ (& OpenMP), java, Python, Perl |
|
Back to top |
|
|
DanRRight
Joined: 10 Mar 2008 Posts: 2816 Location: South Pole, Antarctica
|
Posted: Tue Feb 03, 2015 8:42 am Post subject: |
|
|
davidb, mecej4, those were good suggestions too, i'd say, a pure hackery. I suggest to finalize your ideas for everyone in the real working subroutine adding small demo code (ideally with the comparisons with other routines). Initial data you can just randomly generate.
/* By the way, this Forum should have long ago one more section called Users' Codes for the library of short useful codes.
/**Clearwin demo codes with pictures would be particularly useful because when people hear for first time the word Clearwin they think "yeah, yeah, everyone says try this, try that... one more 134532th @#$% today..." |
|
Back to top |
|
|
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
Powered by phpBB © 2001, 2005 phpBB Group
|