forums.silverfrost.com Forum Index forums.silverfrost.com
Welcome to the Silverfrost forums
 
 FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 

Lexical Sort

 
Post new topic   Reply to topic    forums.silverfrost.com Forum Index -> General
View previous topic :: View next topic  
Author Message
DanRRight



Joined: 10 Mar 2008
Posts: 2815
Location: South Pole, Antarctica

PostPosted: Mon Feb 02, 2015 5:48 am    Post subject: Lexical Sort Reply with quote

Can anyone make this damn subroutine of unknown author work ? Spent periodically tons of time on this work of devilry over two decades, every time i write new code using this @@$% it does not work.

http://jblevins.org/mirror/amiller/sortchar.f90
Back to top
View user's profile Send private message
davidb



Joined: 17 Jul 2009
Posts: 560
Location: UK

PostPosted: Mon Feb 02, 2015 8:44 am    Post subject: Reply with quote

I have been testing this and so far it seems to work.

Can you give a short example of a list of strings where it doesn't work?

There are a number of things in the subroutine that can be tidied up:

- There is no need for a copy of the input character array
- There is no need for a copy of the input index array
- Final copy to put array in correct order in sort can be done on one line, i.e. StringArray = StringArray(indexarray(:))
- If you make all subroutines and functions internal to sort there is no need to pass the character array around.
- CaseSensitive can be local to sort rather than global
- The private declarations won't be needed as these subroutines are automatically private (being contained in sort).
- Ideally would change quicksort to do replace the tail recursion by iteration.

You can simplify things to:
Code:

   subroutine sort (StringArray, IndexArray, CaseInsensitive)

     character (len = *), dimension (:), intent (in out)  :: StringArray
     integer, dimension (:), intent (out)                 :: IndexArray
     logical, intent (in), optional                       :: CaseInsensitive

     ! Local variables
     integer  :: low, high, k
     logical  :: CaseSensitive
     if (present(CaseInsensitive)) then
       CaseSensitive = .not. CaseInsensitive
     else
       CaseSensitive = .true.
     end if

     low = 1
     high = size(StringArray)

     indexarray = (/ (k, k = low, high) /)

     call quicksort(low, high)

     ! Move elements into correct order
     StringArray = StringArray(indexarray(:))

   contains

      recursive subroutine quicksort(low, high)
        integer, intent (in)                                :: low, high

        integer :: pivotlocation
        if (low < high) then
          call partition(low, high, pivotlocation)
          call quicksort(low, pivotlocation - 1)
          call quicksort(pivotlocation + 1, high)
        end if
        return
      end subroutine quicksort

      ! ... etc

_________________
Programmer in: Fortran 77/95/2003/2008, C, C++ (& OpenMP), java, Python, Perl
Back to top
View user's profile Send private message
mecej4



Joined: 31 Oct 2006
Posts: 1886

PostPosted: Mon Feb 02, 2015 4:19 pm    Post subject: Reply with quote

Here is a test program that I used to test the string sort routine that Dan referred to. It worked correctly on a 38 MB text file with 640,000 lines, 58 characters per line. Therefore, Dan, I think that you may have not interpreted the use of the index array correctly. After the subroutine returns, the original string array will have been sorted.

You can use the routine provided by FTN95, chsort@, instead, if you do not need a case-insensitive sort. Unlike the other lexical sort routine, chsort@ does not modify the input string array, and you must use the index array to access the strings in sort order. See the comments in the program to understand how to use chsort@ instead.

Whichever sort routine is used, the output files are identical and both runs took about one second. Chsort@ is a bit faster than the other routine.

Code:
program sortstr
use LexicalSort  ! not needed to use chsort@
implicit none
integer, parameter :: LLEN = 60, MAXN = 700000
character(len=LLEN) :: lines(MAXN)
integer :: idx(MAXN)
integer :: inp = 11,out=12,nlin,i,j
!
open(unit=inp,file='xyz.txt',status='old')
nlin=0
do
   if(nlin >= MAXN)stop 'MAXN not large enough'
   read(inp,'(A)',end=100)lines(nlin+1)
   nlin=nlin+1
end do
100 write(*,*)nlin,' lines read'
close(inp)
!call chsort@(idx,lines,nlin) ! FTN95 routine
call sort(lines(1:nlin),idx,.false.) ! from http://jblevins.org/mirror/amiller/sortchar.f90
open(unit=out,file='xyz.srt',status='replace')
write(out,'(A)')lines(1:nlin)
!write(out,'(A)')lines(idx(1:nlin)) ! use with chsort@
close(out)
end program
Back to top
View user's profile Send private message
DanRRight



Joined: 10 Mar 2008
Posts: 2815
Location: South Pole, Antarctica

PostPosted: Mon Feb 02, 2015 8:03 pm    Post subject: Reply with quote

Thanks Davidb, mecej4, will look in detail at your suggestions. To answer your questions i have to go back from my numerous modifications to the original text which did not work with FTN95 at all initially until allocation problems were fixed. Last time i checked recently it still complained with something in sub PARTITION (attempting to alter actual argument that is a constant, an expression, or INTENT (in) argument or DO variable) when in /check /full_debug mode.

Meantime please try to run the code i mentioned in the full_debug mode. Does it work? David, please post your test code, may be i made some mess in mine

mecej4: I tried your example which works with debug. heck knows why mine does not, but to understand this it will need me to extract it from the large code. Will do that later

The big complaint of these sorting routines is that they treat all empty lines as records and place them in front of final array

As to CHSORT@ the Silverfrost can easily add "upper/lover case sensitivity" key and call it CHSORT1@


Last edited by DanRRight on Mon Feb 02, 2015 9:32 pm; edited 6 times in total
Back to top
View user's profile Send private message
mecej4



Joined: 31 Oct 2006
Posts: 1886

PostPosted: Mon Feb 02, 2015 8:32 pm    Post subject: Reply with quote

The example code that I posted above works fine with the unmodified sortchar.f90 from Bevins' page when compiled using /full_debug with the 7.1 compiler.
Code:

s:\sortchar>ftn95 sortchar.f90 /full_debug
[FTN95/Win32 Ver. 7.10.0 Copyright (c) Silverfrost Ltd 1993-2014]
    PROCESSING MODULE  [<LEXICALSORT> FTN95/Win32 v7.10.0]
        NO ERRORS  [<SORT> FTN95/Win32 v7.10.0]
        NO ERRORS  [<QUICKSORT> FTN95/Win32 v7.10.0]
        NO ERRORS  [<PARTITION> FTN95/Win32 v7.10.0]
        NO ERRORS  [<SWAP> FTN95/Win32 v7.10.0]
        NO ERRORS  [<STRINGCOMP> FTN95/Win32 v7.10.0]
        NO ERRORS  [<UPPERCASE> FTN95/Win32 v7.10.0]
    NO ERRORS  [<LEXICALSORT> FTN95/Win32 v7.10.0]

s:\sortchar>ftn95 grdsortu.f90 /full_debug
[FTN95/Win32 Ver. 7.10.0 Copyright (c) Silverfrost Ltd 1993-2014]
    NO ERRORS  [<SORTSTR> FTN95/Win32 v7.10.0]

s:\sortchar>slink /debug grdsortu.obj sortchar.obj
Creating executable: s:\sortchar\grdsortu.exe

s:\sortchar>grdsortu
       640352 lines read

When I used the /check /full_debug compiler options, I had to specify a larger stack size in the linking step, but this is because I have large array sizes in the program to process my large data set. The program runs about four times slower with these checking options, but there are no errors that I can see.

There are no instances of INTENT(IN) arguments being changed in any of the subroutines in the sort package.
Back to top
View user's profile Send private message
DanRRight



Joined: 10 Mar 2008
Posts: 2815
Location: South Pole, Antarctica

PostPosted: Mon Feb 02, 2015 9:11 pm    Post subject: Reply with quote

mecej4: yes, this really surprising to me that it works on your example.
As to stack size i am flirting long ago with the stacks approaching 1GB, but so far this sorting sub is the only major fault. Sorting works in /nocheck mode

Both suggestions - davidb's and yours - were very good: to avoid copy of array (that stopped me on 5 Million lines character*128 arrays while sometimes 10M were needed) and using chsort@ which indeed is also usable but needs mentioned above modifications to be case insensitive and not to treat empty lines as records.
Back to top
View user's profile Send private message
mecej4



Joined: 31 Oct 2006
Posts: 1886

PostPosted: Mon Feb 02, 2015 9:38 pm    Post subject: Reply with quote

Dan, w.r.t. doing case-insensitive sorts, consider the following. You can prune away "empty lines" from your input, make an all-uppercase copy of the strings, and call chsort@ with this copy. After chsort@ is done, discard the copy, and use the index array to output (or otherwise access in sorted order) the original array of mixed-case strings.

This is more memory intensive, but much faster, for the following reason. Quicksort takes a multiple of N lg N comparisons to sort N items. In the sortchar routine, the sorting is going to change string case twice that many times. The method that I suggested in the preceding paragraph, on the other hand, makes only N conversions to uppercase.

If you have 5M lines, then we are comparing 5 M with 220 M case conversions. Big difference!
Back to top
View user's profile Send private message
davidb



Joined: 17 Jul 2009
Posts: 560
Location: UK

PostPosted: Mon Feb 02, 2015 10:44 pm    Post subject: Reply with quote

With quicksort it is a good idea to switch to an insertion sort when the list is small. This can speed things up considerably. The following replacement code does this when sublists are 10 elements or smaller. You need to adjust this parameter to get the best performance.

It also eliminates recursion for one of the sublists (iteration is always used for the larger list). This guarantees n*log(n) on average.

Notice I have removed the array from the argument list. Put it back in if you don't want this to be contained inside sort.

Code:

      recursive subroutine quicksort(low, high)
         integer, intent (in)                                :: low, high
         integer :: pivotlocation
         integer :: l, u, i, j, ixTemp
         integer, parameter :: size_magic = 10
         l = low
         u = high
         do while (u - l >= size_magic)
           call partition(l, u, pivotlocation)
           if (u - pivotlocation > pivotlocation - l) then
              ! Lower partition is smaller, sort using recursion
              call quicksort(l, pivotlocation - 1)
              ! Sort upper partition using iteration
              l = pivotlocation + 1
           else
              ! Upper partition is smaller, sort using recursion
              call quicksort(pivotlocation + 1, u)
              ! Sort lower partition using iteration
              u = pivotlocation - 1
           end if
         end do
       
         ! Use an insertion sort for small subranges
         do i=l+1, u
            ixTemp = indexarray(i)
            do j=i-1, l, -1
               if (stringComp(StringArray(indexarray(j)), StringArray(ixTemp))) exit
               indexarray(j+1) = indexarray(j)
            end do
            indexarray(j+1) = ixTemp
         end do
         return
      end subroutine quicksort

_________________
Programmer in: Fortran 77/95/2003/2008, C, C++ (& OpenMP), java, Python, Perl
Back to top
View user's profile Send private message
DanRRight



Joined: 10 Mar 2008
Posts: 2815
Location: South Pole, Antarctica

PostPosted: Tue Feb 03, 2015 8:42 am    Post subject: Reply with quote

davidb, mecej4, those were good suggestions too, i'd say, a pure hackery. I suggest to finalize your ideas for everyone in the real working subroutine adding small demo code (ideally with the comparisons with other routines). Initial data you can just randomly generate.


/* By the way, this Forum should have long ago one more section called Users' Codes for the library of short useful codes.

/**Clearwin demo codes with pictures would be particularly useful because when people hear for first time the word Clearwin they think "yeah, yeah, everyone says try this, try that... one more 134532th @#$% today..."
Back to top
View user's profile Send private message
Display posts from previous:   
Post new topic   Reply to topic    forums.silverfrost.com Forum Index -> General All times are GMT + 1 Hour
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © 2001, 2005 phpBB Group