Topic: Database or NetCDF / HDF5 interfaces? in General

JohnCampbell

Posts: 2526 Sydney

Back to Top

16 Nov 2016 4:46 #18395

Dan,

Try subroutine file_size8@ (character*():File_name, real8:size, integer*2:error_code)

I have now tested both internal read and write, based on mecej4's approach. The performance times are:

fmtread
ftn95 /opt /link :  1.126 seconds
ftn95 /64 /link  :  0.698 seconds
gfortran /o2     :  6.238 seconds ##

intlread
ftn95 /opt /link :  0.229 seconds
ftn95 /64 /link  :  0.410 seconds
gfortran /o2     :  0.260 seconds

fmtwrite
ftn95 /opt /link :  3.120 seconds
ftn95 /64 /link  :  9.457 seconds  ##
gfortran /o2     : 40.322 seconds  ##

intlwrite
ftn95 /opt /link :  1.513 seconds
ftn95 /64 /link  :  1.803 seconds
gfortran /o2     :  0.286 seconds

These tests show that formatted write with ftn95 /64 is 3x slower than 32 bit and also 5x slower than a user written function. It would be good if this could be reviewed.

gFortran 64bit formatted write is worse, being 140x slower than a user written function, with little chance of a review there. the write test is: program IntlWrite implicit none character(len=100) :: str integer :: i,j,n real, dimension(10) :: x real del_sec, sec external del_sec ! ! Initialise vector sec = del_sec (0) do i = 1,10 x(i) = 3.0i-7.679 end do write (,'(10F10.3)')(-7.679+3i,i=1,10) ! ! Formatted write sec = del_sec (0) do j=1,1000000 x(3) = 3.0/j-7.679 write (str,'(10F10.3)') x if (mod(j,100000).eq.0) write (,) del_sec (-1), j, x(3) end do sec = del_sec (0) write (,) sec,' seconds : format write' ! x(4) = -.00025 x(5) = 0 x(6) = .00065 ! do j=1,1000000 x(3) = (3.0/j-.07679) do n=1,10 call write_val_r4 ( x(n), str(n10-9:n10), 3 ) end do if (mod(j,100000).eq.0) write(,) del_sec (-1), j, x(3) end do sec = del_sec (0) write (,) sec,' seconds : function write' write (,*) str ! end program

   real*4 function del_sec (update)
!
      integer*4 :: update
      integer*8 :: last_tick = 0
      integer*8 :: tick, rate
      real*4    :: dt
!
      call system_clock ( tick, rate )
      dt = real(tick-last_tick) / real(rate)      
      if ( update >= 0 ) last_tick = tick
      del_sec   = dt
   end function del_sec

JohnCampbell

Posts: 2526 Sydney

Back to Top

16 Nov 2016 4:49 #18396

My adaptation of a write subroutine is subroutine write_val_r4 (val, str, n) ! ! writes -3.04 ! real4 :: val ! value to write; must fit integer4 :: n ! digits >= 0 and < len(str) character :: str*() ! real4 :: rv ! abs ( val) integer8 :: v ! integer for digits of val integer8 :: ten = 10 ! mod integer4 :: k ! position of digit integer4 :: p ! position of '.' integer4 :: sgn ! +/- integer4 :: d ! digit integer*4 :: z = ichar ('0') ! k = len (str) p = k-n if ( p < 1 ) goto 99 str = ' ' ! if ( val > 0 ) then sgn = 1 rv = val else if ( val < 0 ) then sgn = -1 rv = -val else str(p-1:p) = '0.' return end if ! ! Integer of digits if (n > 0 ) then v = ( rv * 10**n + 0.5 ) else v = ( rv + 0.5 ) end if ! ! generate digits str(p:p) = '.'
do if ( k==p ) k = k-1 d = mod(v,ten) if ( k < 1 ) goto 99 str(k:k) = char (d+z) v = v/10 k = k-1 if ( v == 0 .and. k < p ) exit end do ! ! -ve values if ( sgn < 0 ) then if ( k < 1 ) goto 99 str(k:k) = '-' end if
return ! ! overflow field 99 str = repeat ('#', len(str)) return

   end subroutine write_val_r4

mecej4

Posts: 1911

Back to Top

16 Nov 2016 4:57 #18397

Quoted from DanRRight Mecej4, i just noticed that your simple code example for internal read works only for real*4 (not real8) F formatted numbers and also does not convert E formatted numbers. If you add what is missing will it be faster then standard formatted read like READ(11,'(10e10.3)') X ?

That was intentional. The more general the READ format, the more processing will be required. If you add what is missing, you will probably have duplicated what is already in the Fortran I/O library functions, and the run time will be longer.

Also, anyone knows FTN95 library or WinAPI function to find the file size without opening it or deciphering DIR command prompt call?

Other Fortran compilers support INQUIRE(FILE=filename, SIZE=file_size), but FTN95 does not yet do so, and provides a non-standard subroutine FILE_SIZE@() for this purpose.

DanRRight

Posts: 2877 South Pole, Antarctica

Back to Top

16 Nov 2016 5:17 #18398

Thanks John and Mecej4.

I'd keep my eyes on making user format conversion subroutines more general, to allow real*8 F and E formats. Hopefully this will be done in processor L1 cache and will not add substantial processing time to way slower reading/writing process. May be for F and E/D formats separately if this generalization will harm the speed.

Interesting is also: how about speed of unformatted read/write ?

By the way i tried FTN95 /64 and got speed on John's original test almost 500 MB/second on write text and almost 200 MB/second on read numbers both on SSD. Still did not try to run test on RAMdrive because this requires reboot. Read text though gave me ...0.8 mb/sec (!!!) Yes, a megabyte per second. Some bug probably

JohnCampbell

Posts: 2526 Sydney

Back to Top

16 Nov 2016 9:43 #18399

I have adapted the F format routine and now have an ES format routine. The test results are:

ftn95 /opt /link
 F format   : 3.038 sec  F routine  : 1.758 sec
 ES format  : 3.116 sec  ES routine : 2.822 sec

ftn95 /64 /link
 F format   : 9.779 sec  F routine  : 1.456 sec
 ES format  : 9.166 sec  ES routine : 2.251 sec

gFortran -O2
 F format   : 39.97 sec  F routine  : 0.283 sec
 ES format  : 70.50 sec  ES routine : 0.973 sec

These results show that the /64 F and ES formats are very slow. Is it possible to review this performance.

(The gFortran are extremely poor / unacceptably slow !! These are for internal writes ?)

The additional routines are: program IntlWrite implicit none character(len=100) :: str integer :: i,j,n real, dimension(10) :: x real del_sec, sec external del_sec ! ! Initialise vector sec = del_sec (0) do i = 1,10 x(i) = 3.0i-7.679 end do write (,'(10F10.3)')(-7.679+3i,i=1,10) ! ! Formatted write sec = del_sec (0) do j=1,1000000 x(3) = 3.0/j-.07679 write (str,'(10F10.3)') x if (mod(j,100000).eq.0) write (,) del_sec (-1), j, x(3) end do sec = del_sec (0) write (,) sec,' seconds : format F write' write (,) str ! ! new values to test ES x(4) = -.00025 x(5) = 0 x(6) = .00065 ! ! function write do j=1,1000000 x(3) = 3.0/j-.07679 do n=1,10 call write_val_r4 ( x(n), str(n10-9:n10), 3 ) end do if (mod(j,100000).eq.0) write(,) del_sec (-1), j, x(3) end do sec = del_sec (0) write (,) sec,' seconds : function F write' write (,) str ! ! Formatted write do j=1,1000000 x(3) = 3.0/j-.07679 write (str,'(10ES10.3)') x if (mod(j,100000).eq.0) write (,) del_sec (-1), j, x(3) end do sec = del_sec (0) write (,) sec,' seconds : format ES write' write (,) str ! do j=1,1000000 x(3) = 3.0/j-.07679 do n=1,10 call write_val_e4 ( x(n), str(n10-9:n10), 3 ) end do if (mod(j,100000).eq.0) write(,) del_sec (-1), j, x(3) end do sec = del_sec (0) write (,) sec,' seconds : function ES write' write (,*) str ! end program

JohnCampbell

Posts: 2526 Sydney

Back to Top

16 Nov 2016 9:46 #18400

ctd subroutine write_val_e4 (val, str, n) ! ! writes -3.04E+01 ! real4 :: val ! value to write; must fit integer4 :: n ! digits >= 0 and < len(str) character :: str*() ! integer4 :: l ! len ( str ) real8 :: rv ! abs ( val ) real8 :: power ! log10 ( val ) real8 :: round ! round-off integer4 :: ip ! E+ip integer8 :: v ! integer for digits of val integer8 :: ten = 10 ! mod integer4 :: k ! position of digit integer4 :: p ! position of '.' integer4 :: sgn ! +/- integer4 :: d ! digit integer4 :: z = ichar ('0') integer4 :: i,m ! ! Remove sign if ( val > 0 ) then sgn = 1 rv = val else if ( val < 0 ) then sgn = -1 rv = -val else str = ' 0.' return end if ! ! Check for overflow m = max (n,1) ! digits to provide l = len (str) p = 3 if ( m > l-7 ) goto 99 str = ' ' ! ! Determine power power = log10 (rv) ; ip = power ; if ( power < 0) ip = (power - .9999999d0) round = 0.510.0d0**(ip-m) rv = rv + round !zz write (,) 'rv changed from',abs(val),' to',rv, ' with',round, ip ! check for 9.99999 + .00005
power = log10 (rv) ; ip = power ; if ( power < 0) ip = (power - .9999999d0) rv = rv * 10.0d0**(m-ip) ! ! generate digits str(p:p) = '.' k = 3+m ! last digit position v = rv ! digits !zz write (,*) 'working with',v,ip do if ( k==p ) k = k-1 d = mod(v,ten) if ( k < 1 ) goto 99 str(k:k) = char (d+z) v = v/10 k = k-1 if ( v == 0 .and. k < p ) exit end do ! ! -ve values if ( sgn < 0 ) then if ( k < 1 ) goto 99 str(k:k) = '-' end if
! ! write power k = 3+m+2 ! if ( ip < 0 ) then v = abs(ip) str(k-1:k) = 'E-' else v = ip str(k-1:k) = 'E+' end if ! m = 2 if ( v > 99 ) m = 3 k = k+m do i = 1,m d = mod(v,ten) str(k:k) = char (d+z) v = v/10 k = k-1 end do return ! ! overflow field 99 str = repeat ('#', l) return

   end subroutine write_val_e4


{batch file to test}
del %1.exe
ftn95 %1 /opt /link
%1

del %1.exe
ftn95 %1 /64 /link
%1

del %1.exe
gfortran %1.f90 -O2 -o %1.exe
%1

Paul,

Is it possible to review the FTN95 /64 performance for formatted write ?

John

mecej4

Posts: 1911

Back to Top

16 Nov 2016 10:45 #18401

A round of applause to John for writing the last chapter of the story and for writing the ES output routines.

As it has evolved over a week, the content of this thread is now showing some sprawl. It covers two sets of tests, both of which make a case for reviewing the I/O routines of FTN95 and Gfortran on Windows. The older tests required resources that may not be readily available to everyone (over 100 GB of HDD/SSD space) and substantial run times. The new tests are much more accessible to anyone, and make the case more eloquently.

Some cleaning up, collecting the new tests (READ and WRITE, using built-in formatting/custom routines) into a compact Zip file posted at, say, Dropbox, would give Paul something easier to work with. I offer to run the cleaned up tests (John, please avoid using any FTN95-specific routines in the test codes and make up a Zip file of the source files) with Intel Fortran and Gfortran on Windows and Linux on an older dual boot desktop.

JohnCampbell

Posts: 2526 Sydney

Back to Top

16 Nov 2016 11:40 #18402

mecej4,

I shall assemble the tests and provide a dropbox link.

There are actually 3 sets of tests that I have been demonstrating.

Binary_IO test that demonstrated the performance being achieved with fixed length record random access files. This test uses 3 libraries; 2 are based on Fortran fixed length record random access files, while the 3rd used FTN95's new long 64-bit address file routines that I have used for the first time. This code is non-standard FTN95 code, although BinLib was an attempt at conforming Fortran. I shall provide this library. These show the benefit of both SSD drives but probably more significantly the effect of memory cache buffering. I typically use these libraries to write then read information, so cacheing is important. Transfer rates in excess of 1 gb/sec were being demonstrated even using cached HDD, which is fairly good, especially for 32-bit solutions.
Text I/O with large files. This test tries to test file I/O which is non-buffered, for files up to 18 gb in size and 130 gb in total. This is what might be experienced when reading large .csv data sets supplied from an external source. This test is closest to Dan's identified problem. Key outcome was that reading was fairly fast but write numbers is slow. This is portable code, except for IOMSG=message, which is not supported by FTN95.
Internal buffer read/write. This test identifies the slow write number and read number speeds for FTN95 /64 and also gFortran, both of which need improving. FTN95 /32 has relatively good performance. This test has not combined with file text string I/O, but hopefully test 2 confirms this is not a problem. Hopefully this is portable code.

In terms of identifying a problem that needs improving; test 3) is most useful. I shall prepare a link tomorrow (my time)

I am interested to see what performance can be achieved with other compilers. The other tests would be interesting, although they are dependent on many variables, such as disk type, installed memory, processor and I suspect operating system.

John

mecej4

Posts: 1911

Back to Top

16 Nov 2016 1:26 (Edited: 16 Nov 2016 1:47) #18403

I ran your WRITE tests using gFortran, and these are some comments related to that.

Your formatting routines write_val_r4() and write_val_e4() use INTEGER8 variables. (write_val_e4() is not used in the test runs, but involves more use of INTEGER8.) As a result, 32-bit EXEs produced by gFortran run significantly slower. Here is a comparison (on my laptop, i7-2720, W10).

lgf  7.7 (32)   52.3     0.806    -o3 -32 -wpo -sse
lgf  7.7 (64)   46.6     0.255    -o3 -64 -wpo -sse

Lahey/GNU/Marlette Fortran is a repackaging of gFortran 5.4.1. I have Cygwin versions of gFortran that are older, and they give similar results.

The implication is (i) see if you can reduce the use of INTEGER*8 variables in the formatting subroutines, and (ii) please clearly state whether your gFortran results are for 32-bit or 64-bit architecture.

I am pondering whether I should sound out an optimization expert whom I happen to know about whether he could look at this performance bug. He has written often about building GCC/GFortran on both Windows and Linux.

Another possibility is to post just some results to C.L.F. with a link to the source codes, and see if someone from the GFortran developers group gets interested.

First, however, we need distributable test codes and some Linux results.

DanRRight

Posts: 2877 South Pole, Antarctica

Back to Top

16 Nov 2016 1:44 #18404

John, So you have increased slow WRITE speed of gFortran by 100+ times and even made it 5x faster then FTN95, wow! Please keep going with READ, including arbitrary F format length. The numbers I read are made by C guys, and may look like this with up to 20 digits long and may even not place decimal point like here

0.000777203 -0.000032224 -0.000039659 -0.365450860 -1.011079630 -0.520287431 -0.285002098 1355.429879314 1

which is absurd but this is usual life of C programmers.

Mecej4, if it will take you not more then few minutes can you please modify your real*4 code to read numbers like this 1355.429879314 ignoring extra digits? My brain can not multitask right now 😃

And see this damn integer at the end of line in previous paragraph? I will try to handle it myself

mecej4

Posts: 1911

Back to Top

16 Nov 2016 4:17 #18406

It would be easy to modify the code to read the string that you showed. The field widths are not uniform, perhaps not known in advance, and may even vary from record to record. All you have to do is to scan the string for white space, and build an index of where each number begins and ends, and a count of the number of fields.

What you must do before writing code to do this, however, is this: write down (or obtain from your C programmers) a specification of what is allowed (and, therefore, may be expected) in all the text lines that your program will have to read. For example: will some input lines have more or less than the nine fields in your sample record? Will some lines have comments following the numbers? Will some of the numbers have exponents specified?

Without such a specification, you will have a program that will work fine on this specific input line, but may malfunction (without crashing or any sign of error) with other input lines that are superficially similar to this one. And, as someone famous said, 'premature optimization is the root of all evil'.

Another simple possibility is to have the C program that writes the strings use a repeated %17.9e or some such format, instead of the %.9f format that it is now using.

JohnCampbell

Posts: 2526 Sydney

Back to Top

16 Nov 2016 9:51 #18408

Quoted from mecej4 The implication is (i) see if you can reduce the use of INTEGER*8 variables in the formatting subroutines, and (ii) please clearly state whether your gFortran results are for 32-bit or 64-bit architecture.

My aim was to quickly produce a safe code, so I did not finesse trying to use I4 for the digits. You are a bit harsh in your comment, as the 32 bit version of my F routine has an increase of 0.551 seconds, while the gFortran F format increases by 5.7 seconds. Given that F and ES formats have been in use for REAL8 before INTEGER8 was commonly available, there must be a way, although I4 only supports 9 digits. Perhaps using I*8 would be an easy fix for FTN95 and gFortran F and ES formats.

The gFortran I am now using is 64-bit Ver 6.1.0. I download it pre-built. Reliable pre-built windows versions are difficult to find, which is a big worry. This keeps FTN95 for my main production code.

As you indicate, Dan's example is easy to read with the existing read_val, after identifying the number fields. There is an easy change required, which is to account for numbers without a '.'. Field delimiters can be a space or comma and could easily be extended to include other possibilities such as <HT> or ';:~|', which I have seen from some recording devices. I was surprised that ifort does not support ',' as a numeric field terminator for F or I formats, which FTN95 always has and gFortran appear to support. ( flexible number reading is probably too big a topic to bring into this thread !! )

John

DanRRight

Posts: 2877 South Pole, Antarctica

Back to Top

16 Nov 2016 11:24 (Edited: 17 Nov 2016 12:45) #18409

Mecej4, Using E format will be slower almost twice according to John's test, so I'd use F with spacing delimiter for a while exactly like in your example. Your current text is almost OK, it already tracks for spaces, the only when iit hiccups is when numbers are too long like in my example above. I don't have time to dig deeper into your code, it is harder to change not own codes, but unfortunately don't have really time for writing my own and experimenting currently, I only barely can check what you guys publish here to see in which direction thread moves.

As an another mentioned direction to investigate, do you know if there exist G-like format in C which keeps the size and amount of digits constant, just moving floating point? I really don't need more then 7-8 leading digits but in future the numbers may grow beyond billion and the code must handle that by switching either to longer F numbers with real8 or use E format. But that's for later exercises. Currently used by my C guys F format is an absurd, it outputs numbers like this.

0.000000458 54321567.846764678

3 useful digits is wsy not enough, 17 is way to many. If Fortran using scientists or engineers would do like that they must be immediately fired without possibility of return. The only excuse - speed.

John, Intel Fortran is not allowing comma as a delimiter? You've made my day. Poor Intel.

mecej4

Posts: 1911

Back to Top

17 Nov 2016 12:08 #18410

Dan, I recommend that you ask that the C code that produces the data file use 'e' format. There is no problem with printing large or small numbers with that format, as long as the numbers lie between -10^{37 and +10}37, and processing lines in your Fortran code will be faster if the field width is uniform. Reading numbers with E format does not have to involve calls to log() -- you do not need to compute the log to 8 or 15 decimal digits when all you need is the integer part of the logarithm.

C also has 'g' format, but that format should only be used when the output is printed on paper to be read by humans.

Next, note that it should not matter that in our toy programs it takes twice as long to process input data with E format compared to F format. That is not a reason to exclude using E format. Your adult program probably does an obscene amount of number crunching; a typical run may be many minutes long, in which case an extra half second spent reading E-format data is negligible.

Finally, I do not understand why you use text files instead of binary files, unformatted Fortran files, or even a memory buffer, for exchanging data between C and Fortran when the file sizes are so large that no human is going to print and read those files.

JohnCampbell

Posts: 2526 Sydney

Back to Top

17 Nov 2016 12:28 #18411

Paul,

I have been looking at Dan's latest free format input and have got side-tracked with a /64 problem. A simple DO loop with a character*1 C temporary varianle is slowing things down dramatically.

If I replace lines 93:95 with lines 96:97 the loop speeds up dramatically from 22.7 seconds to 0.4 seconds, ie replacing do k = f,n c = str(k:k) if ( c == ' ' ) exit if ( c == ',' ) exit with do k = f,n if ( str(k:k) == ' ' ) exit if ( str(k:k) == ',' ) exit

I compile in PLATO with FTN95 Ver 8.05.0 dated 17/06/2016 and salflibc64.dll is 5/11/2016

BUILDLOG is: TN95.EXE 'C:\temp\forum\format\paul_c.f90' /64 /NO_BANNER /VS7 /DELETE_OBJ_ON_ERROR /ERROR_NUMBERS /UNLIMITED_ERRORS /LINK

ftn95 paul_c.f90 /64 /link also demonstrates

  program IntlRead 

   implicit none 
   integer, parameter :: lines = 1000000
   integer, parameter :: step  = lines / 10
   character(len=180) :: str 
   integer :: j,n, ks, ke
   real    :: del_sec, sec
   external   del_sec
!
!  Initialise string
     sec = del_sec (0)
     str = '0.000777203, -0.000032224, -0.000039659, -0.365450860, -1.011079630, -0.520287431, -0.285002098, 1355.429879314, 1, 2'
! 
!  F function read
     write (*,*) ' '
     write (*,*) 'Test get_next_field'
     do j=1,lines 
        ke = -1
        do n=1,10
           ks = ke+2
           call get_next_field_fast ( str,ks,ke )
        end do 
        if (mod(j,step) == 0) write(*,*) del_sec (-1), j
     end do 
     sec = del_sec (0)
     write (*,*) sec,' seconds : get_next_field_fast'
!
!  F function read
     write (*,*) ' '
     write (*,*) 'Test get_next_field_slow'
     do j=1,lines 
        ke = -1
        do n=1,10
           ks = ke+2
           call get_next_field_slow ( str,ks,ke )
        end do 
        if (mod(j,step) == 0) write(*,*) del_sec (-1), j 
     end do 
     sec = del_sec (0)
     write (*,*) sec,' seconds : get_next_field_slow '
 end program 
 
   subroutine get_next_field_fast ( str,ks,ke )
     character str*(*)
     integer*4 :: ks   ! start of next field
     integer*4 :: ke   ! returned end of next field
!
     integer*4 n, k, f
!
!   determine if valid field
     n = len (str)
     if ( ks < 1 .or. ks > n ) then
       ke = -1
       return
     end if
!
!  find start of number ( ignore leading blanks
     do f = ks,n
       if ( str(f:f) /= ' ' ) exit
     end do
!
!  find end of number
     do k = f,n
       if ( str(k:k) == ' ' ) exit
       if ( str(k:k) == ',' ) exit
     end do
     ke = k-1    
   end subroutine get_next_field_fast

   subroutine get_next_field_slow ( str,ks,ke )
     character str*(*)
     integer*4 :: ks   ! start of next field
     integer*4 :: ke   ! returned end of next field
!
     integer*4 n, k, f
     character c*1
!
!   determine if valid field
     n = len (str)
     if ( ks < 1 .or. ks > n ) then
       ke = -1
       return
     end if
!
!  find start of number ( ignore leading blanks
     do f = ks,n
       if ( str(f:f) /= ' ' ) exit
     end do
!
!  find end of number
     do k = f,n
       c = str(k:k)
       if ( c == ' ' ) exit
       if ( c == ',' ) exit
!       if ( str(k:k) == ' ' ) exit
!       if ( str(k:k) == ',' ) exit
!gen       if ( index ( ' ,~:;', str(k:k) ) > 0 ) exit
     end do
     ke = k-1    
   end subroutine get_next_field_slow

JohnCampbell

Posts: 2526 Sydney

Back to Top

17 Nov 2016 12:29 #18412

ctd

   real*4 function del_sec (update)
!
      integer*4 :: update
      integer*8 :: last_tick = 0
      integer*8 :: tick, rate
      real*4    :: dt
!
      call system_clock ( tick, rate )
      dt = real(tick-last_tick) / real(rate)      
      if ( update >= 0 ) last_tick = tick
      del_sec   = dt
   end function del_sec

DanRRight

Posts: 2877 South Pole, Antarctica

Back to Top

17 Nov 2016 12:31 (Edited: 17 Nov 2016 2:52) #18413

Mecej4, Well, really output is in better then that for potentially read speed (since decompression processing is parallelized plus smaller file size) HDF5 format, data got extracted from HDF5 as text just for my reading. I have not yet adopted HDF5 with Fortran. Text is used because there are still very rare bugs, C is unstable beast, and they also happen in output messing debugging like hell because it was initially hard to catch them. ASCII allowed us to catch these NaNs and other garbage but this still happening. I want you guys to check unformatted read too but hope that there still exists speedup potential for formatted read because currently we use just 1℅ of I/O bandwidth

mecej4

Posts: 1911

Back to Top

17 Nov 2016 2:42 #18414

Quoted from JohnCampbell A simple DO loop with a character*1 C temporary variable is slowing things down dramatically.

If I replace lines 93:95 with lines 96:97 the loop speeds up dramatically from 22.7 seconds to 0.4 seconds.

John, this is because FTN95 /64 translates

     if ( c == ',' ) exit

to an expensive function call with four arguments

call ccomp(c, ',', 1, 1)

where as the 32-bit compiler simply does

cmpb      C,=44

even without /opt. Just setting up the call, the stack frame in ccomp() and then returning would take about a dozen instructions in the 64-bit version. I do not know in which DLL the actual code of ccomp() is located, but I guess that it is a full-fledged string comparison function.

JohnCampbell

Posts: 2526 Sydney

Back to Top

17 Nov 2016 7:16 #18415

mecej4,

Why is it that FTN95 /64 does the change for 'if ( c == ' ' ) exit' but not for 'if ( str(k:k) == ' ' ) exit' ? The performance change in comparison to what else is being done is very dramatic.

Also you stated 'Reading numbers with E format does not have to involve calls to log() -- you do not need to compute the log to 8 or 15 decimal digits when all you need is the integer part of the logarithm.' I actually use LOG10 twice, which was a quick fix, so I'd like to know your recommended alternative.

John

mecej4

Posts: 1911

Back to Top

17 Nov 2016 8:12 #18417

Quoted from JohnCampbell Why is it that FTN95 /64 does the change for 'if ( c == ' ' ) exit' but not for 'if ( str(k:k) == ' ' ) exit' ?

I can only guess. Input source code has many places where an opportunity for optimization exists, and this is one. FTN95/64 is in its infancy, and makes no claims to optimization. I can understand the compiler creators' saying 'let's first get it working right, and later work on optimizing' -- I mean the compiled code, not the compiler itself.

Take the logical expression c == ','. The two expressions being compared are character expressions. In general, they may be of different lengths, so comparison may involve truncation, blank-padding, etc., -- whatever is needed to do what the standard specifies. In this specific case, it is easy to see that the length of the RHS is 1. To find the length of the LHS, however, the symbol table has to be looked up to see that it is also 1. Thus, unless the compiler realizes that both sides have lengths equal to 1, it is reasonable to call a general RTL routine that compares two strings of different lengths.

Similar considerations apply to strings of byte lengths 2, 4 and, for FTN95/64, 8. These sizes fit into a register, so entire registers can be compared instead of doing a byte-by-byte comparison.

Also you stated 'Reading numbers with E format does not have to involve calls to log() -- you do not need to compute the log to 8 or 15 decimal digits when all you need is the integer part of the logarithm.' I actually use LOG10 twice, which was a quick fix, so I'd like to know your recommended alternative.

Notice that in your code, after taking the log, you chop off the fractional part? So, simply count the decimal digits as you are scanning the input string segment for a decimal point, 'E' or end of segment. That count is the value of int(log10(x)). Similarly, for small numbers, you count the number of zeros between the decimal point and the first significant digit after it. If an exponent E[s]nnn is present, you scan that and extract the nnn, and use that to adjust the previously computed integer part of the log10.

Perhaps this exploration should not be carried too far. As we add code to handle more variations in the types of input strings and trap errors in the input data, we move closer and closer to writing a sscanf() routine. We are not embarked on a rewriting of the FTN95/64 I/O RTL, are we? We can be more useful by alerting the Silverfrost team to bugs and areas of poor performance.