forums.silverfrost.com Forum Index forums.silverfrost.com
Welcome to the Silverfrost forums
 
 FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 

Database or NetCDF / HDF5 interfaces?
Goto page Previous  1, 2, 3, 4, 5, 6  Next
 
Post new topic   Reply to topic    forums.silverfrost.com Forum Index -> General
View previous topic :: View next topic  
Author Message
DanRRight



Joined: 10 Mar 2008
Posts: 1433
Location: South Pole, Antarctica

PostPosted: Thu Nov 17, 2016 1:54 pm    Post subject: Reply with quote

Well, unformatted read is probably the only good solution until someone will break the sound barrier in speed with formatted read of text files.

i will probably meantime transform all the files which have no problems in them into binary and then read unformatted way. This of course will take time for conversion (will be done in batch regime or overnight) but this will be compensated by 10x speedup when loading them.
Back to top
View user's profile Send private message
mecej4



Joined: 31 Oct 2006
Posts: 609

PostPosted: Thu Nov 17, 2016 2:30 pm    Post subject: Reply with quote

Do note that FTN95 uses a different convention for record markers for Fortran unformatted files than most other current Fortran compilers, and your C programmers may be unfamiliar with Fortran unformatted files. You can search for messages in the Silverfrost forums about conversion from the common unformatted file format to the Silverfrost format.

Not a big issue, but something to be aware of.
Back to top
View user's profile Send private message
DanRRight



Joined: 10 Mar 2008
Posts: 1433
Location: South Pole, Antarctica

PostPosted: Thu Nov 17, 2016 7:49 pm    Post subject: Reply with quote

Got so far 3.5x speed increase on unformatted read with real code versus 10-15x on simple test benches. All that versus my older formatted *-format specifier read. This is because besides loading there are a lot of processing with LOG, EXP and large arrays in my code go simultaneously...Binary files are 3x more compact by the way then my older text files

And tests show that when 64 will be finally ready the read speed will be additionally 1.8x faster

Can you guys please do independent check for unformatted read speed ? We have chance to break 1GB/second on RAMdrives
Back to top
View user's profile Send private message
JohnCampbell



Joined: 16 Feb 2006
Posts: 1706
Location: Sydney

PostPosted: Fri Nov 18, 2016 12:19 am    Post subject: Reply with quote

dan,

Quote:
We have chance to break 1GB/second on RAMdrives

I am getting these high rates when the file is cached, typically with a binary I/O test of write then read. If you are opening a file on disk for reading, which has not been previously loaded into a disk cache in available memory, these speeds may not be possible.
While binary files have the advantage of speed, text based files from external sources have the advantage of easy verification and adaptation of a free-format reader. My recent testing has shown these can be read at 60 to 100 mb/sec, which is not too bad.
The alternative is you could write a converter that reads the text file and produces a binary format for later quicker analysis, although this involves duplication of the information. The advantage of this is the converter can also be a data reviewer, that reports likely errors and simple statistics of range and extremes, which I find essential.
You haven't indicated the quantity of data to be analysed, say gb's or terra-bytes per day.
Back to top
View user's profile Send private message
DanRRight



Joined: 10 Mar 2008
Posts: 1433
Location: South Pole, Antarctica

PostPosted: Fri Nov 18, 2016 1:57 pm    Post subject: Reply with quote

John,

PCI-express SSDs have 3GB/s, we have to utilize this bandwidth

I am so sick and furious with slow formatted text read that i want to try anything. Unformatted read of binary data is of course very dangerous but hopefully we passed the learning curve when it was very hard to catch the error in huge data sets.

Right now we have to handle 10-100GB size files for one run. Coming soon 64-bit compiler will increase the loaded volumes by order of magnitude.


Last edited by DanRRight on Fri Nov 18, 2016 2:00 pm; edited 1 time in total
Back to top
View user's profile Send private message
mecej4



Joined: 31 Oct 2006
Posts: 609

PostPosted: Fri Nov 18, 2016 2:00 pm    Post subject: Reply with quote

John,

Here is Fortran code for a fast but approximate log10 function. You may try replacing the two calls to log10() in your E format output routine by calls to qlg10(). It computes an 8-bit approximation to log10(x), which is slightly better than two decimal digits. In the first call, a small table is set up. Each subsequent evaluation of qlg10() involves a single floating-multiply-add (FMA) operation and some integer shift and AND operations.
Code:
real function qlg10(x)
!
! quick evaluation of log10(x) with 8 bit precision, for use in finding
! integer part of log10(x) to use with E and G format output processing
!
! See http://www.icsi.berkeley.edu/pubs/techreports/TR-07-002.pdf
!
implicit none
integer, parameter :: n = 8
integer, parameter :: tblsiz = 256, mbits = 23   ! 2^n, mantissa bits
real, intent(in) :: x
real, dimension(tblsiz),save :: tbl
logical :: beg = .true.
integer :: i, ix, lg2, y, mask
data mask/Z'7FFFFF'/
!
if(beg)then
   beg = .false.
   do i=1,tblsiz
        tbl(i) = LOG10((i-1)/real(tblsiz) + 1.0)
   end do
endif
ix=transfer(x,ix)
lg2=iand(ishft(ix,-mbits),255) - 127  ! exponent
y=ishft(iand(ix,mask),n-mbits)    ! 8-bit mantissa
qlg10=0.30103*lg2+tbl(y+1)
return
end function qlg10
Back to top
View user's profile Send private message
mecej4



Joined: 31 Oct 2006
Posts: 609

PostPosted: Sun Nov 20, 2016 3:27 pm    Post subject: Improved timings Reply with quote

After some fine tuning of the E and F output formatting subroutines, I obtained the following timing results (W10, i7-2720QM).
Code:
               F-fmt  F-sub    E-fmt  E-sub

FTN95 32-bit   2.904  0.595    3.057  2.703
FTN95 64-bit   9.057  0.796    8.985  3.013

GFTN  6.2-32  50.940  0.359   85.160  2.578
GFTN  6.2-64  48.290  0.234   81.870  2.110

LGF   7.7-32  48.120  0.344   80.470  2.562
LGF   7.7-64  45.930  0.250   77.030  2.047

IFort 17-32    2.320  0.427    2.300  1.247
IFort 17-64    2.390  0.512    1.980  1.122


GFTN is Gfortran 6.2 from Equation.com. LGF is Lahey/GNU 7.7, which is a repackaging of Gfortran 5.4.1.


Last edited by mecej4 on Mon Nov 21, 2016 9:42 am; edited 3 times in total
Back to top
View user's profile Send private message
DanRRight



Joined: 10 Mar 2008
Posts: 1433
Location: South Pole, Antarctica

PostPosted: Sun Nov 20, 2016 4:12 pm    Post subject: Reply with quote

Thanks mecej4, such wide comparisons tell a lot.

Was this for WRITE? How about READ?

And for completeness would be good to add also * format which is very good when there are different type variables of different widths in the line
Back to top
View user's profile Send private message
JohnCampbell



Joined: 16 Feb 2006
Posts: 1706
Location: Sydney

PostPosted: Mon Nov 21, 2016 4:45 am    Post subject: Reply with quote

mecej4's summary of write performance gives a good indication of Fortran formatted write performance, especially,
FTN95 /32 is comparatively good for formatted write
FTN95 /64 is slow for Format Write
gFortran is incredibly slow for Format write.

The function approach is faster and may provide a use if there are specific changes to output, say for 0. or -0.000, or if using gFortran !

regarding READ performance, I have reviewed the internal read options, using a variation of mecej4's post on 15 Nov and Dan's free format layout of 16 Nov.

The test options I considered are:
1 read fixed format using 10F10.3
2 read fixed format using mecej4's read_val routine
3 read fixed format using read_val routine for optional decimal
4 read free format layout, using Dan's layout example, with numbers separated by a space or comma
5 parse string only, using Dan's layout example, with numbers separated by a space or comma

I have tested this for FTN95 /32, FTN95 /64 and gFortran 64-bit.

Code:
ftn95/32 ftn95/64 gfortran  Test
  1.102   0.665   6.331    1 Fortran FORMAT read
  0.233   0.405   0.284    2 function read - fixed field length
  0.272   0.499   0.328    3 function read - optional fraction value,  fixed field length
  0.622   1.033   0.755    4 function read - optional fraction value, variable field length
  0.211   0.410   0.485    5 get_next_field only


These results do not show the variation that the write test did show, although the parse string code (finding each number string) is surprisingly slow compared to the reading times.

My guess is that the test #4 approach does give greater flexibility reading number strings, but is about the same speed as FORMAT reads.
FTN95 and gFortran does offer comma support in the standard F format.
gFortran FORMAT read is again slow in comparison.

The advantage of the function approach for large data sets is you can change the valid number formats to suit the equipment that writes the file, eg tab or colon delimited fields, as well as space or comma can be supported.

The test program is on this link

https://www.dropbox.com/s/ut82pkdexc3si5h/intlRead.f90?dl=0

Neither of the functions listed support E format reading, although this would need to be included if required.

Based on this, and previous posts, I estimate that a read speed for external text files is of the order of 100 mb/sec for HDD and 300 mb/sec for SSD, which would be limited by the read conversion. This is faster than some of Dan's performance reports, which may be due to other disk type problems.

Unfortunately when reading large files, they are typically not buffered.

It has always been my preference to use a free format function approach, where I also provide other error and character checking reports. This is easiest using stream I/O. Checking and other field definitions will reduce read rates. At the end of reading, providing statistics of character counts etc is useful to give confidence about the reading process. eg counts of lines read, control characters, delimiters and fields per line always help.

John
Back to top
View user's profile Send private message
DanRRight



Joined: 10 Mar 2008
Posts: 1433
Location: South Pole, Antarctica

PostPosted: Mon Nov 21, 2016 8:56 am    Post subject: Reply with quote

Thanks, John, that comparison was great. You both with mecej4 have made very nice insight into the Fortran I/O, made huge step forward in speeding it and pointed out the further areas of improvement !
Back to top
View user's profile Send private message
mecej4



Joined: 31 Oct 2006
Posts: 609

PostPosted: Mon Nov 21, 2016 9:25 am    Post subject: Reply with quote

John: There is one improvement that can be made to your code for writing E-formatted numbers; it may yield a slight but noticeable improvement in speed.

After one has found log10(x), rounded and scaled x, the integer v expressed in the decimal scale is exactly m digits long. Therefore, the two IF tests in the DO loop following "working with" are uncecessary. The loop can be replaced by
Code:
     do k=3+m,p+1,-1
        str(k:k) = char (mod(v,10)+z)
        v=v/10
     end do
     str(p-1:p-1) = char (v+z)

if you remember that str(p:p) contains the decimal point character.
..............
Have you thought out how to apply what we have learned to develop a work-around for your FEA code? If you can segregate a few of the most CPU-cycle-eating WRITE statements, you could replace them with calls to subroutines that generate formatted strings and print the strings with WRITE (nn, '(A)') str statements.
Back to top
View user's profile Send private message
JohnCampbell



Joined: 16 Feb 2006
Posts: 1706
Location: Sydney

PostPosted: Mon Nov 21, 2016 1:44 pm    Post subject: Reply with quote

mecej4,

Thanks for the further advice. I have modified the loops to remove some of the unnecessary instructions you indicated and have produced some significant improvements. The following table tracks the elapsed time performance of the ES Function for changes that have been discussed, including:
removing a LOG10 function then replacing the other with a quick LOG10 look-up function
cleaning the DO loops to remove unnecessary instructions
Code:
ES Function run time (seconds)
id FTN95   FTN95/64 gFortran  Test
 1 2.862   2.353   0.939      Initial ES Function
 2 2.520   2.320   0.684      remove 1 x LOG10
 3 2.417   2.042   0.640      restructure ** power
 5                 0.424      replace log10 with qlg10
 6 2.278   2.353   0.423      use fortran Qlog10
 7 1.886   2.062   0.302      tidy ES function


FTN95 /32 and gFortran have produced anticipated performance improvements, while FTN95 /64 has shown some variability in gains.

The final function is:
Code:
   subroutine write_val_e4 (val, str, n)
!
!  writes -3.04E+01
!
     real*4    :: val         !  value to write; must fit
     integer*4 :: n           !  digits >= 0 and < len(str)
     character :: str*(*)
!
     integer*4 :: l           ! len ( str )
     real*4    :: rv          ! abs ( val )
     real*4    :: power       ! log10 ( val )
     integer*4 :: ip          ! E+ip
     integer*8 :: v           ! integer for digits of val
     integer*8 :: ten = 10    ! mod
     integer*4 :: k           ! position of digit
!     integer*4 :: p = 3       ! position of '.'
     integer*4 :: sgn         ! +/-
     integer*4 :: d           ! digit
     integer*4 :: z = ichar ('0')
     integer*4 :: i,m
!
     REAL     QLG10
     external QLG10
!
!  Remove sign
     if ( val > 0 ) then
       sgn = 1
       rv  = val
     else if ( val < 0 ) then
       sgn = -1
       rv  = -val
     else
       str = ' 0.'
       return
     end if
!
!  Check for overflow
     m   = max (n,1)                !  digits to provide
     l   = len (str)
     if ( m > l-7 ) goto 99
!
!  Determine power
     power = QLG10 (rv)  ; ip = power  ; if ( power < 0) ip = (power - .9999999d0)
     rv    = rv * 10.0**(m-ip) + 0.5
!
!  generate digits
!     k   = 3+m                      ! last digit position
     v   = rv                       ! digits
     str = ' '
     do k = 3+m,3+1,-1
       d = mod(v,ten)
       str(k:k) = char (d+z)
       v = v/10
     end do
!
     str(3:3) = '.'
     if ( v > 9 ) then
!  patch for round up power
       write (*,*) 'Power increase identified :', str(1:3+m), ip
       str(2:2) = '1'
       ip = ip+1
     else
       d = v
       str(2:2) = char (d+z)
     end if
!
!  -ve values
     if ( sgn < 0 ) str(1:1) = '-'
!
!  write power
     k = 3+m+2           !
     if ( ip < 0 ) then
       v = abs(ip)
       str(k-1:k) = 'E-'
     else
       v = ip
       str(k-1:k) = 'E+'
     end if
!
     m = 2
     if ( v > 99 ) m = 3
     do i = k+m,k+1,-1
       d = mod(v,ten)
       str(i:i) = char (d+z)
       v = v/10
     end do
     return
!
!  overflow field
 99  str = repeat ('#', l)
     return

   end subroutine write_val_e4


It will be good to test the updated FTN95 /64 /OPT on this and my earlier post on Thu Nov 17, 2016, when it is available.
Back to top
View user's profile Send private message
mecej4



Joined: 31 Oct 2006
Posts: 609

PostPosted: Mon Nov 21, 2016 6:58 pm    Post subject: Reply with quote

A couple of observations.

1. In many borderline cases, your new routine is faithful to the "round to nearest or even" rule; the previous version does not always get the least significant digit right.

2. When the input number causes the "patch for round up power" to be applied, the result is not correct. Try, for example,
Code:
val = 1.000857E-1
call write_val_e4 (val,str,3)
print *,str

The printed value is 1.009E-01, which is missing the third '0' preceding the '9'.[/code]


Last edited by mecej4 on Tue Nov 22, 2016 12:48 am; edited 1 time in total
Back to top
View user's profile Send private message
JohnCampbell



Joined: 16 Feb 2006
Posts: 1706
Location: Sydney

PostPosted: Mon Nov 21, 2016 11:00 pm    Post subject: Reply with quote

mecej4,

How do you find these numbers !

The error is due to the error with QLG10 estimating the wrong integer power. ( I was hoping the lookup table would have been correct at multiples of 10 )
I corrected the problem by making a more general response to the integer power "ip" being wrong. (could have an infinite loop with rounding ?)

John

Code:
   program IntlWrite
     call test ( 99.999 )
     call test ( 1.000857e-1 )
     call test ( 1.00857e-1 )
   end

   subroutine test ( x )
     implicit none
     character(len=100) :: str
     real   :: x
     REAL     QLG10
     external QLG10
!
      write (*,*) 'log', qlg10 (x), log10 (x), x
      call write_val_e4 ( x, str(1:12), 3 ) 
      write (*,*) 'es :',str(1:12), x
!
   end subroutine test
   
   subroutine write_val_e4 (val, str, n)
!
!  writes -3.04E+01
!
     real*4    :: val         !  value to write; must fit
     integer*4 :: n           !  digits >= 0 and < len(str)
     character :: str*(*)
!
     integer*4 :: l           ! len ( str )
     real*4    :: rv,av       ! abs ( val )
     real*4    :: power       ! log10 ( val )
     integer*4 :: ip          ! E+ip
     integer*8 :: v           ! integer for digits of val
     integer*8 :: ten = 10    ! mod
     integer*4 :: k           ! position of digit
!     integer*4 :: p = 3       ! position of '.'
     integer*4 :: sgn         ! +/-
     integer*4 :: d           ! digit
     integer*4 :: z = ichar ('0')
     integer*4 :: i,m
!
     REAL     QLG10
     external QLG10
!
!  Remove sign
     if ( val > 0 ) then
       sgn = 1
       av  = val
     else if ( val < 0 ) then
       sgn = -1
       av  = -val
     else
       str = ' 0.'
       return
     end if
!
!  Check for overflow
     m   = max (n,1)                !  digits to provide
     l   = len (str)
     if ( m > l-7 ) goto 99
!zzz
!     write (*,*) 'val   =',val
!     write (*,*) 'av    =',av
!     write (*,*) 'log10 =', log10(av), QLG10 (av)
!
!  Determine power
     power = QLG10 (av)
      ip = power
      if ( power < 0) ip = (power - .9999999d0)
!zzz
!     write (*,*) 'power =',power
!     write (*,*) 'ip    =',ip
!
!  Apply rounding
 10  rv  = av * 10.0**(m-ip) + 0.5
!
!  generate digits
     v   = rv
     str = ' '
     do k = 3+m,3+1,-1
       d = mod(v,ten)
       str(k:k) = char (d+z)
       v = v/10
     end do
!
!  check power
     str(3:3) = '.'
     if ( v > 9 ) then          !  patch for round up power
       write (*,*) 'Power increase identified :', str(1:3+m), ip
       ip = ip+1
       goto 10
     else if ( v < 1 ) then     !  patch for round down power
       write (*,*) 'Power decrease identified :', str(1:3+m), ip
       ip = ip-1
       goto 10
     else
       d = v
       str(2:2) = char (d+z)
     end if
!
!  -ve values
     if ( sgn < 0 ) str(1:1) = '-'
!
!  write power
     k = 3+m+2
     if ( ip < 0 ) then
       v = abs(ip)
       str(k-1:k) = 'E-'
     else
       v = ip
       str(k-1:k) = 'E+'
     end if
!
     m = 2
     if ( v > 99 ) m = 3
     do i = k+m,k+1,-1
       d = mod(v,ten)
       str(i:i) = char (d+z)
       v = v/10
     end do
     return
!
!  overflow field
 99  str = repeat ('#', l)
     return

   end subroutine write_val_e4
Back to top
View user's profile Send private message
mecej4



Joined: 31 Oct 2006
Posts: 609

PostPosted: Tue Nov 22, 2016 5:59 am    Post subject: Reply with quote

You can reduce the number of instances (where a correction is needed) by increasing n to, say, 12, and TBLSIZ to 2^12 = 4096. You could, in addition, revert to using the full-precision log10() function in the few instances in which you detect that the estimated exponent needs correction.

Here is an improved version of QLG10 in which, after the table lookup, instead of taking the closest lower value, linear interpolation is applied to the two bracketing values. With this change, I find that 556 exponent updates were needed for 10 million random numbers between -1E5 to 1E5. If the full precision log10() is used, the number of exponent updates becomes 548. Thus, your exponent checks and updates are needed to be present in the code (even when you use log10()), but the performance hit is minor because such updates may be rarely invoked.
Code:
real function qlg10l(x)
!
! quick evaluation of log10(x) with 8 bit precision, for use in finding
! integer part of log10(x) to use with E and G format output processing
!
! See http://www.icsi.berkeley.edu/pubs/techreports/TR-07-002.pdf
! In this version, after the table is looked up linear interpolation is
! performed to improve the approximation
!
implicit none
integer, parameter :: n = 8
integer, parameter :: tblsiz = 257, mbits = 23   ! 2^n+1, mantissa bits
real, intent(in) :: x
real, dimension(tblsiz),save :: tblx,tbly
real, dimension(tblsiz-1),save :: cm
logical :: beg = .true.
integer :: i, ix, lg2, y, z, mask, emask
real rz
data mask/Z'7FFFFF'/,emask/Z'3F800000'/
!
if(beg)then
   beg = .false.
   do i=1,tblsiz
        tblx(i) = (i-1)/real(tblsiz-1) + 1.0
        tbly(i) = LOG10(tblx(i))
   end do
   do i=1,tblsiz-1
     cm(i)   = (tbly(i+1)-tbly(i))/(tblx(i+1)-tblx(i))
   end do
endif
ix=transfer(x,ix)
lg2=iand(ishft(ix,-mbits),255) - 127  ! exponent
y=iand(ix,mask); z=ior(y,emask); rz=transfer(z,rz)
y=ishft(y,n-mbits)    ! 8-bit mantissa
qlg10l=0.30103*lg2+tbly(y+1)+cm(y+1)*(rz-tblx(y+1))
!write(*,10)y,rz,tblx(y+1),tbly(y+1),tblx(y+2),tbly(y+2),qlg10
!10 format(1x,i4,5F10.4,F14.4)
return
end function qlg10l


Last edited by mecej4 on Tue Nov 22, 2016 2:57 pm; edited 4 times in total
Back to top
View user's profile Send private message
Display posts from previous:   
Post new topic   Reply to topic    forums.silverfrost.com Forum Index -> General All times are GMT + 1 Hour
Goto page Previous  1, 2, 3, 4, 5, 6  Next
Page 5 of 6

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © 2001, 2005 phpBB Group