|
forums.silverfrost.com Welcome to the Silverfrost forums
|
View previous topic :: View next topic |
Author |
Message |
DanRRight
Joined: 10 Mar 2008 Posts: 2828 Location: South Pole, Antarctica
|
Posted: Thu Nov 17, 2016 1:54 pm Post subject: |
|
|
Well, unformatted read is probably the only good solution until someone will break the sound barrier in speed with formatted read of text files.
i will probably meantime transform all the files which have no problems in them into binary and then read unformatted way. This of course will take time for conversion (will be done in batch regime or overnight) but this will be compensated by 10x speedup when loading them. |
|
Back to top |
|
|
mecej4
Joined: 31 Oct 2006 Posts: 1892
|
Posted: Thu Nov 17, 2016 2:30 pm Post subject: |
|
|
Do note that FTN95 uses a different convention for record markers for Fortran unformatted files than most other current Fortran compilers, and your C programmers may be unfamiliar with Fortran unformatted files. You can search for messages in the Silverfrost forums about conversion from the common unformatted file format to the Silverfrost format.
Not a big issue, but something to be aware of. |
|
Back to top |
|
|
DanRRight
Joined: 10 Mar 2008 Posts: 2828 Location: South Pole, Antarctica
|
Posted: Thu Nov 17, 2016 7:49 pm Post subject: |
|
|
Got so far 3.5x speed increase on unformatted read with real code versus 10-15x on simple test benches. All that versus my older formatted *-format specifier read. This is because besides loading there are a lot of processing with LOG, EXP and large arrays in my code go simultaneously...Binary files are 3x more compact by the way then my older text files
And tests show that when 64 will be finally ready the read speed will be additionally 1.8x faster
Can you guys please do independent check for unformatted read speed ? We have chance to break 1GB/second on RAMdrives |
|
Back to top |
|
|
JohnCampbell
Joined: 16 Feb 2006 Posts: 2560 Location: Sydney
|
Posted: Fri Nov 18, 2016 12:19 am Post subject: |
|
|
dan,
Quote: | We have chance to break 1GB/second on RAMdrives |
I am getting these high rates when the file is cached, typically with a binary I/O test of write then read. If you are opening a file on disk for reading, which has not been previously loaded into a disk cache in available memory, these speeds may not be possible.
While binary files have the advantage of speed, text based files from external sources have the advantage of easy verification and adaptation of a free-format reader. My recent testing has shown these can be read at 60 to 100 mb/sec, which is not too bad.
The alternative is you could write a converter that reads the text file and produces a binary format for later quicker analysis, although this involves duplication of the information. The advantage of this is the converter can also be a data reviewer, that reports likely errors and simple statistics of range and extremes, which I find essential.
You haven't indicated the quantity of data to be analysed, say gb's or terra-bytes per day. |
|
Back to top |
|
|
DanRRight
Joined: 10 Mar 2008 Posts: 2828 Location: South Pole, Antarctica
|
Posted: Fri Nov 18, 2016 1:57 pm Post subject: |
|
|
John,
PCI-express SSDs have 3GB/s, we have to utilize this bandwidth
I am so sick and furious with slow formatted text read that i want to try anything. Unformatted read of binary data is of course very dangerous but hopefully we passed the learning curve when it was very hard to catch the error in huge data sets.
Right now we have to handle 10-100GB size files for one run. Coming soon 64-bit compiler will increase the loaded volumes by order of magnitude.
Last edited by DanRRight on Fri Nov 18, 2016 2:00 pm; edited 1 time in total |
|
Back to top |
|
|
mecej4
Joined: 31 Oct 2006 Posts: 1892
|
Posted: Fri Nov 18, 2016 2:00 pm Post subject: |
|
|
John,
Here is Fortran code for a fast but approximate log10 function. You may try replacing the two calls to log10() in your E format output routine by calls to qlg10(). It computes an 8-bit approximation to log10(x), which is slightly better than two decimal digits. In the first call, a small table is set up. Each subsequent evaluation of qlg10() involves a single floating-multiply-add (FMA) operation and some integer shift and AND operations.
Code: | real function qlg10(x)
!
! quick evaluation of log10(x) with 8 bit precision, for use in finding
! integer part of log10(x) to use with E and G format output processing
!
! See http://www.icsi.berkeley.edu/pubs/techreports/TR-07-002.pdf
!
implicit none
integer, parameter :: n = 8
integer, parameter :: tblsiz = 256, mbits = 23 ! 2^n, mantissa bits
real, intent(in) :: x
real, dimension(tblsiz),save :: tbl
logical :: beg = .true.
integer :: i, ix, lg2, y, mask
data mask/Z'7FFFFF'/
!
if(beg)then
beg = .false.
do i=1,tblsiz
tbl(i) = LOG10((i-1)/real(tblsiz) + 1.0)
end do
endif
ix=transfer(x,ix)
lg2=iand(ishft(ix,-mbits),255) - 127 ! exponent
y=ishft(iand(ix,mask),n-mbits) ! 8-bit mantissa
qlg10=0.30103*lg2+tbl(y+1)
return
end function qlg10
|
|
|
Back to top |
|
|
mecej4
Joined: 31 Oct 2006 Posts: 1892
|
Posted: Sun Nov 20, 2016 3:27 pm Post subject: Improved timings |
|
|
After some fine tuning of the E and F output formatting subroutines, I obtained the following timing results (W10, i7-2720QM).
Code: | F-fmt F-sub E-fmt E-sub
FTN95 32-bit 2.904 0.595 3.057 2.703
FTN95 64-bit 9.057 0.796 8.985 3.013
GFTN 6.2-32 50.940 0.359 85.160 2.578
GFTN 6.2-64 48.290 0.234 81.870 2.110
LGF 7.7-32 48.120 0.344 80.470 2.562
LGF 7.7-64 45.930 0.250 77.030 2.047
IFort 17-32 2.320 0.427 2.300 1.247
IFort 17-64 2.390 0.512 1.980 1.122
|
GFTN is Gfortran 6.2 from Equation.com. LGF is Lahey/GNU 7.7, which is a repackaging of Gfortran 5.4.1.
Last edited by mecej4 on Mon Nov 21, 2016 9:42 am; edited 3 times in total |
|
Back to top |
|
|
DanRRight
Joined: 10 Mar 2008 Posts: 2828 Location: South Pole, Antarctica
|
Posted: Sun Nov 20, 2016 4:12 pm Post subject: |
|
|
Thanks mecej4, such wide comparisons tell a lot.
Was this for WRITE? How about READ?
And for completeness would be good to add also * format which is very good when there are different type variables of different widths in the line |
|
Back to top |
|
|
JohnCampbell
Joined: 16 Feb 2006 Posts: 2560 Location: Sydney
|
Posted: Mon Nov 21, 2016 4:45 am Post subject: |
|
|
mecej4's summary of write performance gives a good indication of Fortran formatted write performance, especially,
FTN95 /32 is comparatively good for formatted write
FTN95 /64 is slow for Format Write
gFortran is incredibly slow for Format write.
The function approach is faster and may provide a use if there are specific changes to output, say for 0. or -0.000, or if using gFortran !
regarding READ performance, I have reviewed the internal read options, using a variation of mecej4's post on 15 Nov and Dan's free format layout of 16 Nov.
The test options I considered are:
1 read fixed format using 10F10.3
2 read fixed format using mecej4's read_val routine
3 read fixed format using read_val routine for optional decimal
4 read free format layout, using Dan's layout example, with numbers separated by a space or comma
5 parse string only, using Dan's layout example, with numbers separated by a space or comma
I have tested this for FTN95 /32, FTN95 /64 and gFortran 64-bit.
Code: | ftn95/32 ftn95/64 gfortran Test
1.102 0.665 6.331 1 Fortran FORMAT read
0.233 0.405 0.284 2 function read - fixed field length
0.272 0.499 0.328 3 function read - optional fraction value, fixed field length
0.622 1.033 0.755 4 function read - optional fraction value, variable field length
0.211 0.410 0.485 5 get_next_field only |
These results do not show the variation that the write test did show, although the parse string code (finding each number string) is surprisingly slow compared to the reading times.
My guess is that the test #4 approach does give greater flexibility reading number strings, but is about the same speed as FORMAT reads.
FTN95 and gFortran does offer comma support in the standard F format.
gFortran FORMAT read is again slow in comparison.
The advantage of the function approach for large data sets is you can change the valid number formats to suit the equipment that writes the file, eg tab or colon delimited fields, as well as space or comma can be supported.
The test program is on this link
https://www.dropbox.com/s/ut82pkdexc3si5h/intlRead.f90?dl=0
Neither of the functions listed support E format reading, although this would need to be included if required.
Based on this, and previous posts, I estimate that a read speed for external text files is of the order of 100 mb/sec for HDD and 300 mb/sec for SSD, which would be limited by the read conversion. This is faster than some of Dan's performance reports, which may be due to other disk type problems.
Unfortunately when reading large files, they are typically not buffered.
It has always been my preference to use a free format function approach, where I also provide other error and character checking reports. This is easiest using stream I/O. Checking and other field definitions will reduce read rates. At the end of reading, providing statistics of character counts etc is useful to give confidence about the reading process. eg counts of lines read, control characters, delimiters and fields per line always help.
John |
|
Back to top |
|
|
DanRRight
Joined: 10 Mar 2008 Posts: 2828 Location: South Pole, Antarctica
|
Posted: Mon Nov 21, 2016 8:56 am Post subject: |
|
|
Thanks, John, that comparison was great. You both with mecej4 have made very nice insight into the Fortran I/O, made huge step forward in speeding it and pointed out the further areas of improvement ! |
|
Back to top |
|
|
mecej4
Joined: 31 Oct 2006 Posts: 1892
|
Posted: Mon Nov 21, 2016 9:25 am Post subject: |
|
|
John: There is one improvement that can be made to your code for writing E-formatted numbers; it may yield a slight but noticeable improvement in speed.
After one has found log10(x), rounded and scaled x, the integer v expressed in the decimal scale is exactly m digits long. Therefore, the two IF tests in the DO loop following "working with" are uncecessary. The loop can be replaced by
Code: | do k=3+m,p+1,-1
str(k:k) = char (mod(v,10)+z)
v=v/10
end do
str(p-1:p-1) = char (v+z) |
if you remember that str(p:p) contains the decimal point character.
..............
Have you thought out how to apply what we have learned to develop a work-around for your FEA code? If you can segregate a few of the most CPU-cycle-eating WRITE statements, you could replace them with calls to subroutines that generate formatted strings and print the strings with WRITE (nn, '(A)') str statements. |
|
Back to top |
|
|
JohnCampbell
Joined: 16 Feb 2006 Posts: 2560 Location: Sydney
|
Posted: Mon Nov 21, 2016 1:44 pm Post subject: |
|
|
mecej4,
Thanks for the further advice. I have modified the loops to remove some of the unnecessary instructions you indicated and have produced some significant improvements. The following table tracks the elapsed time performance of the ES Function for changes that have been discussed, including:
removing a LOG10 function then replacing the other with a quick LOG10 look-up function
cleaning the DO loops to remove unnecessary instructions
Code: | ES Function run time (seconds)
id FTN95 FTN95/64 gFortran Test
1 2.862 2.353 0.939 Initial ES Function
2 2.520 2.320 0.684 remove 1 x LOG10
3 2.417 2.042 0.640 restructure ** power
5 0.424 replace log10 with qlg10
6 2.278 2.353 0.423 use fortran Qlog10
7 1.886 2.062 0.302 tidy ES function |
FTN95 /32 and gFortran have produced anticipated performance improvements, while FTN95 /64 has shown some variability in gains.
The final function is: Code: | subroutine write_val_e4 (val, str, n)
!
! writes -3.04E+01
!
real*4 :: val ! value to write; must fit
integer*4 :: n ! digits >= 0 and < len(str)
character :: str*(*)
!
integer*4 :: l ! len ( str )
real*4 :: rv ! abs ( val )
real*4 :: power ! log10 ( val )
integer*4 :: ip ! E+ip
integer*8 :: v ! integer for digits of val
integer*8 :: ten = 10 ! mod
integer*4 :: k ! position of digit
! integer*4 :: p = 3 ! position of '.'
integer*4 :: sgn ! +/-
integer*4 :: d ! digit
integer*4 :: z = ichar ('0')
integer*4 :: i,m
!
REAL QLG10
external QLG10
!
! Remove sign
if ( val > 0 ) then
sgn = 1
rv = val
else if ( val < 0 ) then
sgn = -1
rv = -val
else
str = ' 0.'
return
end if
!
! Check for overflow
m = max (n,1) ! digits to provide
l = len (str)
if ( m > l-7 ) goto 99
!
! Determine power
power = QLG10 (rv) ; ip = power ; if ( power < 0) ip = (power - .9999999d0)
rv = rv * 10.0**(m-ip) + 0.5
!
! generate digits
! k = 3+m ! last digit position
v = rv ! digits
str = ' '
do k = 3+m,3+1,-1
d = mod(v,ten)
str(k:k) = char (d+z)
v = v/10
end do
!
str(3:3) = '.'
if ( v > 9 ) then
! patch for round up power
write (*,*) 'Power increase identified :', str(1:3+m), ip
str(2:2) = '1'
ip = ip+1
else
d = v
str(2:2) = char (d+z)
end if
!
! -ve values
if ( sgn < 0 ) str(1:1) = '-'
!
! write power
k = 3+m+2 !
if ( ip < 0 ) then
v = abs(ip)
str(k-1:k) = 'E-'
else
v = ip
str(k-1:k) = 'E+'
end if
!
m = 2
if ( v > 99 ) m = 3
do i = k+m,k+1,-1
d = mod(v,ten)
str(i:i) = char (d+z)
v = v/10
end do
return
!
! overflow field
99 str = repeat ('#', l)
return
end subroutine write_val_e4 |
It will be good to test the updated FTN95 /64 /OPT on this and my earlier post on Thu Nov 17, 2016, when it is available. |
|
Back to top |
|
|
mecej4
Joined: 31 Oct 2006 Posts: 1892
|
Posted: Mon Nov 21, 2016 6:58 pm Post subject: |
|
|
A couple of observations.
1. In many borderline cases, your new routine is faithful to the "round to nearest or even" rule; the previous version does not always get the least significant digit right.
2. When the input number causes the "patch for round up power" to be applied, the result is not correct. Try, for example,
Code: | val = 1.000857E-1
call write_val_e4 (val,str,3)
print *,str
|
The printed value is 1.009E-01, which is missing the third '0' preceding the '9'.
Last edited by mecej4 on Thu Sep 26, 2019 6:15 am; edited 2 times in total |
|
Back to top |
|
|
JohnCampbell
Joined: 16 Feb 2006 Posts: 2560 Location: Sydney
|
Posted: Mon Nov 21, 2016 11:00 pm Post subject: |
|
|
mecej4,
How do you find these numbers !
The error is due to the error with QLG10 estimating the wrong integer power. ( I was hoping the lookup table would have been correct at multiples of 10 )
I corrected the problem by making a more general response to the integer power "ip" being wrong. (could have an infinite loop with rounding ?)
John
Code: | program IntlWrite
call test ( 99.999 )
call test ( 1.000857e-1 )
call test ( 1.00857e-1 )
end
subroutine test ( x )
implicit none
character(len=100) :: str
real :: x
REAL QLG10
external QLG10
!
write (*,*) 'log', qlg10 (x), log10 (x), x
call write_val_e4 ( x, str(1:12), 3 )
write (*,*) 'es :',str(1:12), x
!
end subroutine test
subroutine write_val_e4 (val, str, n)
!
! writes -3.04E+01
!
real*4 :: val ! value to write; must fit
integer*4 :: n ! digits >= 0 and < len(str)
character :: str*(*)
!
integer*4 :: l ! len ( str )
real*4 :: rv,av ! abs ( val )
real*4 :: power ! log10 ( val )
integer*4 :: ip ! E+ip
integer*8 :: v ! integer for digits of val
integer*8 :: ten = 10 ! mod
integer*4 :: k ! position of digit
! integer*4 :: p = 3 ! position of '.'
integer*4 :: sgn ! +/-
integer*4 :: d ! digit
integer*4 :: z = ichar ('0')
integer*4 :: i,m
!
REAL QLG10
external QLG10
!
! Remove sign
if ( val > 0 ) then
sgn = 1
av = val
else if ( val < 0 ) then
sgn = -1
av = -val
else
str = ' 0.'
return
end if
!
! Check for overflow
m = max (n,1) ! digits to provide
l = len (str)
if ( m > l-7 ) goto 99
!zzz
! write (*,*) 'val =',val
! write (*,*) 'av =',av
! write (*,*) 'log10 =', log10(av), QLG10 (av)
!
! Determine power
power = QLG10 (av)
ip = power
if ( power < 0) ip = (power - .9999999d0)
!zzz
! write (*,*) 'power =',power
! write (*,*) 'ip =',ip
!
! Apply rounding
10 rv = av * 10.0**(m-ip) + 0.5
!
! generate digits
v = rv
str = ' '
do k = 3+m,3+1,-1
d = mod(v,ten)
str(k:k) = char (d+z)
v = v/10
end do
!
! check power
str(3:3) = '.'
if ( v > 9 ) then ! patch for round up power
write (*,*) 'Power increase identified :', str(1:3+m), ip
ip = ip+1
goto 10
else if ( v < 1 ) then ! patch for round down power
write (*,*) 'Power decrease identified :', str(1:3+m), ip
ip = ip-1
goto 10
else
d = v
str(2:2) = char (d+z)
end if
!
! -ve values
if ( sgn < 0 ) str(1:1) = '-'
!
! write power
k = 3+m+2
if ( ip < 0 ) then
v = abs(ip)
str(k-1:k) = 'E-'
else
v = ip
str(k-1:k) = 'E+'
end if
!
m = 2
if ( v > 99 ) m = 3
do i = k+m,k+1,-1
d = mod(v,ten)
str(i:i) = char (d+z)
v = v/10
end do
return
!
! overflow field
99 str = repeat ('#', l)
return
end subroutine write_val_e4
|
|
|
Back to top |
|
|
mecej4
Joined: 31 Oct 2006 Posts: 1892
|
Posted: Tue Nov 22, 2016 5:59 am Post subject: |
|
|
You can reduce the number of instances (where a correction is needed) by increasing n to, say, 12, and TBLSIZ to 2^12 = 4096. You could, in addition, revert to using the full-precision log10() function in the few instances in which you detect that the estimated exponent needs correction.
Here is an improved version of QLG10 in which, after the table lookup, instead of taking the closest lower value, linear interpolation is applied to the two bracketing values. With this change, I find that 556 exponent updates were needed for 10 million random numbers between -1E5 to 1E5. If the full precision log10() is used, the number of exponent updates becomes 548. Thus, your exponent checks and updates are needed to be present in the code (even when you use log10()), but the performance hit is minor because such updates may be rarely invoked.
Code: | real function qlg10l(x)
!
! quick evaluation of log10(x) with 8 bit precision, for use in finding
! integer part of log10(x) to use with E and G format output processing
!
! See http://www.icsi.berkeley.edu/pubs/techreports/TR-07-002.pdf
! In this version, after the table is looked up linear interpolation is
! performed to improve the approximation
!
implicit none
integer, parameter :: n = 8
integer, parameter :: tblsiz = 257, mbits = 23 ! 2^n+1, mantissa bits
real, intent(in) :: x
real, dimension(tblsiz),save :: tblx,tbly
real, dimension(tblsiz-1),save :: cm
logical :: beg = .true.
integer :: i, ix, lg2, y, z, mask, emask
real rz
data mask/Z'7FFFFF'/,emask/Z'3F800000'/
!
if(beg)then
beg = .false.
do i=1,tblsiz
tblx(i) = (i-1)/real(tblsiz-1) + 1.0
tbly(i) = LOG10(tblx(i))
end do
do i=1,tblsiz-1
cm(i) = (tbly(i+1)-tbly(i))/(tblx(i+1)-tblx(i))
end do
endif
ix=transfer(x,ix)
lg2=iand(ishft(ix,-mbits),255) - 127 ! exponent
y=iand(ix,mask); z=ior(y,emask); rz=transfer(z,rz)
y=ishft(y,n-mbits) ! 8-bit mantissa
qlg10l=0.30103*lg2+tbly(y+1)+cm(y+1)*(rz-tblx(y+1))
!write(*,10)y,rz,tblx(y+1),tbly(y+1),tblx(y+2),tbly(y+2),qlg10
!10 format(1x,i4,5F10.4,F14.4)
return
end function qlg10l |
Last edited by mecej4 on Tue Nov 22, 2016 2:57 pm; edited 4 times in total |
|
Back to top |
|
|
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
Powered by phpBB © 2001, 2005 phpBB Group
|