forums.silverfrost.com Welcome to the Silverfrost forums
DanRRight
Joined: 10 Mar 2008 Posts: 2828 Location: South Pole, Antarctica
|
Posted: Mon Nov 14, 2016 12:06 am Post subject: |
|
|
Well, good to know that the 64-bit compiler is faster at reading numbers, but... I cannot use the 64-bit compiler while its pre-beta debugger is not working well; this will kill me.
Also, John, can you please reduce the tests to 1, 2, 4, 8 and 16 and make the log file less verbose and easier to read, so that we can post it here without much cutting? (This forum is still far too restrictive on post size, even though over the forum's lifetime the size and speed of hard drives have probably increased 10-100x and the price per bit has dropped 1000x.) And also please delete all files automatically after use.
Here is a Samsung 850 Pro SSD test with an i7 CPU:
Code: | Opening File_1gb.txt : iostat= 0 1.000000E-04
1 write_text 292.722 mb/sec 3.49820 13421774
Opening File_2gb.txt : iostat= 0 1.000000E-04
2 write_text 307.517 mb/sec 6.65980 26843547
Opening File_4gb.txt : iostat= 0 2.000000E-04
4 write_text 329.173 mb/sec 12.4433 53687093
Opening File_6gb.txt : iostat= 0 1.000000E-04
6 write_text 322.171 mb/sec 19.0706 80530641
Opening File_8gb.txt : iostat= 0 6.000000E-04
8 write_text 322.576 mb/sec 25.3956 107374185
Opening File_10gb.txt : iostat= 0 1.000000E-04
10 write_text 338.099 mb/sec 30.2870 134217729
Opening File_14gb.txt : iostat= 0 4.000000E-04
14 write_text 336.274 mb/sec 42.6319 187904817
Opening File_18gb.txt : iostat= 0 1.000000E-04
18 write_text 336.538 mb/sec 54.7694 241591905
Opening Number_1gb.txt : iostat= 0 2.000000E-04
1 write_numb 38.1252 mb/sec 26.8589 13421774
Opening Number_2gb.txt : iostat= 0 2.000000E-04
2 write_numb 38.1325 mb/sec 53.7075 26843547
Opening Number_4gb.txt : iostat= 0 1.000000E-04
4 write_numb 38.1490 mb/sec 107.368 53687093
Opening Number_6gb.txt : iostat= 0 1.000000E-04
6 write_numb 38.1250 mb/sec 161.154 80530641
Opening Number_8gb.txt : iostat= 0 1.000000E-04
8 write_numb 38.1276 mb/sec 214.857 107374185
Opening Number_10gb.txt : iostat= 0 1.000000E-04
10 write_numb 38.1685 mb/sec 268.284 134217729
Opening Number_14gb.txt : iostat= 0 2.000000E-04
14 write_numb 38.1640 mb/sec 375.642 187904817
Opening Number_18gb.txt : iostat= 0 6.000000E-04
18 write_numb 38.1899 mb/sec 482.641 241591905
|
Last edited by DanRRight on Mon Nov 14, 2016 2:28 pm; edited 1 time in total |
DanRRight
Joined: 10 Mar 2008 Posts: 2828 Location: South Pole, Antarctica
|
Posted: Mon Nov 14, 2016 2:16 pm Post subject: |
|
|
Code: | Opening File_1gb.txt : iostat= 0 2.900000E-03
1 read_text 179.908 mb/sec 5.69180 1.00000 13421775
Opening File_2gb.txt : iostat= 0 3.100000E-03
2 read_text 185.824 mb/sec 11.0212 2.00000 26843548
Opening File_4gb.txt : iostat= 0 3.100000E-03
4 read_text 189.945 mb/sec 21.5641 4.00000 53687094
Opening File_6gb.txt : iostat= 0 3.200000E-03
6 read_text 192.135 mb/sec 31.9776 6.00000 80530642
Opening File_8gb.txt : iostat= 0 3.100000E-03
8 read_text 192.985 mb/sec 42.4490 8.00000 107374186
Opening File_10gb.txt : iostat= 0 3.600000E-03
10 read_text 192.612 mb/sec 53.1640 10.0000 134217730
Opening File_14gb.txt : iostat= 0 3.400000E-03
14 read_text 193.060 mb/sec 74.2567 14.0000 187904818
Opening File_18gb.txt : iostat= 0 3.300000E-03
18 read_text 192.805 mb/sec 95.5994 18.0000 241591906
Opening Number_1gb.txt : iostat= 0 3.600000E-03
1 read_numb 97.9811 mb/sec 10.4510 1.00000 13421775
Opening Number_2gb.txt : iostat= 0 2.700000E-03
2 read_numb 98.0087 mb/sec 20.8961 2.00000 26843548
Opening Number_4gb.txt : iostat= 0 2.900000E-03
4 read_numb 97.9989 mb/sec 41.7964 4.00000 53687094
Opening Number_6gb.txt : iostat= 0 3.100000E-03
6 read_numb 98.0025 mb/sec 62.6923 6.00000 80530642
Opening Number_8gb.txt : iostat= 0 3.000000E-03
8 read_numb 97.9513 mb/sec 83.6334 8.00000 107374186
Opening Number_10gb.txt : iostat= 0 3.200000E-03
10 read_numb 97.8757 mb/sec 104.622 10.0000 134217730
Opening Number_14gb.txt : iostat= 0 1.000000E-04
14 read_numb 97.6361 mb/sec 146.831 14.0000 187904818
Opening Number_18gb.txt : iostat= 0 1.000000E-04
18 read_numb 97.4608 mb/sec 189.122 18.0000 241591906
Opening Number_1gb.txt : iostat= 0 4.900000E-03
1 process_numb 143.411 mb/sec 7.14030 1.00000 13421774 0
Opening Number_2gb.txt : iostat= 0 4.400000E-03
2 process_numb 143.457 mb/sec 14.2761 2.00000 26843547 0
Opening Number_4gb.txt : iostat= 0 4.600000E-03
4 process_numb 143.546 mb/sec 28.5344 4.00000 53687093 0
Opening Number_6gb.txt : iostat= 0 4.400000E-03
6 process_numb 143.029 mb/sec 42.9562 6.00000 80530641 0
Opening Number_8gb.txt : iostat= 0 4.800000E-03
8 process_numb 143.353 mb/sec 57.1455 8.00000 107374185 0
Opening Number_10gb.txt : iostat= 0 4.800000E-03
10 process_numb 142.998 mb/sec 71.6094 10.0000 134217729 0
Opening Number_14gb.txt : iostat= 0 1.000000E-04
14 process_numb 143.269 mb/sec 100.063 14.0000 187904817 0
Opening Number_18gb.txt : iostat= 0 1.000000E-04
18 process_numb 143.530 mb/sec 128.419 18.0000 241591905 0
|
JohnCampbell
Joined: 16 Feb 2006 Posts: 2560 Location: Sydney
|
Posted: Mon Nov 14, 2016 11:57 pm Post subject: |
|
|
Dan,
Looks similar to what I am getting. Try ftn95 /64
John |
DanRRight
Joined: 10 Mar 2008 Posts: 2828 Location: South Pole, Antarctica
|
Posted: Tue Nov 15, 2016 3:07 am Post subject: |
|
|
I tried 64-bit but went back to pure 32-bit: some unknown and extremely rare problem made the code behave crazily (I call it devilry). The cause might not be the 64-bit compiler at all (it may happen when I do not reboot the computer for a very long time), but I had no time to investigate, so I uninstalled and rebooted. I am waiting for the 64-bit debugger to be at least minimally functional.
Can you run the test with Intel or Lahey Fortran?
mecej4
Joined: 31 Oct 2006 Posts: 1892
|
Posted: Tue Nov 15, 2016 11:02 am Post subject: |
|
|
As you have already seen, formatted reads and writes of real numbers are expensive. Here is a pair of simple tests that involve almost no external I/O and make the point. No need to consider SSDs, Ramdisks, etc.
In the first example, I use a string variable containing one line similar to the lines generated in John's tests. I then read the ten real numbers using 10F10.3 as the format. I repeat this READ 1 million times, and write to the console after every 100000 iterations (100000, 200000, ..., 1000000). This WRITE is necessary to prevent the optimization phase of the compiler from making the whole loop vanish.
Code: | program FmtRead
   implicit none
   character(len=100) :: str
   integer :: i,j
   real, dimension(10) :: x
   write(str,'(10F10.3)')(3.0*i-7.679,i=1,10)     ! build one 100-character test line
   do j=1,1000000
      read(str,'(10F10.3)')x                       ! internal formatted READ under test
      if(mod(j,100000).eq.0)write(*,*)j,x(3)       ! keeps the optimizer from removing the loop
   end do
end program
|
This program, compiled with gFortran -O2, runs in 3 seconds on a cloud Linux server.
The second program is a modified version of the first, in which the internal READ is replaced by code that directly interprets the string in Fortran code, without going through the Fortran I/O runtime. It does not check for invalid characters, missing decimal points and other similar errors -- the input string is assumed to be valid for the format in question.
Code: | program IntlRead
   implicit none
   character(len=100) :: str
   integer :: i,j,k,n,sgn,iv,fv,tp
   real, dimension(10) :: x
   write(str,'(10F10.3)')(-7.679+3*i,i=1,10)
   do j=1,1000000
      do n=1,10
         k=(n-1)*10+1                      ! first column of field n
         sgn=1
         do while(str(k:k) == ' ')         ! skip leading blanks
            k=k+1
         end do
         if(str(k:k) == '-')then
            sgn=-1
            k=k+1
         endif
         iv=0
         do while(str(k:k) /= '.')         ! integer part
            iv=iv*10+ichar(str(k:k))-ichar('0')
            k=k+1
         end do
         k=k+1; fv=0; tp=1
         do while(k <= n*10)               ! fraction digits, never past the end of the field
            if(str(k:k) < '0')exit         ! tested separately: .and. need not short-circuit
            fv=fv*10+ichar(str(k:k))-ichar('0')
            tp=tp*10
            k=k+1
         end do
         x(n)=sgn*(real(iv)+real(fv)/real(tp))
      end do
      if(mod(j,100000).eq.0)write(*,*)j,x(3)
   end do
end program
|
On the same system, the modified code runs in 0.3 s. Thus, the convenience of formatted input can carry a hefty price tag. In a real program, therefore, it is worthwhile to minimize formatted I/O as much as possible. If the same file is read several times in the program, it may be advantageous to put the read data into memory during the first file reading and replace the subsequent file reads by memory copying.
Last edited by mecej4 on Tue Nov 15, 2016 1:35 pm; edited 1 time in total |
DanRRight
Joined: 10 Mar 2008 Posts: 2828 Location: South Pole, Antarctica
|
Posted: Tue Nov 15, 2016 11:48 am Post subject: |
|
|
Yes, something like that, for example. Or, ultimately, optimize something at the assembler level.
I would expect Salford/Silverfrost to implement some kind of "superfast read/write" utilities, with clearly defined restrictions, to reach peak speeds. It is a pity to lose 99% of the 20 GB/s I/O bandwidth of modern hardware like PCI Express SSDs or DDR4 RAM drives.
I'm still too busy to check this myself (soon my colleagues will demand my body parts if I do not deliver what I promised within a few more days), but maybe you or somebody else has time to check unformatted read speed?
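For anyone who wants to try the unformatted case, here is a minimal sketch of such a timing test. It is not taken from John's test program: the file name `unfmt_speed.tmp`, the array size and the program name are all assumptions chosen for illustration, and it relies on Fortran 2003/2008 features (stream access, `newunit=`, `error stop`) that have not been checked against FTN95. The read figure will mostly reflect the operating system's file cache unless the file is much larger than RAM.

```fortran
program unfmt_speed
   ! Hypothetical sketch: times one unformatted stream WRITE and READ of a real array.
   use, intrinsic :: iso_fortran_env, only: int64, real64
   implicit none
   integer, parameter :: n = 10000000                        ! assumed size: 10 million reals
   character(len=*), parameter :: fname = 'unfmt_speed.tmp'  ! assumed scratch file name
   real, allocatable :: x(:), y(:)
   integer(int64) :: t0, t1, rate
   integer :: u, i
   real(real64) :: mb, secs

   allocate (x(n), y(n))
   do i = 1, n
      x(i) = 3.0*i - 7.679
   end do
   y = 0.0
   ! Size of the array in megabytes (10**6 bytes).
   mb = real(n, real64) * real(storage_size(x)/8, real64) / 1.0e6_real64

   ! Unformatted write: the array goes to disk as raw binary, no text conversion.
   call system_clock (t0, rate)
   open (newunit=u, file=fname, access='stream', form='unformatted', status='replace')
   write (u) x
   close (u)
   call system_clock (t1)
   secs = max (real(t1 - t0, real64) / real(rate, real64), 1.0e-9_real64)
   write (*,'(a,f12.1,a)') ' unformatted write ', mb/secs, ' MB/sec'

   ! Unformatted read of the same file, deleted afterwards.
   call system_clock (t0)
   open (newunit=u, file=fname, access='stream', form='unformatted', status='old')
   read (u) y
   close (u, status='delete')
   call system_clock (t1)
   secs = max (real(t1 - t0, real64) / real(rate, real64), 1.0e-9_real64)
   write (*,'(a,f12.1,a)') ' unformatted read  ', mb/secs, ' MB/sec'

   ! The binary round trip should reproduce the data exactly.
   if (any (x /= y)) error stop 'round trip mismatch'
   write (*,*) 'round trip OK'
end program unfmt_speed
```

Because no number-to-text conversion is involved, this isolates the raw transfer cost from the formatting cost discussed above.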
JohnCampbell
Joined: 16 Feb 2006 Posts: 2560 Location: Sydney
|
Posted: Tue Nov 15, 2016 12:14 pm Post subject: |
|
|
mecej4,
I would like to draw attention to the poor performance of WRITE with /64 (and gFortran).
I have a variant of your program that tests internal write rather than read, as write with gFortran or FTN95 /64 is much slower than with ftn95 (/32).
Write is also slower than read.
(Interestingly, read is faster with FTN95 /64?)
I have obtained the following performance using PLATO on my 2.6 GHz i5-2300:
program fmtread
ftn95 (Release Win32) : 1.083 seconds
ftn95 /64 : 0.668 seconds
gFortran : 6.157 seconds
program fmtwrite
ftn95 (Release Win32) : 3.036 seconds
ftn95 /64 : 9.28 seconds
gFortran : 40.07 seconds ( I'm consistently getting 10x performance like this !!)
Code: | program FmtWrite
implicit none
character(len=100) :: str
integer :: i,j
real, dimension(10) :: x
real del_sec, sec
external del_sec
!
sec = del_sec ()
do i = 1,10
x(i) = 3.0*i-7.679
end do
do j=1,1000000
x(3) = 3.0/j-7.679
write (str,'(10F10.3)') x
if(mod(j,100000).eq.0)write(*,*)j,x(3)
end do
sec = del_sec ()
write (*,*) sec,' seconds'
end program
real*4 function del_sec ()
!
integer*8 :: last_tick = 0
integer*8 :: tick, rate
real*4 :: dt
!
call system_clock ( tick, rate )
dt = real(tick-last_tick) / real(rate)
last_tick = tick
del_sec = dt
end function del_sec |
Lately, I have put in a lot of work to improve the performance of Finite Element linear equation solution in a 64-bit environment, only to find that the speed improvement obtained is lost when reporting the results to a text file. |
DanRRight
Joined: 10 Mar 2008 Posts: 2828 Location: South Pole, Antarctica
|
Posted: Tue Nov 15, 2016 12:34 pm Post subject: |
|
|
John,
Can you add to your test a text read plus direct interpretation of the string, as mecej4 showed? The numbers, though, should have, say, 10 digits; in your example they are far too short.
How much faster is it than a formatted read, or a text read plus an internal read?
Last edited by DanRRight on Tue Nov 15, 2016 3:07 pm; edited 1 time in total |
mecej4
Joined: 31 Oct 2006 Posts: 1892
|
Posted: Tue Nov 15, 2016 1:17 pm Post subject: |
|
|
John, I think that the unacceptably slow formatted WRITE (and READ, but less serious in this case) performance of gFortran is attributable to the emulation layer (Cygwin, MinGW, etc.) used on Windows. Try running on a Linux system, or use a cloud service such as https://www.tutorialspoint.com/compile_fortran_online.php . (If you use this server, you have to replace "integer*8" by "integer" and "real*4" by "real". Ignore the IDE they provide and compile in the command line pane to specify -O2 as the option. I get a run time of 5 s; as I reported above, the formatted read program on this server took 3 s.)
From past experience, I think that the GFortran people have very little interest in fixing problems on Windows. Personally, I have other Fortran compilers available to me on Windows, so I am not bothered much by GFortran problems on that platform.
There is some logical basis for expecting WRITE to be faster than READ. After the WRITE has been initiated, the program can continue to execute as long as the I/O unit concerned is not touched until later. READ, on the other hand, must complete before the next statements are executed because the compiler probably cannot know when the read data is going to be used.
On the other hand, most storage devices are slower at writes than reads. |
DanRRight
Joined: 10 Mar 2008 Posts: 2828 Location: South Pole, Antarctica
|
Posted: Wed Nov 16, 2016 3:22 am Post subject: |
|
|
Mecej4, I just noticed that your simple code example for internal read works only for real*4 (not real*8) F-formatted numbers and also does not convert E-formatted numbers. If you add what is missing, will it be faster than a standard formatted read like READ(11,'(10e10.3)') X ?
Also, does anyone know an FTN95 library or WinAPI function to find the file size without opening the file or parsing the output of a DIR command?
JohnCampbell
Joined: 16 Feb 2006 Posts: 2560 Location: Sydney
|
Posted: Wed Nov 16, 2016 5:46 am Post subject: |
|
|
Dan,
Try subroutine file_size8@ (character*(*):File_name, real*8:size, integer*2:error_code)
I have now tested both internal read and write, based on mecej4's approach. The performance times are:
Code: | fmtread
ftn95 /opt /link : 1.126 seconds
ftn95 /64 /link : 0.698 seconds
gfortran /o2 : 6.238 seconds ##
intlread
ftn95 /opt /link : 0.229 seconds
ftn95 /64 /link : 0.410 seconds
gfortran /o2 : 0.260 seconds
fmtwrite
ftn95 /opt /link : 3.120 seconds
ftn95 /64 /link : 9.457 seconds ##
gfortran /o2 : 40.322 seconds ##
intlwrite
ftn95 /opt /link : 1.513 seconds
ftn95 /64 /link : 1.803 seconds
gfortran /o2 : 0.286 seconds |
These tests show that formatted write with ftn95 /64 is 3x slower than 32-bit and also 5x slower than a user-written function. It would be good if this could be reviewed.
gFortran 64-bit formatted write is worse, being 140x slower than a user-written function, with little chance of a review there.
The write test is:
Code: | program IntlWrite
implicit none
character(len=100) :: str
integer :: i,j,n
real, dimension(10) :: x
real del_sec, sec
external del_sec
!
! Initialise vector
sec = del_sec (0)
do i = 1,10
x(i) = 3.0*i-7.679
end do
write (*,'(10F10.3)')(-7.679+3*i,i=1,10)
!
! Formatted write
sec = del_sec (0)
do j=1,1000000
x(3) = 3.0/j-7.679
write (str,'(10F10.3)') x
if (mod(j,100000).eq.0) write (*,*) del_sec (-1), j, x(3)
end do
sec = del_sec (0)
write (*,*) sec,' seconds : format write'
!
x(4) = -.00025
x(5) = 0
x(6) = .00065
!
do j=1,1000000
x(3) = (3.0/j-.07679)
do n=1,10
call write_val_r4 ( x(n), str(n*10-9:n*10), 3 )
end do
if (mod(j,100000).eq.0) write(*,*) del_sec (-1), j, x(3)
end do
sec = del_sec (0)
write (*,*) sec,' seconds : function write'
write (*,*) str
!
end program
real*4 function del_sec (update)
!
integer*4 :: update
integer*8 :: last_tick = 0
integer*8 :: tick, rate
real*4 :: dt
!
call system_clock ( tick, rate )
dt = real(tick-last_tick) / real(rate)
if ( update >= 0 ) last_tick = tick
del_sec = dt
end function del_sec
|
JohnCampbell
Joined: 16 Feb 2006 Posts: 2560 Location: Sydney
|
Posted: Wed Nov 16, 2016 5:49 am Post subject: |
|
|
My adaptation of a write subroutine is:
Code: | subroutine write_val_r4 (val, str, n)
!
! writes -3.04
!
real*4 :: val ! value to write; must fit
integer*4 :: n ! digits >= 0 and < len(str)
character :: str*(*)
!
real*4 :: rv ! abs ( val)
integer*8 :: v ! integer for digits of val
integer*8 :: ten = 10 ! mod
integer*4 :: k ! position of digit
integer*4 :: p ! position of '.'
integer*4 :: sgn ! +/-
integer*4 :: d ! digit
integer*4 :: z = ichar ('0')
!
k = len (str)
p = k-n
if ( p < 1 ) goto 99
str = ' '
!
if ( val > 0 ) then
sgn = 1
rv = val
else if ( val < 0 ) then
sgn = -1
rv = -val
else
str(p-1:p) = '0.'
return
end if
!
! Integer of digits
if (n > 0 ) then
v = ( rv * 10**n + 0.5 )
else
v = ( rv + 0.5 )
end if
!
! generate digits
str(p:p) = '.'
do
if ( k==p ) k = k-1
d = mod(v,ten)
if ( k < 1 ) goto 99
str(k:k) = char (d+z)
v = v/10
k = k-1
if ( v == 0 .and. k < p ) exit
end do
!
! -ve values
if ( sgn < 0 ) then
if ( k < 1 ) goto 99
str(k:k) = '-'
end if
return
!
! overflow field
99 str = repeat ('#', len(str))
return
end subroutine write_val_r4
|
mecej4
Joined: 31 Oct 2006 Posts: 1892
|
Posted: Wed Nov 16, 2016 5:57 am Post subject: Re: |
|
|
DanRRight wrote: | Mecej4, I just noticed that your simple code example for internal read works only for real*4 (not real*8) F-formatted numbers and also does not convert E-formatted numbers. If you add what is missing, will it be faster than a standard formatted read like READ(11,'(10e10.3)') X ? |
That was intentional. The more general the READ format, the more processing will be required. If you add what is missing, you will probably have duplicated what is already in the Fortran I/O library functions, and the run time will be longer.
Quote: | Also, does anyone know an FTN95 library or WinAPI function to find the file size without opening the file or parsing the output of a DIR command? |
Other Fortran compilers support INQUIRE(FILE=filename, SIZE=file_size), but FTN95 does not yet do so, and provides a non-standard subroutine FILE_SIZE@() for this purpose. |
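For compilers that do support it, the standard query can be sketched as follows. The file name `size_demo.tmp` and the program name are made up for illustration, and the program creates its own small file so that it is self-contained. As noted above, this is not expected to work with FTN95, where FILE_SIZE@ is the route to take.

```fortran
program file_size_demo
   ! Hypothetical sketch: query a file's size with INQUIRE(..., SIZE=...).
   use, intrinsic :: iso_fortran_env, only: int64
   implicit none
   character(len=*), parameter :: fname = 'size_demo.tmp'   ! assumed file name
   integer(int64) :: nbytes
   integer :: u
   logical :: there

   ! Create a small file to query: 10 characters written as raw bytes.
   open (newunit=u, file=fname, access='stream', form='unformatted', status='replace')
   write (u) 'abcdefghij'
   close (u)

   ! SIZE= returns the size in file storage units (normally bytes),
   ! or -1 if the size cannot be determined. The file is not connected to a unit here.
   inquire (file=fname, exist=there, size=nbytes)
   if (.not. there) error stop 'file not found'
   write (*,*) 'size in file storage units: ', nbytes

   ! Clean up the demonstration file.
   open (newunit=u, file=fname, status='old')
   close (u, status='delete')
end program file_size_demo
```

A 64-bit integer is used for the result so that multi-gigabyte files such as the ones in these tests do not overflow a default integer.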
DanRRight
Joined: 10 Mar 2008 Posts: 2828 Location: South Pole, Antarctica
|
Posted: Wed Nov 16, 2016 6:17 am Post subject: |
|
|
Thanks, John and Mecej4.
I'd keep an eye on making the user format conversion subroutines more general, to allow real*8 F and E formats. Hopefully this work will stay in the processor's L1 cache and will not add substantial processing time to the much slower reading/writing process. Maybe handle F and E/D formats separately if the generalization harms the speed.
Also interesting: how about the speed of unformatted read/write?
By the way, I tried FTN95 /64 and on John's original test got almost 500 MB/second on write text and almost 200 MB/second on read numbers, both on SSD. I still have not run the test on a RAM drive because this requires a reboot.
Read text, though, gave me... 0.8 MB/sec (!!!). Yes, a megabyte per second. Some bug, probably.
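As a rough idea of what such a generalization might look like, here is a sketch of a real*8 field parser that accepts F, E and D style fields. It is not mecej4's or John's code: the module name `fast_parse`, the function `parse_dp` and the test fields are hypothetical, and like the original example it assumes the field is valid and does no error checking. For long mantissas or large exponents the result may differ from the runtime library's READ in the last bit or two, and no claim is made here about how its speed compares.

```fortran
module fast_parse
   implicit none
   integer, parameter :: dp = kind (1.0d0)
contains
   ! Hypothetical helper: convert one valid F, E or D style field to real(dp).
   function parse_dp (field) result (val)
      character(len=*), intent(in) :: field
      real(dp) :: val, sgn, mant
      integer  :: k, n, c, ex, esgn, ndec, p

      n = len_trim (field)
      k = 1
      do while (k <= n)                       ! skip leading blanks
         if (field(k:k) /= ' ') exit
         k = k + 1
      end do
      sgn = 1.0_dp
      if (k <= n) then                        ! optional sign of the number
         if (field(k:k) == '-') sgn = -1.0_dp
         if (field(k:k) == '-' .or. field(k:k) == '+') k = k + 1
      end if
      mant = 0.0_dp;  ndec = 0
      do while (k <= n)                       ! integer digits
         c = ichar (field(k:k)) - ichar ('0')
         if (c < 0 .or. c > 9) exit
         mant = mant*10.0_dp + c
         k = k + 1
      end do
      if (k <= n) then
         if (field(k:k) == '.') then          ! fraction digits, counted in ndec
            k = k + 1
            do while (k <= n)
               c = ichar (field(k:k)) - ichar ('0')
               if (c < 0 .or. c > 9) exit
               mant = mant*10.0_dp + c
               ndec = ndec + 1
               k = k + 1
            end do
         end if
      end if
      ex = 0;  esgn = 1
      if (k <= n) then
         if (scan (field(k:k), 'EeDd') == 1) then   ! optional exponent part
            k = k + 1
            if (k <= n) then
               if (field(k:k) == '-') esgn = -1
               if (field(k:k) == '-' .or. field(k:k) == '+') k = k + 1
            end if
            do while (k <= n)
               c = ichar (field(k:k)) - ichar ('0')
               if (c < 0 .or. c > 9) exit
               ex = ex*10 + c
               k = k + 1
            end do
         end if
      end if
      ! Scale by a power of ten; dividing for negative powers avoids an inexact 10**(-p).
      p = esgn*ex - ndec
      if (p >= 0) then
         val = sgn * mant * 10.0_dp**p
      else
         val = sgn * mant / 10.0_dp**(-p)
      end if
   end function parse_dp
end module fast_parse

program parse_demo
   use fast_parse
   implicit none
   character(len=12) :: fields(4) = [ '      -4.679', '  -1.234E+02', '   6.500D-04', '          17' ]
   real(dp) :: ref, got
   integer :: i
   do i = 1, size (fields)
      read (fields(i), *) ref                 ! reference value from the runtime library
      got = parse_dp (fields(i))
      if (abs (got - ref) > 1.0e-14_dp * abs (ref)) error stop 'parse mismatch'
      write (*,*) fields(i), got
   end do
end program parse_demo
```

The extra branches for the sign, the fraction and the exponent are exactly the added generality mecej4 warns about: each one is cheap, but together they move the routine closer to what the I/O library already does.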
JohnCampbell
Joined: 16 Feb 2006 Posts: 2560 Location: Sydney
|
Posted: Wed Nov 16, 2016 10:43 am Post subject: |
|
|
I have adapted the F format routine and now have an ES format routine. The test results are:
Code: | ftn95 /opt /link
F format : 3.038 sec F routine : 1.758 sec
ES format : 3.116 sec ES routine : 2.822 sec
ftn95 /64 /link
F format : 9.779 sec F routine : 1.456 sec
ES format : 9.166 sec ES routine : 2.251 sec
gFortran -O2
F format : 39.97 sec F routine : 0.283 sec
ES format : 70.50 sec ES routine : 0.973 sec |
These results show that the /64 F and ES formats are very slow. Is it possible to review this performance?
(The gFortran results are extremely poor / unacceptably slow!! And these are for internal writes?)
The additional routines are:
Code: | program IntlWrite
implicit none
character(len=100) :: str
integer :: i,j,n
real, dimension(10) :: x
real del_sec, sec
external del_sec
!
! Initialise vector
sec = del_sec (0)
do i = 1,10
x(i) = 3.0*i-7.679
end do
write (*,'(10F10.3)')(-7.679+3*i,i=1,10)
!
! Formatted write
sec = del_sec (0)
do j=1,1000000
x(3) = 3.0/j-.07679
write (str,'(10F10.3)') x
if (mod(j,100000).eq.0) write (*,*) del_sec (-1), j, x(3)
end do
sec = del_sec (0)
write (*,*) sec,' seconds : format F write'
write (*,*) str
!
! new values to test ES
x(4) = -.00025
x(5) = 0
x(6) = .00065
!
! function write
do j=1,1000000
x(3) = 3.0/j-.07679
do n=1,10
call write_val_r4 ( x(n), str(n*10-9:n*10), 3 )
end do
if (mod(j,100000).eq.0) write(*,*) del_sec (-1), j, x(3)
end do
sec = del_sec (0)
write (*,*) sec,' seconds : function F write'
write (*,*) str
!
! Formatted write
do j=1,1000000
x(3) = 3.0/j-.07679
write (str,'(10ES10.3)') x
if (mod(j,100000).eq.0) write (*,*) del_sec (-1), j, x(3)
end do
sec = del_sec (0)
write (*,*) sec,' seconds : format ES write'
write (*,*) str
!
do j=1,1000000
x(3) = 3.0/j-.07679
do n=1,10
call write_val_e4 ( x(n), str(n*10-9:n*10), 3 )
end do
if (mod(j,100000).eq.0) write(*,*) del_sec (-1), j, x(3)
end do
sec = del_sec (0)
write (*,*) sec,' seconds : function ES write'
write (*,*) str
!
end program |