forums.silverfrost.com Welcome to the Silverfrost forums
JohnCampbell
Joined: 16 Feb 2006 Posts: 2560 Location: Sydney
Posted: Wed Nov 16, 2016 10:46 am
ctd Code: | subroutine write_val_e4 (val, str, n)
  !
  ! writes -3.04E+01
  !
  real*4 :: val ! value to write; must fit
  integer*4 :: n ! digits >= 0 and < len(str)
  character :: str*(*)
  !
  integer*4 :: l ! len ( str )
  real*8 :: rv ! abs ( val )
  real*8 :: power ! log10 ( val )
  real*8 :: round ! round-off
  integer*4 :: ip ! E+ip
  integer*8 :: v ! integer for digits of val
  integer*8 :: ten = 10 ! mod
  integer*4 :: k ! position of digit
  integer*4 :: p ! position of '.'
  integer*4 :: sgn ! +/-
  integer*4 :: d ! digit
  integer*4 :: z = ichar ('0')
  integer*4 :: i,m
  !
  ! Remove sign
  if ( val > 0 ) then
    sgn = 1
    rv = val
  else if ( val < 0 ) then
    sgn = -1
    rv = -val
  else
    str = ' 0.'
    return
  end if
  !
  ! Check for overflow
  m = max (n,1) ! digits to provide
  l = len (str)
  p = 3
  if ( m > l-7 ) goto 99
  str = ' '
  !
  ! Determine power
  power = log10 (rv) ; ip = power ; if ( power < 0) ip = (power - .9999999d0)
  round = 0.5*10.0d0**(ip-m)
  rv = rv + round
  !zz write (*,*) 'rv changed from',abs(val),' to',rv, ' with',round, ip
  ! check for 9.99999 + .00005
  power = log10 (rv) ; ip = power ; if ( power < 0) ip = (power - .9999999d0)
  rv = rv * 10.0d0**(m-ip)
  !
  ! generate digits
  str(p:p) = '.'
  k = 3+m ! last digit position
  v = rv ! digits
  !zz write (*,*) 'working with',v,ip
  do
    if ( k==p ) k = k-1
    d = mod(v,ten)
    if ( k < 1 ) goto 99
    str(k:k) = char (d+z)
    v = v/10
    k = k-1
    if ( v == 0 .and. k < p ) exit
  end do
  !
  ! -ve values
  if ( sgn < 0 ) then
    if ( k < 1 ) goto 99
    str(k:k) = '-'
  end if
  !
  ! write power
  k = 3+m+2 !
  if ( ip < 0 ) then
    v = abs(ip)
    str(k-1:k) = 'E-'
  else
    v = ip
    str(k-1:k) = 'E+'
  end if
  !
  m = 2
  if ( v > 99 ) m = 3
  k = k+m
  do i = 1,m
    d = mod(v,ten)
    str(k:k) = char (d+z)
    v = v/10
    k = k-1
  end do
  return
  !
  ! overflow field
99 str = repeat ('#', l)
  return
end subroutine write_val_e4
{batch file to test}
del %1.exe
ftn95 %1 /opt /link
%1
del %1.exe
ftn95 %1 /64 /link
%1
del %1.exe
gfortran %1.f90 -O2 -o %1.exe
%1
|
Paul,
Is it possible to review the FTN95 /64 performance for formatted write ?
John |

mecej4
Joined: 31 Oct 2006 Posts: 1892
Posted: Wed Nov 16, 2016 11:45 am
A round of applause to John for writing the last chapter of the story and for writing the ES output routines.
As it has evolved over a week, the content of this thread is now showing some sprawl. It covers two sets of tests, both of which make a case for reviewing the I/O routines of FTN95 and Gfortran on Windows. The older tests required resources that may not be readily available to everyone (over 100 GB of HDD/SSD space) and substantial run times. The new tests are much more accessible to anyone, and make the case more eloquently.
Some cleaning up, collecting the new tests (READ and WRITE, using built-in formatting/custom routines) into a compact Zip file posted at, say, Dropbox, would give Paul something easier to work with. I offer to run the cleaned up tests (John, please avoid using any FTN95-specific routines in the test codes and make up a Zip file of the source files) with Intel Fortran and Gfortran on Windows and Linux on an older dual boot desktop. |

JohnCampbell
Joined: 16 Feb 2006 Posts: 2560 Location: Sydney
Posted: Wed Nov 16, 2016 12:40 pm
mecej4,
I shall assemble the tests and provide a dropbox link.
There are actually 3 sets of tests that I have been demonstrating.
1) Binary_IO test that demonstrated the performance being achieved with fixed length record random access files. This test uses 3 libraries; 2 are based on Fortran fixed length record random access files, while the 3rd used FTN95's new long 64-bit address file routines that I have used for the first time. This code is non-standard FTN95 code, although BinLib was an attempt at conforming Fortran. I shall provide this library.
These show the benefit of SSD drives but, probably more significantly, the effect of memory cache buffering. I typically use these libraries to write then read information, so caching is important. Transfer rates in excess of 1 GB/sec were being demonstrated even using cached HDD, which is fairly good, especially for 32-bit solutions.
2) Text I/O with large files. This test tries to test file I/O which is non-buffered, for files up to 18 GB in size and 130 GB in total. This is what might be experienced when reading large .csv data sets supplied from an external source. This test is closest to Dan's identified problem.
The key outcome was that reading was fairly fast but writing numbers is slow. This is portable code, except for IOMSG=message, which is not supported by FTN95.
3) Internal buffer read/write. This test identifies the slow write number and read number speeds for FTN95 /64 and also gFortran, both of which need improving. FTN95 /32 has relatively good performance. This test has not been combined with file text string I/O, but hopefully test 2 confirms this is not a problem. Hopefully this is portable code.
In terms of identifying a problem that needs improving; test 3) is most useful. I shall prepare a link tomorrow (my time)
I am interested to see what performance can be achieved with other compilers. The other tests would be interesting, although they are dependent on many variables, such as disk type, installed memory, processor and I suspect operating system.
John |

mecej4
Joined: 31 Oct 2006 Posts: 1892
Posted: Wed Nov 16, 2016 2:26 pm
I ran your WRITE tests using gFortran, and these are some comments related to that.
Your formatting routines write_val_r4() and write_val_e4() use INTEGER*8 variables. (write_val_e4() is not used in the test runs, but involves more use of INTEGER*8.) As a result, 32-bit EXEs produced by gFortran run significantly slower. Here is a comparison (on my laptop, i7-2720, W10).
lgf 7.7 (32) 52.3 0.806 -o3 -32 -wpo -sse
lgf 7.7 (64) 46.6 0.255 -o3 -64 -wpo -sse
Lahey/GNU/Marlette Fortran is a repackaging of gFortran 5.4.1. I have Cygwin versions of gFortran that are older, and they give similar results.
The implication is (i) see if you can reduce the use of INTEGER*8 variables in the formatting subroutines, and (ii) please clearly state whether your gFortran results are for 32-bit or 64-bit architecture.
I am pondering whether I should sound out an optimization expert whom I happen to know about whether he could look at this performance bug. He has written often about building GCC/GFortran on both Windows and Linux.
Another possibility is to post just some results to C.L.F. with a link to the source codes, and see if someone from the GFortran developers group gets interested.
First, however, we need distributable test codes and some Linux results.
Last edited by mecej4 on Wed Nov 16, 2016 2:47 pm; edited 1 time in total |

DanRRight
Joined: 10 Mar 2008 Posts: 2828 Location: South Pole, Antarctica
Posted: Wed Nov 16, 2016 2:44 pm
John,
So you have increased the slow WRITE speed of gFortran by 100+ times and even made it 5x faster than FTN95, wow! Please keep going with READ, including arbitrary F format length. The numbers I read are made by C guys, and may look like this, with up to 20 digits, and may not even include a decimal point, like here
0.000777203 -0.000032224 -0.000039659 -0.365450860 -1.011079630 -0.520287431 -0.285002098 1355.429879314 1
which is absurd but this is usual life of C programmers.
Mecej4, if it will take you no more than a few minutes, can you please modify your real*4 code to read numbers like this 1355.429879314, ignoring extra digits? My brain cannot multitask right now.
And see this damn integer at the end of the line in the previous paragraph? I will try to handle it myself.

mecej4
Joined: 31 Oct 2006 Posts: 1892
Posted: Wed Nov 16, 2016 5:17 pm
It would be easy to modify the code to read the string that you showed. The field widths are not uniform, perhaps not known in advance, and may even vary from record to record. All you have to do is to scan the string for white space, and build an index of where each number begins and ends, and a count of the number of fields.
What you must do before writing code to do this, however, is this: write down (or obtain from your C programmers) a specification of what is allowed (and, therefore, may be expected) in all the text lines that your program will have to read. For example: will some input lines have more or less than the nine fields in your sample record? Will some lines have comments following the numbers? Will some of the numbers have exponents specified?
Without such a specification, you will have a program that will work fine on this specific input line, but may malfunction (without crashing or any sign of error) with other input lines that are superficially similar to this one. And, as someone famous said, "premature optimization is the root of all evil".
Another simple possibility is to have the C program that writes the strings use a repeated %17.9e or some such format, instead of the %.9f format that it is now using. |
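The scan described above (find the white space, record where each number begins and ends, count the fields) could look roughly like the sketch below. It assumes fields are separated by blanks only; the names `field_index_sketch` and `index_fields` are made up for illustration and are not part of the test codes in this thread.

```fortran
module field_index_sketch
   implicit none
contains
   ! Scan a text line for blank-delimited fields.
   ! On return nf is the number of fields found, and line(ks(i):ke(i))
   ! is field i for i = 1 .. min(nf, size(ks)).
   ! ks and ke are assumed to have the same size.
   subroutine index_fields (line, ks, ke, nf)
      character(len=*), intent(in)  :: line
      integer,          intent(out) :: ks(:), ke(:)
      integer,          intent(out) :: nf
      integer :: k, n
      logical :: in_field

      n        = len_trim (line)       ! ignore trailing blanks
      nf       = 0
      ks       = 0
      ke       = 0
      in_field = .false.
      do k = 1, n
         if ( line(k:k) /= ' ' ) then
            if ( .not. in_field ) then              ! a new field starts here
               nf = nf + 1
               if ( nf <= size (ks) ) ks(nf) = k
               in_field = .true.
            end if
            if ( nf <= size (ke) ) ke(nf) = k       ! extend the current field
         else
            in_field = .false.                      ! blank: field has ended
         end if
      end do
   end subroutine index_fields
end module field_index_sketch
```

With the index in hand, each `line(ks(i):ke(i))` can be passed to whatever number reader is in use. If nf comes back larger than size(ks), the line had more fields than expected, which is exactly the sort of case the specification should settle.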

JohnCampbell
Joined: 16 Feb 2006 Posts: 2560 Location: Sydney
Posted: Wed Nov 16, 2016 10:51 pm
mecej4 wrote: | The implication is (i) see if you can reduce the use of INTEGER*8 variables in the formatting subroutines, and (ii) please clearly state whether your gFortran results are for 32-bit or 64-bit architecture.
|
My aim was to quickly produce a safe code, so I did not finesse trying to use I*4 for the digits. You are a bit harsh in your comment, as the 32 bit version of my F routine has an increase of 0.551 seconds, while the gFortran F format increases by 5.7 seconds. Given that F and ES formats have been in use for REAL*8 before INTEGER*8 was commonly available, there must be a way, although I*4 only supports 9 digits. Perhaps using I*8 would be an easy fix for FTN95 and gFortran F and ES formats.
The gFortran I am now using is 64-bit Ver 6.1.0. I download it pre-built. Reliable pre-built windows versions are difficult to find, which is a big worry. This keeps FTN95 for my main production code.
As you indicate, Dan's example is easy to read with the existing read_val, after identifying the number fields. There is an easy change required, which is to account for numbers without a ".". Field delimiters can be a space or comma and could easily be extended to include other possibilities such as <HT> or ";:~|", which I have seen from some recording devices. I was surprised that ifort does not support "," as a numeric field terminator for F or I formats, which FTN95 always has and gFortran appears to support. ( flexible number reading is probably too big a topic to bring into this thread !! )
John |
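One possible way to extend the field search to a wider delimiter set is the standard SCAN and VERIFY intrinsics. The following is only a sketch: the delimiter list, the module name `delim_sketch` and the routine `next_field` are assumptions for illustration, and it follows the same convention as the get_next_field routines later in the thread (one delimiter character, optionally followed by blanks).

```fortran
module delim_sketch
   implicit none
   ! Assumed delimiter set: blank , ; : ~ | and horizontal tab
   character(len=*), parameter :: delims = ' ,;:~|' // achar (9)
contains
   ! Find the next field at or after position ks.
   ! On return the field is str(ks:ke); ke < ks means an empty field.
   ! ks > len(str) on return means there are no more fields.
   subroutine next_field (str, ks, ke)
      character(len=*), intent(in)    :: str
      integer,          intent(inout) :: ks
      integer,          intent(out)   :: ke
      integer :: n, k

      n  = len (str)
      ke = n
      if ( ks < 1 .or. ks > n ) then
         ks = n + 1
         return
      end if
      k = verify (str(ks:n), ' ')      ! first non-blank, 0 if none left
      if ( k == 0 ) then
         ks = n + 1
         return
      end if
      ks = ks + k - 1
      k = scan (str(ks:n), delims)     ! first delimiter, 0 if none follows
      if ( k > 0 ) ke = ks + k - 2
   end subroutine next_field
end module delim_sketch
```

A caller would start with ks = 1, call `next_field`, use `str(ks:ke)`, then set ks = ke+2 to step over the delimiter and repeat until ks exceeds len(str). Whether SCAN is any faster than explicit character tests on a given compiler would need to be measured.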

DanRRight
Joined: 10 Mar 2008 Posts: 2828 Location: South Pole, Antarctica
Posted: Thu Nov 17, 2016 12:24 am
Mecej4,
Using E format will be almost twice as slow according to John's test, so I'd use F with space delimiters for a while, exactly like in your example. Your current code is almost OK; it already tracks spaces, and the only time it hiccups is when numbers are too long, like in my example above. I don't have time to dig deeper into your code (it is harder to change code that is not your own), and unfortunately I don't really have time to write my own and experiment currently; I can only barely check what you guys publish here to see in which direction the thread moves.
As another direction to investigate, do you know if there exists a G-like format in C which keeps the size and number of digits constant, just moving the floating point? I really don't need more than 7-8 leading digits, but in future the numbers may grow beyond a billion and the code must handle that by switching either to longer F numbers with real*8 or to E format. But that's for later exercises. The F format currently used by my C guys is absurd; it outputs numbers like this.
0.000000458 54321567.846764678
3 useful digits is way too few, 17 is way too many. If scientists or engineers using Fortran did that they would have to be fired immediately without possibility of return. The only excuse is speed.
John, Intel Fortran is not allowing comma as a delimiter? You've made my day. Poor Intel.
Last edited by DanRRight on Thu Nov 17, 2016 1:45 am; edited 1 time in total |

mecej4
Joined: 31 Oct 2006 Posts: 1892
Posted: Thu Nov 17, 2016 1:08 am
Dan, I recommend that you ask that the C code that produces the data file use 'e' format. There is no problem with printing large or small numbers with that format, as long as the numbers lie between -10^37 and +10^37, and processing lines in your Fortran code will be faster if the field width is uniform. Reading numbers with E format does not have to involve calls to log() -- you do not need to compute the log to 8 or 15 decimal digits when all you need is the integer part of the logarithm.
C also has 'g' format, but that format should only be used when the output is printed on paper to be read by humans.
Next, note that it should not matter that in our toy programs it takes twice as long to process input data with E format compared to F format. That is not a reason to exclude using E format. Your adult program probably does an obscene amount of number crunching; a typical run may be many minutes long, in which case an extra half second spent reading E-format data is negligible.
Finally, I do not understand why you use text files instead of binary files, unformatted Fortran files, or even a memory buffer, for exchanging data between C and Fortran when the file sizes are so large that no human is going to print and read those files. |
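For comparison, a raw binary exchange from the Fortran side might look like the sketch below. It assumes a compiler that supports Fortran 2003 stream access and the Fortran 2008 NEWUNIT specifier (gfortran does; on a compiler without NEWUNIT a fixed unit number would be needed), and it assumes the writer and reader agree on the layout: one default integer count followed by that many default reals. The names `raw_io_sketch`, `save_reals` and `load_reals` are hypothetical.

```fortran
module raw_io_sketch
   implicit none
contains
   ! Write a count followed by the raw values, with no record markers.
   subroutine save_reals (fname, x)
      character(len=*), intent(in) :: fname
      real,             intent(in) :: x(:)
      integer :: lu
      open (newunit=lu, file=fname, access='stream', form='unformatted', &
            status='replace', action='write')
      write (lu) size (x)          ! element count first
      write (lu) x                 ! then the values, no formatting involved
      close (lu)
   end subroutine save_reals

   ! Read the count, then the values if they fit in x.
   subroutine load_reals (fname, x, n)
      character(len=*), intent(in)  :: fname
      real,             intent(out) :: x(:)
      integer,          intent(out) :: n      ! count stored in the file
      integer :: lu
      x = 0.0
      open (newunit=lu, file=fname, access='stream', form='unformatted', &
            status='old', action='read')
      read (lu) n
      if ( n >= 0 .and. n <= size (x) ) read (lu) x(1:n)
      close (lu)
   end subroutine load_reals
end module raw_io_sketch
```

A C program would typically produce the same layout with fwrite() of an int and an array of float, but that mapping between C and Fortran types is an assumption that should be checked for the compilers involved.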

JohnCampbell
Joined: 16 Feb 2006 Posts: 2560 Location: Sydney
Posted: Thu Nov 17, 2016 1:28 am
Paul,
I have been looking at Dan's latest free format input and have got side-tracked with a /64 problem. A simple DO loop with a character*1 C temporary variable is slowing things down dramatically.
If I replace lines 93:95 with lines 96:97 the loop speeds up dramatically from 22.7 seconds to 0.4 seconds, ie
replacing
  do k = f,n
    c = str(k:k)
    if ( c == ' ' ) exit
    if ( c == ',' ) exit
with
  do k = f,n
    if ( str(k:k) == ' ' ) exit
    if ( str(k:k) == ',' ) exit
I compile in PLATO with FTN95 Ver 8.05.0 dated 17/06/2016 and salflibc64.dll is 5/11/2016
BUILDLOG is:
FTN95.EXE "C:\temp\forum\format\paul_c.f90" /64 /NO_BANNER /VS7 /DELETE_OBJ_ON_ERROR /ERROR_NUMBERS /UNLIMITED_ERRORS /LINK
ftn95 paul_c.f90 /64 /link also demonstrates
Code: | program IntlRead
  implicit none
  integer, parameter :: lines = 1000000
  integer, parameter :: step = lines / 10
  character(len=180) :: str
  integer :: j,n, ks, ke
  real :: del_sec, sec
  external del_sec
  !
  ! Initialise string
  sec = del_sec (0)
  str = '0.000777203, -0.000032224, -0.000039659, -0.365450860, -1.011079630, -0.520287431, -0.285002098, 1355.429879314, 1, 2'
  !
  ! F function read
  write (*,*) ' '
  write (*,*) 'Test get_next_field'
  do j=1,lines
    ke = -1
    do n=1,10
      ks = ke+2
      call get_next_field_fast ( str,ks,ke )
    end do
    if (mod(j,step) == 0) write(*,*) del_sec (-1), j
  end do
  sec = del_sec (0)
  write (*,*) sec,' seconds : get_next_field_fast'
  !
  ! F function read
  write (*,*) ' '
  write (*,*) 'Test get_next_field_slow'
  do j=1,lines
    ke = -1
    do n=1,10
      ks = ke+2
      call get_next_field_slow ( str,ks,ke )
    end do
    if (mod(j,step) == 0) write(*,*) del_sec (-1), j
  end do
  sec = del_sec (0)
  write (*,*) sec,' seconds : get_next_field_slow '
end program

subroutine get_next_field_fast ( str,ks,ke )
  character str*(*)
  integer*4 :: ks ! start of next field
  integer*4 :: ke ! returned end of next field
  !
  integer*4 n, k, f
  !
  ! determine if valid field
  n = len (str)
  if ( ks < 1 .or. ks > n ) then
    ke = -1
    return
  end if
  !
  ! find start of number ( ignore leading blanks
  do f = ks,n
    if ( str(f:f) /= ' ' ) exit
  end do
  !
  ! find end of number
  do k = f,n
    if ( str(k:k) == ' ' ) exit
    if ( str(k:k) == ',' ) exit
  end do
  ke = k-1
end subroutine get_next_field_fast

subroutine get_next_field_slow ( str,ks,ke )
  character str*(*)
  integer*4 :: ks ! start of next field
  integer*4 :: ke ! returned end of next field
  !
  integer*4 n, k, f
  character c*1
  !
  ! determine if valid field
  n = len (str)
  if ( ks < 1 .or. ks > n ) then
    ke = -1
    return
  end if
  !
  ! find start of number ( ignore leading blanks
  do f = ks,n
    if ( str(f:f) /= ' ' ) exit
  end do
  !
  ! find end of number
  do k = f,n
    c = str(k:k)
    if ( c == ' ' ) exit
    if ( c == ',' ) exit
    ! if ( str(k:k) == ' ' ) exit
    ! if ( str(k:k) == ',' ) exit
    !gen if ( index ( ' ,~:;', str(k:k) ) > 0 ) exit
  end do
  ke = k-1
end subroutine get_next_field_slow
|

JohnCampbell
Joined: 16 Feb 2006 Posts: 2560 Location: Sydney
Posted: Thu Nov 17, 2016 1:29 am
ctd
Code: | real*4 function del_sec (update)
  !
  integer*4 :: update
  integer*8 :: last_tick = 0
  integer*8 :: tick, rate
  real*4 :: dt
  !
  call system_clock ( tick, rate )
  dt = real(tick-last_tick) / real(rate)
  if ( update >= 0 ) last_tick = tick
  del_sec = dt
end function del_sec
|

DanRRight
Joined: 10 Mar 2008 Posts: 2828 Location: South Pole, Antarctica
Posted: Thu Nov 17, 2016 1:31 am
Mecej4, Well, the output is really in HDF5 format, which is better than that for potential read speed (since decompression processing is parallelized, plus the file size is smaller); the data got extracted from HDF5 as text just for my reading. I have not yet adopted HDF5 with Fortran. Text is used because there are still very rare bugs (C is an unstable beast), and they also happen in the output, making debugging hell because it was initially hard to catch them. ASCII allowed us to catch these NaNs and other garbage, but this is still happening. I want you guys to check unformatted read too, but I hope that there still exists speedup potential for formatted read, because currently we use just 1% of the I/O bandwidth.
Last edited by DanRRight on Thu Nov 17, 2016 3:52 am; edited 6 times in total |

mecej4
Joined: 31 Oct 2006 Posts: 1892
Posted: Thu Nov 17, 2016 3:42 am Post subject: Re:
JohnCampbell wrote: | A simple DO loop with a character*1 C temporary variable is slowing things down dramatically.
If I replace lines 93:95 with lines 96:97 the loop speeds up dramatically from 22.7 seconds to 0.4 seconds. |
John, this is because FTN95 /64 translates
Code: | if ( c == ',' ) exit |
to an expensive function call with four arguments
Code: | call ccomp(c, ',', 1, 1) |
whereas the 32-bit compiler simply does
even without /opt. Just setting up the call, the stack frame in ccomp() and then returning would take about a dozen instructions in the 64-bit version. I do not know in which DLL the actual code of ccomp() is located, but I guess that it is a full-fledged string comparison function. |

JohnCampbell
Joined: 16 Feb 2006 Posts: 2560 Location: Sydney
Posted: Thu Nov 17, 2016 8:16 am
mecej4,
Why is it that FTN95 /64 does the change for "if ( c == ' ' ) exit" but not for "if ( str(k:k) == ' ' ) exit" ?
The performance change in comparison to what else is being done is very dramatic.
Also you stated "Reading numbers with E format does not have to involve calls to log() -- you do not need to compute the log to 8 or 15 decimal digits when all you need is the integer part of the logarithm." I actually use LOG10 twice, which was a quick fix, so I'd like to know your recommended alternative.
John |

mecej4
Joined: 31 Oct 2006 Posts: 1892
Posted: Thu Nov 17, 2016 9:12 am Post subject: Re:
JohnCampbell wrote: | Why is it that FTN95 /64 does the change for "if ( c == ' ' ) exit" but not for "if ( str(k:k) == ' ' ) exit" ? |
I can only guess. Input source code has many places where an opportunity for optimization exists, and this is one. FTN95/64 is in its infancy, and makes no claims to optimization. I can understand the compiler creators' saying "let's first get it working right, and later work on optimizing" -- I mean the compiled code, not the compiler itself.
Take the logical expression c == ','. The two expressions being compared are character expressions. In general, they may be of different lengths, so comparison may involve truncation, blank-padding, etc., -- whatever is needed to do what the standard specifies. In this specific case, it is easy to see that the length of the RHS is 1. To find the length of the LHS, however, the symbol table has to be looked up to see that it is also 1. Thus, unless the compiler realizes that both sides have lengths equal to 1, it is reasonable to call a general RTL routine that compares two strings of different lengths.
Similar considerations apply to strings of byte lengths 2, 4 and, for FTN95/64, 8. These sizes fit into a register, so entire registers can be compared instead of doing a byte-by-byte comparison.
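To make the difference concrete, here is a rough sketch of the two cases. `same_padded` does what a general comparison has to do when the lengths may differ (the shorter operand is treated as if padded with blanks), while `same_char` is all that is needed when both operands are known to be one character. This is an illustration only, not the code of ccomp() or of any FTN95 run-time routine, and both function names are made up.

```fortran
module char_compare_sketch
   implicit none
contains
   ! General case: lengths may differ, shorter operand is blank-padded.
   logical function same_padded (a, b)
      character(len=*), intent(in) :: a, b
      character :: ca, cb
      integer   :: i
      same_padded = .true.
      do i = 1, max (len (a), len (b))
         ca = ' ' ; if ( i <= len (a) ) ca = a(i:i)
         cb = ' ' ; if ( i <= len (b) ) cb = b(i:i)
         if ( ichar (ca) /= ichar (cb) ) then
            same_padded = .false.
            return
         end if
      end do
   end function same_padded

   ! Special case: both operands are known to have length 1,
   ! so a single byte comparison is enough.
   logical function same_char (a, b)
      character(len=1), intent(in) :: a, b
      same_char = ichar (a) == ichar (b)
   end function same_char
end module char_compare_sketch
```

For example, `'ab' == 'ab   '` is true in Fortran, and `same_padded ('ab', 'ab   ')` gives the same answer; the loop, the length tests and the call overhead are what a compiler can avoid when it knows both lengths are 1.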
Quote: | Also you stated "Reading numbers with E format does not have to involve calls to log() -- you do not need to compute the log to 8 or 15 decimal digits when all you need is the integer part of the logarithm." I actually use LOG10 twice, which was a quick fix, so I'd like to know your recommended alternative. |
Notice that in your code, after taking the log, you chop off the fractional part? So, simply count the decimal digits as you are scanning the input string segment for a decimal point, 'E' or end of segment. That count is the value of int(log10(x)). Similarly, for small numbers, you count the number of zeros between the decimal point and the first significant digit after it. If an exponent E[s]nnn is present, you scan that and extract the nnn, and use that to adjust the previously computed integer part of the log10.
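The digit-counting idea can be sketched as below. The function returns the integer part of log10 of the magnitude, rounded down: the number of digits before the decimal point (counted from the first non-zero digit) less one, or minus (zeros after the point plus one) for values below 1, adjusted by any E/D exponent. It assumes a well-formed field with an explicit exponent letter; the names `exponent_sketch` and `dec_exponent` are hypothetical and error trapping is left out.

```fortran
module exponent_sketch
   implicit none
contains
   ! Decimal exponent of the number in s, found without calling log10.
   ! Returns 0 if the value is zero.
   integer function dec_exponent (s)
      character(len=*), intent(in) :: s
      integer   :: k, n, int_digits, lead_zeros, ex, esign
      logical   :: seen_point, seen_nonzero
      character :: c

      n = len_trim (s)
      int_digits   = 0        ! digits before '.', from first non-zero digit
      lead_zeros   = 0        ! zeros between '.' and first non-zero digit
      seen_point   = .false.
      seen_nonzero = .false.
      k = 1
      do while ( k <= n )
         c = s(k:k)
         if ( c == 'E' .or. c == 'e' .or. c == 'D' .or. c == 'd' ) exit
         if ( c == '.' ) then
            seen_point = .true.
         else if ( c >= '0' .and. c <= '9' ) then
            if ( c /= '0' ) seen_nonzero = .true.
            if ( .not. seen_point ) then
               if ( seen_nonzero ) int_digits = int_digits + 1
            else if ( .not. seen_nonzero ) then
               lead_zeros = lead_zeros + 1
            end if
         end if
         k = k + 1
      end do

      ex = 0 ; esign = 1
      if ( k < n ) then                      ! exponent letter found at k
         k = k + 1
         if ( s(k:k) == '-' ) esign = -1
         if ( s(k:k) == '-' .or. s(k:k) == '+' ) k = k + 1
         do while ( k <= n )
            c = s(k:k)
            if ( c < '0' .or. c > '9' ) exit
            ex = 10*ex + ( ichar (c) - ichar ('0') )
            k = k + 1
         end do
      end if

      if ( int_digits > 0 ) then
         dec_exponent = int_digits - 1 + esign*ex
      else if ( seen_nonzero ) then
         dec_exponent = -(lead_zeros + 1) + esign*ex
      else
         dec_exponent = 0                    ! value is zero
      end if
   end function dec_exponent
end module exponent_sketch
```

In a real reader this counting would be folded into the same pass that accumulates the digits, so it costs almost nothing extra.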
Perhaps this exploration should not be carried too far. As we add code to handle more variations in the types of input strings and trap errors in the input data, we move closer and closer to writing a sscanf() routine. We are not embarked on a rewriting of the FTN95/64 I/O RTL, are we? We can be more useful by alerting the Silverfrost team to bugs and areas of poor performance. |