forums.silverfrost.com Forum Index forums.silverfrost.com
Welcome to the Silverfrost forums
 
 FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 

Database or NetCDF / HDF5 interfaces?
Goto page Previous  1, 2, 3, 4, 5, 6  Next
 
Post new topic   Reply to topic    forums.silverfrost.com Forum Index -> General
View previous topic :: View next topic  
Author Message
JohnCampbell



Joined: 16 Feb 2006
Posts: 1709
Location: Sydney

PostPosted: Wed Nov 16, 2016 10:46 am    Post subject: Reply with quote

ctd
Code:
   subroutine write_val_e4 (val, str, n)
!
!  writes -3.04E+01
!
     real*4    :: val         !  value to write; must fit
     integer*4 :: n           !  digits >= 0 and < len(str)
     character :: str*(*)
!
     integer*4 :: l           ! len ( str )
     real*8    :: rv          ! abs ( val )
     real*8    :: power       ! log10 ( val )
     real*8    :: round       ! round-off
     integer*4 :: ip          ! E+ip
     integer*8 :: v           ! integer for digits of val
     integer*8 :: ten = 10    ! mod
     integer*4 :: k           ! position of digit
     integer*4 :: p           ! position of '.'
     integer*4 :: sgn         ! +/-
     integer*4 :: d           ! digit
     integer*4 :: z = ichar ('0')
     integer*4 :: i,m
!
!  Remove sign
     if ( val > 0 ) then
       sgn = 1
       rv  = val
     else if ( val < 0 ) then
       sgn = -1
       rv  = -val
     else
       str = ' 0.'
       return
     end if
!
!  Check for overflow
     m   = max (n,1)                !  digits to provide
     l   = len (str)
     p   = 3
     if ( m > l-7 ) goto 99
     str = ' '
!
!  Determine power
     power = log10 (rv)  ; ip = power  ; if ( power < 0) ip = (power - .9999999d0)
     round = 0.5*10.0d0**(ip-m)
     rv    = rv + round
!zz       write (*,*) 'rv changed from',abs(val),' to',rv, ' with',round, ip
!  check for 9.99999 + .00005       
     power = log10 (rv)  ; ip = power  ; if ( power < 0) ip = (power - .9999999d0)
     rv    = rv * 10.0d0**(m-ip)
!
!  generate digits
     str(p:p) = '.'
     k = 3+m                        ! last digit position
     v = rv                         ! digits
!zz       write (*,*) 'working with',v,ip
     do
       if ( k==p ) k = k-1
       d = mod(v,ten)
       if ( k < 1 ) goto 99
       str(k:k) = char (d+z)
       v = v/10
       k = k-1
       if ( v == 0 .and. k < p ) exit
     end do
!
!  -ve values
     if ( sgn < 0 ) then
       if ( k < 1 ) goto 99
       str(k:k) = '-'
     end if 
!
!  write power
     k = 3+m+2           !
     if ( ip < 0 ) then
       v = abs(ip)
       str(k-1:k) = 'E-'
     else
       v = ip
       str(k-1:k) = 'E+'
     end if
!
     m = 2
     if ( v > 99 ) m = 3
     k = k+m
     do i = 1,m
       d = mod(v,ten)
       str(k:k) = char (d+z)
       v = v/10
       k = k-1
     end do
     return
!
!  overflow field
 99  str = repeat ('#', l)
     return

   end subroutine write_val_e4


{batch file to test}
del %1.exe
ftn95 %1 /opt /link
%1

del %1.exe
ftn95 %1 /64 /link
%1

del %1.exe
gfortran %1.f90 -O2 -o %1.exe
%1


Paul,

Is it possible to review the FTN95 /64 performance for formatted write ?

John
Back to top
View user's profile Send private message
mecej4



Joined: 31 Oct 2006
Posts: 621

PostPosted: Wed Nov 16, 2016 11:45 am    Post subject: Reply with quote

A round of applause to John for writing the last chapter of the story and for writing the ES output routines.

As it has evolved over a week, the content of this thread is now showing some sprawl. It covers two sets of tests, both of which make a case for reviewing the I/O routines of FTN95 and Gfortran on Windows. The older tests required resources that may not be readily available to everyone (over 100 GB of HDD/SSD space) and substantial run times. The new tests are much more accessible to anyone, and make the case more eloquently.

Some cleaning up, collecting the new tests (READ and WRITE, using built-in formatting/custom routines) into a compact Zip file posted at, say, Dropbox, would give Paul something easier to work with. I offer to run the cleaned up tests (John, please avoid using any FTN95-specific routines in the test codes and make up a Zip file of the source files) with Intel Fortran and Gfortran on Windows and Linux on an older dual boot desktop.
Back to top
View user's profile Send private message
JohnCampbell



Joined: 16 Feb 2006
Posts: 1709
Location: Sydney

PostPosted: Wed Nov 16, 2016 12:40 pm    Post subject: Reply with quote

mecej4,

I shall assemble the tests and provide a dropbox link.

There are actually 3 sets of tests that I have been demonstrating.

1) Binary_IO test that demonstrated the performance being achieved with fixed length record random access files. This test uses 3 libraries; 2 are based on Fortran fixed length record random access files, while the 3rd used FTN95's new long 64-bit address file routines that I have used for the first time. This code is non-standard FTN95 code, although BinLib was an attempt at conforming Fortran. I shall provide this library.
These show the benefit of both SSD drives but probably more significantly the effect of memory cache buffering. I typically use these libraries to write then read information, so cacheing is important. Transfer rates in excess of 1 gb/sec were being demonstrated even using cached HDD, which is fairly good, especially for 32-bit solutions.

2) Text I/O with large files. This test tries to test file I/O which is non-buffered, for files up to 18 gb in size and 130 gb in total. This is what might be experienced when reading large .csv data sets supplied from an external source. This test is closest to Dan's identified problem.
Key outcome was that reading was fairly fast but write numbers is slow. This is portable code, except for IOMSG=message, which is not supported by FTN95.

3) Internal buffer read/write. This test identifies the slow write number and read number speeds for FTN95 /64 and also gFortran, both of which need improving. FTN95 /32 has relatively good performance. This test has not combined with file text string I/O, but hopefully test 2 confirms this is not a problem. Hopefully this is portable code.

In terms of identifying a problem that needs improving; test 3) is most useful. I shall prepare a link tomorrow (my time)

I am interested to see what performance can be achieved with other compilers. The other tests would be interesting, although they are dependent on many variables, such as disk type, installed memory, processor and I suspect operating system.

John
Back to top
View user's profile Send private message
mecej4



Joined: 31 Oct 2006
Posts: 621

PostPosted: Wed Nov 16, 2016 2:26 pm    Post subject: Reply with quote

I ran your WRITE tests using gFortran, and these are some comments related to that.

Your formatting routines write_val_r4() and write_val_e4() use INTEGER*8 variables. (write_val_e4() is not used in the test runs, but involves more use of INTEGER*8.) As a result, 32-bit EXEs produced by gFortran run significantly slower. Here is a comparison (on my laptop, i7-2720, W10).

lgf 7.7 (32) 52.3 0.806 -o3 -32 -wpo -sse
lgf 7.7 (64) 46.6 0.255 -o3 -64 -wpo -sse

Lahey/GNU/Marlette Fortran is a repackaging of gFortran 5.4.1. I have Cygwin versions of gFortran that are older, and they give similar results.

The implication is (i) see if you can reduce the use of INTEGER*8 variables in the formatting subroutines, and (ii) please clearly state whether your gFortran results are for 32-bit or 64-bit architecture.

I am pondering whether I should sound out an optimization expert whom I happen to know about whether he could look at this performance bug. He has written often about building GCC/GFortran on both Windows and Linux.

Another possibility is to post just some results to C.L.F. with a link to the source codes, and see if someone from the GFortran developers group gets interested.

First, however, we need distributable test codes and some Linux results.


Last edited by mecej4 on Wed Nov 16, 2016 2:47 pm; edited 1 time in total
Back to top
View user's profile Send private message
DanRRight



Joined: 10 Mar 2008
Posts: 1443
Location: South Pole, Antarctica

PostPosted: Wed Nov 16, 2016 2:44 pm    Post subject: Reply with quote

John,
So you have increased slow WRITE speed of gFortran by 100+ times and even made it 5x faster then FTN95, wow! Please keep going with READ, including arbitrary F format length. The numbers I read are made by C guys, and may look like this with up to 20 digits long and may even not place decimal point like here

0.000777203 -0.000032224 -0.000039659 -0.365450860 -1.011079630 -0.520287431 -0.285002098 1355.429879314 1

which is absurd but this is usual life of C programmers.

Mecej4, if it will take you not more then few minutes can you please modify your real*4 code to read numbers like this 1355.429879314 ignoring extra digits? My brain can not multitask right now Smile

And see this damn integer at the end of line in previous paragraph? I will try to handle it myself
Back to top
View user's profile Send private message
mecej4



Joined: 31 Oct 2006
Posts: 621

PostPosted: Wed Nov 16, 2016 5:17 pm    Post subject: Reply with quote

It would be easy to modify the code to read the string that you showed. The field widths are not uniform, perhaps not known in advance, and may even vary from record to record. All you have to do is to scan the string for white space, and build an index of where each number begins and ends, and a count of the number of fields.

What you must do before writing code to do this, however, is this: write down (or obtain from your C programmers) a specification of what is allowed (and, therefore, may be expected) in all the text lines that your program will have to read. For example: will some input lines have more or less than the nine fields in your sample record? Will some lines have comments following the numbers? Will some of the numbers have exponents specified?

Without such a specification, you will have a program that will work fine on this specific input line, but may malfunction (without crashing or any sign of error) with other input lines that are superficially similar to this one. And, as someone famous said, "premature optimization is the root of all evil".

Another simple possibility is to have the C program that writes the strings use a repeated %17.9e or some such format, instead of the %.9f format that it is now using.
Back to top
View user's profile Send private message
JohnCampbell



Joined: 16 Feb 2006
Posts: 1709
Location: Sydney

PostPosted: Wed Nov 16, 2016 10:51 pm    Post subject: Reply with quote

mecej4 wrote:
The implication is (i) see if you can reduce the use of INTEGER*8 variables in the formatting subroutines, and (ii) please clearly state whether your gFortran results are for 32-bit or 64-bit architecture.

My aim was to quickly produce a safe code, so I did not finesse trying to use I*4 for the digits. You are a bit harsh in your comment, as the 32 bit version of my F routine has an increase of 0.551 seconds, while the gFortran F format increases by 5.7 seconds. Given that F and ES formats have been in use for REAL*8 before INTEGER*8 was commonly available, there must be a way, although I*4 only supports 9 digits. Perhaps using I*8 would be an easy fix for FTN95 and gFortran F and ES formats.

The gFortran I am now using is 64-bit Ver 6.1.0. I download it pre-built. Reliable pre-built windows versions are difficult to find, which is a big worry. This keeps FTN95 for my main production code.

As you indicate, Dan's example is easy to read with the existing read_val, after identifying the number fields. There is an easy change required, which is to account for numbers without a ".". Field delimiters can be a space or comma and could easily be extended to include other possibilities such as <HT> or ";:~|", which I have seen from some recording devices. I was surprised that ifort does not support "," as a numeric field terminator for F or I formats, which FTN95 always has and gFortran appear to support. ( flexible number reading is probably too big a topic to bring into this thread !! )

John
Back to top
View user's profile Send private message
DanRRight



Joined: 10 Mar 2008
Posts: 1443
Location: South Pole, Antarctica

PostPosted: Thu Nov 17, 2016 12:24 am    Post subject: Reply with quote

Mecej4,
Using E format will be slower almost twice according to John's test, so I'd use F with spacing delimiter for a while exactly like in your example. Your current text is almost OK, it already tracks for spaces, the only when iit hiccups is when numbers are too long like in my example above. I don't have time to dig deeper into your code, it is harder to change not own codes, but unfortunately don't have really time for writing my own and experimenting currently, I only barely can check what you guys publish here to see in which direction thread moves.

As an another mentioned direction to investigate, do you know if there exist G-like format in C which keeps the size and amount of digits constant, just moving floating point? I really don't need more then 7-8 leading digits but in future the numbers may grow beyond billion and the code must handle that by switching either to longer F numbers with real8 or use E format. But that's for later exercises. Currently used by my C guys F format is an absurd, it outputs numbers like this.

0.000000458 54321567.846764678

3 useful digits is wsy not enough, 17 is way to many. If Fortran using scientists or engineers would do like that they must be immediately fired without possibility of return. The only excuse - speed.

John, Intel Fortran is not allowing comma as a delimiter? You've made my day. Poor Intel.


Last edited by DanRRight on Thu Nov 17, 2016 1:45 am; edited 1 time in total
Back to top
View user's profile Send private message
mecej4



Joined: 31 Oct 2006
Posts: 621

PostPosted: Thu Nov 17, 2016 1:08 am    Post subject: Reply with quote

Dan, I recommend that you ask that the C code that produces the data file use 'e' format. There is no problem with printing large or small numbers with that format, as long as the numbers lie between -10^37 and +10^37, and processing lines in your Fortran code will be faster if the field width is uniform. Reading numbers with E format does not have to involve calls to log() -- you do not need to compute the log to 8 or 15 decimal digits when all you need is the integer part of the logarithm.

C also has 'g' format, but that format should only be used when the output is printed on paper to be read by humans.

Next, note that it should not matter that in our toy programs it takes twice as long to process input data with E format compared to F format. That is not a reason to exclude using E format. Your adult program probably does an obscene amount of number crunching; a typical run may be many minutes long, in which case an extra half second spent reading E-format data is negligible.

Finally, I do not understand why you use text files instead of binary files, unformatted Fortran files, or even a memory buffer, for exchanging data between C and Fortran when the file sizes are so large that no human is going to print and read those files.
Back to top
View user's profile Send private message
JohnCampbell



Joined: 16 Feb 2006
Posts: 1709
Location: Sydney

PostPosted: Thu Nov 17, 2016 1:28 am    Post subject: Reply with quote

Paul,

I have been looking at Dan's latest free format input and have got side-tracked with a /64 problem. A simple DO loop with a character*1 C temporary varianle is slowing things down dramatically.

If I replace lines 93:95 with lines 96:97 the loop speeds up dramatically from 22.7 seconds to 0.4 seconds, ie
replacing
do k = f,n
c = str(k:k)
if ( c == ' ' ) exit
if ( c == ',' ) exit
with
do k = f,n
if ( str(k:k) == ' ' ) exit
if ( str(k:k) == ',' ) exit

I compile in PLATO with FTN95 Ver 8.05.0 dated 17/06/2016 and salflibc64.dll is 5/11/2016

BUILDLOG is:
TN95.EXE "C:\temp\forum\format\paul_c.f90" /64 /NO_BANNER /VS7 /DELETE_OBJ_ON_ERROR /ERROR_NUMBERS /UNLIMITED_ERRORS /LINK

ftn95 paul_c.f90 /64 /link also demonstrates

Code:
  program IntlRead

   implicit none
   integer, parameter :: lines = 1000000
   integer, parameter :: step  = lines / 10
   character(len=180) :: str
   integer :: j,n, ks, ke
   real    :: del_sec, sec
   external   del_sec
!
!  Initialise string
     sec = del_sec (0)
     str = '0.000777203, -0.000032224, -0.000039659, -0.365450860, -1.011079630, -0.520287431, -0.285002098, 1355.429879314, 1, 2'
!
!  F function read
     write (*,*) ' '
     write (*,*) 'Test get_next_field'
     do j=1,lines
        ke = -1
        do n=1,10
           ks = ke+2
           call get_next_field_fast ( str,ks,ke )
        end do
        if (mod(j,step) == 0) write(*,*) del_sec (-1), j
     end do
     sec = del_sec (0)
     write (*,*) sec,' seconds : get_next_field_fast'
!
!  F function read
     write (*,*) ' '
     write (*,*) 'Test get_next_field_slow'
     do j=1,lines
        ke = -1
        do n=1,10
           ks = ke+2
           call get_next_field_slow ( str,ks,ke )
        end do
        if (mod(j,step) == 0) write(*,*) del_sec (-1), j
     end do
     sec = del_sec (0)
     write (*,*) sec,' seconds : get_next_field_slow '
 end program
 
   subroutine get_next_field_fast ( str,ks,ke )
     character str*(*)
     integer*4 :: ks   ! start of next field
     integer*4 :: ke   ! returned end of next field
!
     integer*4 n, k, f
!
!   determine if valid field
     n = len (str)
     if ( ks < 1 .or. ks > n ) then
       ke = -1
       return
     end if
!
!  find start of number ( ignore leading blanks
     do f = ks,n
       if ( str(f:f) /= ' ' ) exit
     end do
!
!  find end of number
     do k = f,n
       if ( str(k:k) == ' ' ) exit
       if ( str(k:k) == ',' ) exit
     end do
     ke = k-1   
   end subroutine get_next_field_fast

   subroutine get_next_field_slow ( str,ks,ke )
     character str*(*)
     integer*4 :: ks   ! start of next field
     integer*4 :: ke   ! returned end of next field
!
     integer*4 n, k, f
     character c*1
!
!   determine if valid field
     n = len (str)
     if ( ks < 1 .or. ks > n ) then
       ke = -1
       return
     end if
!
!  find start of number ( ignore leading blanks
     do f = ks,n
       if ( str(f:f) /= ' ' ) exit
     end do
!
!  find end of number
     do k = f,n
       c = str(k:k)
       if ( c == ' ' ) exit
       if ( c == ',' ) exit
!       if ( str(k:k) == ' ' ) exit
!       if ( str(k:k) == ',' ) exit
!gen       if ( index ( ' ,~:;', str(k:k) ) > 0 ) exit
     end do
     ke = k-1   
   end subroutine get_next_field_slow
Back to top
View user's profile Send private message
JohnCampbell



Joined: 16 Feb 2006
Posts: 1709
Location: Sydney

PostPosted: Thu Nov 17, 2016 1:29 am    Post subject: Reply with quote

ctd
Code:
   real*4 function del_sec (update)
!
      integer*4 :: update
      integer*8 :: last_tick = 0
      integer*8 :: tick, rate
      real*4    :: dt
!
      call system_clock ( tick, rate )
      dt = real(tick-last_tick) / real(rate)     
      if ( update >= 0 ) last_tick = tick
      del_sec   = dt
   end function del_sec
Back to top
View user's profile Send private message
DanRRight



Joined: 10 Mar 2008
Posts: 1443
Location: South Pole, Antarctica

PostPosted: Thu Nov 17, 2016 1:31 am    Post subject: Reply with quote

Mecej4, Well, really output is in better then that for potentially read speed (since decompression processing is parallelized plus smaller file size) HDF5 format, data got extracted from HDF5 as text just for my reading. I have not yet adopted HDF5 with Fortran. Text is used because there are still very rare bugs, C is unstable beast, and they also happen in output messing debugging like hell because it was initially hard to catch them. ASCII allowed us to catch these NaNs and other garbage but this still happening. I want you guys to check unformatted read too but hope that there still exists speedup potential for formatted read because currently we use just 1℅ of I/O bandwidth

Last edited by DanRRight on Thu Nov 17, 2016 3:52 am; edited 6 times in total
Back to top
View user's profile Send private message
mecej4



Joined: 31 Oct 2006
Posts: 621

PostPosted: Thu Nov 17, 2016 3:42 am    Post subject: Re: Reply with quote

JohnCampbell wrote:
A simple DO loop with a character*1 C temporary variable is slowing things down dramatically.

If I replace lines 93:95 with lines 96:97 the loop speeds up dramatically from 22.7 seconds to 0.4 seconds.

John, this is because FTN95 /64 translates
Code:
     if ( c == ',' ) exit

to an expensive function call with four arguments
Code:
call ccomp(c, ',', 1, 1)

where as the 32-bit compiler simply does
Code:
cmpb      C,=44

even without /opt. Just setting up the call, the stack frame in ccomp() and then returning would take about a dozen instructions in the 64-bit version. I do not know in which DLL the actual code of ccomp() is located, but I guess that it is a full-fledged string comparison function.
Back to top
View user's profile Send private message
JohnCampbell



Joined: 16 Feb 2006
Posts: 1709
Location: Sydney

PostPosted: Thu Nov 17, 2016 8:16 am    Post subject: Reply with quote

mecej4,

Why is it that FTN95 /64 does the change for "if ( c == ' ' ) exit" but not for "if ( str(k:k) == ' ' ) exit" ?
The performance change in comparison to what else is being done is very dramatic.

Also you stated "Reading numbers with E format does not have to involve calls to log() -- you do not need to compute the log to 8 or 15 decimal digits when all you need is the integer part of the logarithm." I actually use LOG10 twice, which was a quick fix, so I'd like to know your recommended alternative.

John
Back to top
View user's profile Send private message
mecej4



Joined: 31 Oct 2006
Posts: 621

PostPosted: Thu Nov 17, 2016 9:12 am    Post subject: Re: Reply with quote

JohnCampbell wrote:
Why is it that FTN95 /64 does the change for "if ( c == ' ' ) exit" but not for "if ( str(k:k) == ' ' ) exit" ?

I can only guess. Input source code has many places where an opportunity for optimization exists, and this is one. FTN95/64 is in its infancy, and makes no claims to optimization. I can understand the compiler creators' saying "let's first get it working right, and later work on optimizing" -- I mean the compiled code, not the compiler itself.

Take the logical expression c == ','. The two expressions being compared are character expressions. In general, they may be of different lengths, so comparison may involve truncation, blank-padding, etc., -- whatever is needed to do what the standard specifies. In this specific case, it is easy to see that the length of the RHS is 1. To find the length of the LHS, however, the symbol table has to be looked up to see that it is also 1. Thus, unless the compiler realizes that both sides have lengths equal to 1, it is reasonable to call a general RTL routine that compares two strings of different lengths.

Similar considerations apply to strings of byte lengths 2, 4 and, for FTN95/64, 8. These sizes fit into a register, so entire registers can be compared instead of doing a byte-by-byte comparison.

Quote:
Also you stated "Reading numbers with E format does not have to involve calls to log() -- you do not need to compute the log to 8 or 15 decimal digits when all you need is the integer part of the logarithm." I actually use LOG10 twice, which was a quick fix, so I'd like to know your recommended alternative.

Notice that in your code, after taking the log, you chop off the fractional part? So, simply count the decimal digits as you are scanning the input string segment for a decimal point, 'E' or end of segment. That count is the value of int(log10(x)). Similarly, for small numbers, you count the number of zeros between the decimal point and the first significant digit after it. If an exponent E[s]nnn is present, you scan that and extract the nnn, and use that to adjust the previously computed integer part of the log10.

Perhaps this exploration should not be carried too far. As we add code to handle more variations in the types of input strings and trap errors in the input data, we move closer and closer to writing a sscanf() routine. We are not embarked on a rewriting of the FTN95/64 I/O RTL, are we? We can be more useful by alerting the Silverfrost team to bugs and areas of poor performance.
Back to top
View user's profile Send private message
Display posts from previous:   
Post new topic   Reply to topic    forums.silverfrost.com Forum Index -> General All times are GMT + 1 Hour
Goto page Previous  1, 2, 3, 4, 5, 6  Next
Page 4 of 6

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © 2001, 2005 phpBB Group