forums.silverfrost.com

DanRRight · Posted: Thu Nov 17, 2016 1:54 pm

Well, unformatted read is probably the only good solution until someone breaks the sound barrier for formatted reads of text files.

In the meantime I will probably convert all the files that have no problems in them to binary and then read them unformatted. The conversion will of course take time (it will be done in batch mode or overnight), but that will be compensated by a 10x speedup when loading them.
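A minimal sketch of that convert-once, load-fast workflow, assuming a small array of default reals. The file names `sample.txt` and `sample.bin`, the unit numbers and the array size are made up for illustration and are not from the thread.

```fortran
program convert_to_binary
  implicit none
  integer, parameter :: n = 1000      ! illustrative size only
  real :: a(n), b(n)
  integer :: i

  ! Create a small text file to stand in for the real data.
  do i = 1, n
    a(i) = 0.5*real(i)
  end do
  open(10, file='sample.txt', status='replace', form='formatted')
  write(10, '(10F10.3)') a
  close(10)

  ! One-off conversion: slow formatted (list-directed) read ...
  open(10, file='sample.txt', status='old', form='formatted')
  read(10, *) b
  close(10)

  ! ... followed by a single unformatted write of the whole array.
  open(11, file='sample.bin', status='replace', form='unformatted')
  write(11) b
  close(11)

  ! Every later run only needs the fast unformatted read.
  b = 0.0
  open(11, file='sample.bin', status='old', form='unformatted')
  read(11) b
  close(11)

  print *, 'largest difference after the round trip:', maxval(abs(a - b))
end program convert_to_binary
```

The binary file is written in the layout of whichever compiler built the program, so it should be read back by a program built with the same compiler.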

mecej4 · Joined: 31 Oct 2006 Posts: 1892

Do note that FTN95 uses a different convention for record markers for Fortran unformatted files than most other current Fortran compilers, and your C programmers may be unfamiliar with Fortran unformatted files. You can search for messages in the Silverfrost forums about conversion from the common unformatted file format to the Silverfrost format.

Not a big issue, but something to be aware of.

DanRRight · Posted: Thu Nov 17, 2016 7:49 pm

So far I have got a 3.5x speed increase from unformatted read in the real code, versus 10-15x on simple test benches, all compared with my older formatted read using the * format specifier. The gain is smaller in the real code because, besides loading, a lot of processing with LOG, EXP and large arrays goes on at the same time. By the way, the binary files are 3x more compact than my older text files.

And tests show that when the 64-bit version is finally ready, the read speed will be a further 1.8x faster.

Can you guys please do an independent check of unformatted read speed? We have a chance to break 1 GB/s on RAM drives.
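For anyone who wants to run such a check, here is a rough timing sketch. The file name `speed_test.bin`, the unit number and the array size are arbitrary assumptions, and default reals are assumed to be 4 bytes. Because the file has just been written, the read will most likely be served from the operating system's cache, so the figure is an upper bound rather than a disk measurement.

```fortran
program time_unformatted_read
  implicit none
  integer, parameter :: n = 5000000   ! about 20 MB of 4-byte reals (assumed)
  real, allocatable :: a(:)
  integer :: t0, t1, rate
  real :: seconds, mbytes

  allocate(a(n))
  call random_number(a)

  ! Write the test file as one unformatted record.
  open(12, file='speed_test.bin', status='replace', form='unformatted')
  write(12) a
  close(12)

  ! Time only the open/read/close of the unformatted file.
  call system_clock(t0, rate)
  open(12, file='speed_test.bin', status='old', form='unformatted')
  read(12) a
  close(12)
  call system_clock(t1)

  seconds = real(t1 - t0)/real(rate)
  mbytes  = 4.0*real(n)/1.0e6
  print *, 'MB read      :', mbytes
  print *, 'seconds      :', seconds
  ! Guard against a clock too coarse to resolve the read.
  if (seconds > 0.0) print *, 'MB per second:', mbytes/seconds
end program time_unformatted_read
```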

JohnCampbell · Joined: 16 Feb 2006 Posts: 2560 Location: Sydney

dan,

DanRRight · Posted: Fri Nov 18, 2016 1:57 pm

John,

PCI Express SSDs offer 3 GB/s, and we have to utilize this bandwidth.

I am so sick of and furious with slow formatted text reads that I want to try anything. Unformatted read of binary data is of course very dangerous, but hopefully we are past the learning curve, when it was very hard to catch an error in huge data sets.

Right now we have to handle files of 10-100 GB for one run. The upcoming 64-bit compiler will increase the loaded volumes by an order of magnitude.

mecej4 · Joined: 31 Oct 2006 Posts: 1892

John,

Here is Fortran code for a fast but approximate log10 function. You may try replacing the two calls to log10() in your E format output routine by calls to qlg10(). It computes an 8-bit approximation to log10(x), which is slightly better than two decimal digits. In the first call, a small table is set up. Each subsequent evaluation of qlg10() involves a single floating-multiply-add (FMA) operation and some integer shift and AND operations.
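The sketch below is not the qlg10() from this post. It is only an illustration of the technique described, under stated assumptions: default REAL is 32-bit IEEE, default INTEGER is 32 bits, and the argument is positive, normal and finite. The names `qlg10_demo` and `qlg10_sketch` are hypothetical.

```fortran
module qlg10_demo
  implicit none
  private
  public :: qlg10_sketch
  integer, parameter :: n = 8            ! mantissa bits used as table index
  integer, parameter :: tblsiz = 2**n
  real, save    :: tbl(0:tblsiz-1)
  logical, save :: ready = .false.
contains
  function qlg10_sketch(x) result(y)
    real, intent(in) :: x                ! assumed positive, normal, finite
    real :: y
    real, parameter :: log10_2 = 0.30102999566
    integer :: bits, e, idx, i

    ! The first call fills the small table of log10(1 + i/2**n).
    if (.not. ready) then
      do i = 0, tblsiz - 1
        tbl(i) = log10(1.0 + real(i)/real(tblsiz))
      end do
      ready = .true.
    end if

    bits = transfer(x, 0)                           ! raw IEEE bit pattern
    e    = iand(ishft(bits, -23), 255) - 127        ! unbiased binary exponent
    idx  = iand(ishft(bits, -(23 - n)), tblsiz - 1) ! top n mantissa bits

    ! log10(x) = e*log10(2) + log10(mantissa): one multiply-add.
    y = real(e)*log10_2 + tbl(idx)
  end function qlg10_sketch
end module qlg10_demo

program try_qlg10
  use qlg10_demo
  implicit none
  real :: x(4) = (/ 1.0, 10.0, 123.456, 0.002 /)
  integer :: i
  ! Columns: x, table-based estimate, library log10.
  do i = 1, 4
    print '(3ES14.5)', x(i), qlg10_sketch(x(i)), log10(x(i))
  end do
end program try_qlg10
```

Because the table index truncates the mantissa, the estimate never exceeds the true value and falls short of it by less than log10(1 + 1/256), about 0.0017, apart from rounding.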

mecej4 · Joined: 31 Oct 2006 Posts: 1892

After some fine tuning of the E and F output formatting subroutines, I obtained the following timing results (W10, i7-2720QM).

DanRRight · Posted: Sun Nov 20, 2016 4:12 pm

Thanks mecej4, such wide comparisons tell a lot.

Was this for WRITE? How about READ?

And for completeness it would be good to also add the * format, which is very convenient when a line holds variables of different types and widths.

JohnCampbell · Joined: 16 Feb 2006 Posts: 2560 Location: Sydney

mecej4's summary gives a good indication of Fortran formatted write performance, in particular:
FTN95 /32 is comparatively good for formatted write
FTN95 /64 is slow for formatted write
gFortran is incredibly slow for formatted write.

The function approach is faster and may be useful if specific changes to the output are wanted, say for 0. or -0.000, or if using gFortran!

Regarding READ performance, I have reviewed the internal read options, using a variation of mecej4's post on 15 Nov and Dan's free format layout of 16 Nov.

The test options I considered are:
1 read fixed format using 10F10.3
2 read fixed format using mecej4's read_val routine
3 read fixed format using read_val routine for optional decimal
4 read free format layout, using Dan's layout example, with numbers separated by a space or comma
5 parse string only, using Dan's layout example, with numbers separated by a space or comma

I have tested this for FTN95 /32, FTN95 /64 and gFortran 64-bit.
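To make options 1 and 4 concrete, here is a small sketch of the two kinds of internal read. The sample lines are invented for illustration and are not Dan's layout or the data used in the tests; the read_val routine is not reproduced.

```fortran
program internal_read_sketch
  implicit none
  character(len=100) :: fixed_line, free_line
  real :: v(10), w(10)
  integer :: i, ios

  ! Build a fixed-layout line: ten fields, each exactly 10 characters wide.
  write(fixed_line, '(10F10.3)') (1.5*real(i), i = 1, 10)

  ! Option 1: internal read with the matching fixed format.
  read(fixed_line, '(10F10.3)', iostat=ios) v
  if (ios /= 0) print *, 'fixed-format read failed, iostat =', ios

  ! Option 4: free layout, numbers separated by a space or a comma,
  ! read with the list-directed (*) format.
  free_line = '1.5, 3 4.5,6 7.5 9, 10.5 12 13.5,15'
  read(free_line, *, iostat=ios) w
  if (ios /= 0) print *, 'list-directed read failed, iostat =', ios

  print '(10F7.2)', v
  print '(10F7.2)', w
  print *, 'max difference between the two reads:', maxval(abs(v - w))
end program internal_read_sketch
```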

DanRRight · Posted: Mon Nov 21, 2016 8:56 am

Thanks, John, that comparison was great. You and mecej4 have both given very nice insight into Fortran I/O, made a huge step forward in speeding it up, and pointed out further areas for improvement!

mecej4 · Joined: 31 Oct 2006 Posts: 1892

John: There is one improvement that can be made to your code for writing E-formatted numbers; it may yield a slight but noticeable improvement in speed.

After one has found log10(x), rounded and scaled x, the integer v expressed in the decimal scale is exactly m digits long. Therefore, the two IF tests in the DO loop following "working with" are unnecessary. The loop can be replaced by
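As an illustration of the idea only (this is not mecej4's replacement loop): when an integer is known to have exactly m decimal digits, a fixed-count loop can peel the digits off without any IF tests inside it. The value of v and the name `digits` are hypothetical.

```fortran
program fixed_digit_loop
  implicit none
  integer, parameter :: m = 6
  integer :: v, k
  character(len=m) :: digits

  v = 314159                  ! assumed to have exactly m decimal digits

  ! Exactly m iterations, filling the string from the right.
  do k = m, 1, -1
    digits(k:k) = char(ichar('0') + mod(v, 10))
    v = v/10
  end do

  print '(A)', digits
end program fixed_digit_loop
```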

JohnCampbell · Joined: 16 Feb 2006 Posts: 2560 Location: Sydney

mecej4,

Thanks for the further advice. I have modified the loops to remove some of the unnecessary instructions you indicated and have produced some significant improvements. The following table tracks the elapsed time performance of the ES Function for changes that have been discussed, including:
removing a LOG10 function then replacing the other with a quick LOG10 look-up function
cleaning the DO loops to remove unnecessary instructions

mecej4 · Joined: 31 Oct 2006 Posts: 1892

A couple of observations.

1. In many borderline cases, your new routine is faithful to the "round to nearest or even" rule; the previous version does not always get the least significant digit right.

2. When the input number causes the "patch for round up power" to be applied, the result is not correct. Try, for example,

JohnCampbell · Joined: 16 Feb 2006 Posts: 2560 Location: Sydney

mecej4,

How do you find these numbers!

The error is due to QLG10 estimating the wrong integer power. (I was hoping the lookup table would have been correct at multiples of 10.)
I corrected the problem by making a more general response to the integer power "ip" being wrong. (Could there be an infinite loop with rounding?)

John

mecej4 · Joined: 31 Oct 2006 Posts: 1892

You can reduce the number of instances (where a correction is needed) by increasing n to, say, 12, and TBLSIZ to 2^12 = 4096. You could, in addition, revert to using the full-precision log10() function in the few instances in which you detect that the estimated exponent needs correction.

Here is an improved version of QLG10 in which, after the table lookup, instead of taking the closest lower value, linear interpolation is applied to the two bracketing values. With this change, I find that 556 exponent updates were needed for 10 million random numbers between -1E5 and 1E5. If the full-precision log10() is used, the number of exponent updates becomes 548. Thus, your exponent checks and updates need to be present in the code (even when you use log10()), but the performance hit is minor because such updates are rarely invoked.
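The improved QLG10 itself is not reproduced here. The following is only a sketch of the two ideas in this post, linear interpolation between bracketing table entries and a check of the estimated exponent, assuming 32-bit IEEE default REAL, 32-bit default INTEGER and a positive, normal, finite argument. The names `qlg10_lerp` and `decimal_exponent` are hypothetical; n, TBLSIZ and ip follow the usage in the thread.

```fortran
module qlg10_interp_demo
  implicit none
  private
  public :: qlg10_lerp, decimal_exponent
  integer, parameter :: n = 12
  integer, parameter :: tblsiz = 2**n      ! 4096
  integer, parameter :: rest = 23 - n      ! mantissa bits below the index
  real, save    :: tbl(0:tblsiz)           ! one extra entry for interpolation
  logical, save :: ready = .false.
contains
  function qlg10_lerp(x) result(y)
    real, intent(in) :: x                  ! assumed positive, normal, finite
    real :: y, frac
    real, parameter :: log10_2 = 0.30102999566
    integer :: bits, e, idx, i
    if (.not. ready) then
      do i = 0, tblsiz
        tbl(i) = log10(1.0 + real(i)/real(tblsiz))
      end do
      ready = .true.
    end if
    bits = transfer(x, 0)
    e    = iand(ishft(bits, -23), 255) - 127
    idx  = iand(ishft(bits, -rest), tblsiz - 1)
    ! Fraction of the way from table entry idx to idx+1.
    frac = real(iand(bits, 2**rest - 1))/real(2**rest)
    y = real(e)*log10_2 + tbl(idx) + frac*(tbl(idx + 1) - tbl(idx))
  end function qlg10_lerp

  function decimal_exponent(x) result(ip)
    real, intent(in) :: x                  ! assumed positive, normal, finite
    integer :: ip
    ip = floor(qlg10_lerp(x))
    ! Correct the estimate until 10**ip <= x < 10**(ip+1).
    do while (10.0**ip > x)
      ip = ip - 1
    end do
    do while (10.0**(ip + 1) <= x)
      ip = ip + 1
    end do
  end function decimal_exponent
end module qlg10_interp_demo

program try_decimal_exponent
  use qlg10_interp_demo
  implicit none
  real :: x(4) = (/ 999.9999, 1000.0, 0.05, 12345.678 /)
  integer :: i
  do i = 1, 4
    print '(ES16.7, F12.6, I6)', x(i), qlg10_lerp(x(i)), decimal_exponent(x(i))
  end do
end program try_decimal_exponent
```

Each correction loop moves ip in one direction only and stops once its bound on x is satisfied, so for a positive finite argument the check cannot loop forever.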