forums.silverfrost.com

mecej4 · Joined: 31 Oct 2006 Posts: 1892

A round of applause to John for writing the last chapter of the story and for writing the ES output routines.

As it has evolved over a week, the content of this thread is now showing some sprawl. It covers two sets of tests, both of which make a case for reviewing the I/O routines of FTN95 and Gfortran on Windows. The older tests required resources that may not be readily available to everyone (over 100 GB of HDD/SSD space) and substantial run times. The new tests are much more accessible to anyone, and make the case more eloquently.

Some cleaning up, collecting the new tests (READ and WRITE, using built-in formatting/custom routines) into a compact Zip file posted at, say, Dropbox, would give Paul something easier to work with. I offer to run the cleaned up tests (John, please avoid using any FTN95-specific routines in the test codes and make up a Zip file of the source files) with Intel Fortran and Gfortran on Windows and Linux on an older dual boot desktop.

JohnCampbell · Joined: 16 Feb 2006 Posts: 2560 Location: Sydney

mecej4,

I shall assemble the tests and provide a dropbox link.

There are actually 3 sets of tests that I have been demonstrating.

1) Binary_IO test that demonstrated the performance being achieved with fixed length record random access files. This test uses 3 libraries; 2 are based on Fortran fixed length record random access files, while the 3rd used FTN95's new long 64-bit address file routines that I have used for the first time. This code is non-standard FTN95 code, although BinLib was an attempt at conforming Fortran. I shall provide this library.
These show the benefit of SSD drives but, probably more significantly, the effect of memory cache buffering. I typically use these libraries to write then read information, so caching is important. Transfer rates in excess of 1 GB/sec were being demonstrated even using cached HDD, which is fairly good, especially for 32-bit solutions.

2) Text I/O with large files. This test tries to test file I/O which is non-buffered, for files up to 18 GB in size and 130 GB in total. This is what might be experienced when reading large .csv data sets supplied from an external source. This test is closest to Dan's identified problem.
The key outcome was that reading was fairly fast but writing numbers is slow. This is portable code, except for IOMSG=message, which is not supported by FTN95.

3) Internal buffer read/write. This test identifies the slow number-write and number-read speeds for FTN95 /64 and also gFortran, both of which need improving. FTN95 /32 has relatively good performance. This test has not been combined with file text string I/O, but hopefully test 2 confirms this is not a problem. Hopefully this is portable code.
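For readers who have not seen the test codes, the following is only a minimal sketch of what an internal buffer read/write timing can look like. It is not John's test program; the program name, the count of 100000 values and the F16.6 format are assumptions chosen for illustration.

```fortran
program internal_io_timing
  implicit none
  integer, parameter :: n = 100000          ! number of values (arbitrary choice)
  character(len=16), save :: buf(n)         ! one text field per value
  real    :: x, y, err
  integer :: i, t0, t1, t2, rate

  call system_clock(t0, rate)
  do i = 1, n
    x = real(i) * 0.001
    write(buf(i), '(F16.6)') x              ! number -> text, no file involved
  end do
  call system_clock(t1)

  err = 0.0
  do i = 1, n
    read(buf(i), '(F16.6)') y               ! text -> number
    err = max(err, abs(y - real(i) * 0.001))
  end do
  call system_clock(t2)

  print *, 'internal WRITE (s):', real(t1 - t0) / real(rate)
  print *, 'internal READ  (s):', real(t2 - t1) / real(rate)
  print *, 'max round-trip error:', err
end program internal_io_timing
```

Because no file is opened, any difference between compilers here comes from the formatting library rather than from the disk or the operating system cache.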

In terms of identifying a problem that needs improving; test 3) is most useful. I shall prepare a link tomorrow (my time)

I am interested to see what performance can be achieved with other compilers. The other tests would be interesting, although they are dependent on many variables, such as disk type, installed memory, processor and I suspect operating system.

John

mecej4 · Joined: 31 Oct 2006 Posts: 1892

I ran your WRITE tests using gFortran, and these are some comments related to that.

Your formatting routines write_val_r4() and write_val_e4() use INTEGER*8 variables. (write_val_e4() is not used in the test runs, but involves more use of INTEGER*8.) As a result, 32-bit EXEs produced by gFortran run significantly slower. Here is a comparison (on my laptop, i7-2720, W10).

lgf 7.7 (32) 52.3 0.806 -o3 -32 -wpo -sse
lgf 7.7 (64) 46.6 0.255 -o3 -64 -wpo -sse

Lahey/GNU/Marlette Fortran is a repackaging of gFortran 5.4.1. I have Cygwin versions of gFortran that are older, and they give similar results.

The implication is (i) see if you can reduce the use of INTEGER*8 variables in the formatting subroutines, and (ii) please clearly state whether your gFortran results are for 32-bit or 64-bit architecture.

I am pondering whether I should sound out an optimization expert whom I happen to know about whether he could look at this performance bug. He has written often about building GCC/GFortran on both Windows and Linux.

Another possibility is to post just some results to C.L.F. with a link to the source codes, and see if someone from the GFortran developers group gets interested.

First, however, we need distributable test codes and some Linux results.

DanRRight · Posted: Wed Nov 16, 2016 2:44 pm

John,
So you have sped up gFortran's slow WRITE by 100+ times and even made it 5x faster than FTN95, wow! Please keep going with READ, including F format of arbitrary length. The numbers I read are produced by C guys and may look like the line below: up to 20 digits long, and sometimes with no decimal point at all.

0.000777203 -0.000032224 -0.000039659 -0.365450860 -1.011079630 -0.520287431 -0.285002098 1355.429879314 1

which is absurd but this is usual life of C programmers.

Mecej4, if it will take you no more than a few minutes, can you please modify your real*4 code to read numbers like 1355.429879314, ignoring the extra digits? My brain cannot multitask right now :)

And see this damn integer at the end of line in previous paragraph? I will try to handle it myself
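One possible way to handle over-long F fields is sketched below. It is not mecej4's code, which is not shown in this thread; the names parse_f_mod, parse_f, maxdig and nfrac are hypothetical. The sketch assumes plain decimal fields with no exponent part and no leading blanks, keeps only the first nine significant digits (truncating, not rounding), and does no range checking.

```fortran
module parse_f_mod
  implicit none
contains
  ! Convert a field such as '-1355.429879314' or '1' to REAL.
  ! Digits beyond the first MAXDIG significant ones are skipped.
  ! OK is .false. on an empty field or an unexpected character.
  subroutine parse_f(s, val, ok)
    character(len=*), intent(in)  :: s
    real,             intent(out) :: val
    logical,          intent(out) :: ok
    integer, parameter :: maxdig = 9    ! 999999999 fits in a 32-bit integer
    integer :: k, d, mant, ndig, nfrac
    logical :: neg, seen_point, seen_digit

    val = 0.0
    ok = .false.
    mant = 0
    ndig = 0
    nfrac = 0                           ! value = mant / 10**nfrac
    neg = .false.
    seen_point = .false.
    seen_digit = .false.

    do k = 1, len_trim(s)
      select case (s(k:k))
      case ('+', '-')
        if (k /= 1) return              ! sign allowed only in column 1
        neg = (s(k:k) == '-')
      case ('.')
        if (seen_point) return
        seen_point = .true.
      case ('0':'9')
        seen_digit = .true.
        d = ichar(s(k:k)) - ichar('0')
        if (ndig < maxdig) then
          if (ndig > 0 .or. d /= 0) ndig = ndig + 1   ! leading zeros are not significant
          mant = 10 * mant + d
          if (seen_point) nfrac = nfrac + 1
        else if (.not. seen_point) then
          nfrac = nfrac - 1             ! skipped integer digit still scales the value
        end if
      case default
        return
      end select
    end do
    if (.not. seen_digit) return

    if (nfrac >= 0) then
      val = real(dble(mant) / 10.0d0**nfrac)
    else
      val = real(dble(mant) * 10.0d0**(-nfrac))
    end if
    if (neg) val = -val
    ok = .true.
  end subroutine parse_f
end module parse_f_mod

program demo_parse_f
  use parse_f_mod
  implicit none
  real    :: v
  logical :: ok
  call parse_f('1355.429879314', v, ok)
  print *, ok, v
  call parse_f('1', v, ok)              ! trailing integer field, no decimal point
  print *, ok, v
end program demo_parse_f
```

Only default (32-bit) integers are used for the mantissa, so the digit loop does not depend on INTEGER*8 arithmetic.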

mecej4 · Joined: 31 Oct 2006 Posts: 1892

It would be easy to modify the code to read the string that you showed. The field widths are not uniform, perhaps not known in advance, and may even vary from record to record. All you have to do is to scan the string for white space, and build an index of where each number begins and ends, and a count of the number of fields.
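The scan described above can be sketched as follows. This is an illustration only, with hypothetical names (field_scan_mod, scan_fields, ibeg, iend), and it assumes that blanks and commas are the only delimiters and that consecutive delimiters never denote an empty field.

```fortran
module field_scan_mod
  implicit none
contains
  ! Scan LINE for blank- or comma-delimited fields.
  ! On return NF is the number of fields found (at most size(ibeg)),
  ! and field j occupies line(ibeg(j):iend(j)).
  subroutine scan_fields(line, ibeg, iend, nf)
    character(len=*), intent(in)  :: line
    integer,          intent(out) :: ibeg(:), iend(:)
    integer,          intent(out) :: nf
    integer :: k, n
    logical :: in_field

    nf = 0
    in_field = .false.
    n = len_trim(line)
    do k = 1, n
      if (line(k:k) == ' ' .or. line(k:k) == ',') then
        if (in_field) iend(nf) = k - 1      ! delimiter closes the current field
        in_field = .false.
      else if (.not. in_field) then
        if (nf == size(ibeg)) exit          ! no room to index more fields
        nf = nf + 1
        ibeg(nf) = k                        ! first character of a new field
        in_field = .true.
      end if
    end do
    if (in_field) iend(nf) = n              ! last field runs to end of line
  end subroutine scan_fields
end module field_scan_mod

program demo_scan_fields
  use field_scan_mod
  implicit none
  character(len=*), parameter :: line = &
    '0.000777203 -0.000032224 -0.000039659 -0.365450860 -1.011079630 ' // &
    '-0.520287431 -0.285002098 1355.429879314 1'
  integer :: ibeg(20), iend(20), nf, j
  call scan_fields(line, ibeg, iend, nf)
  do j = 1, nf
    print *, j, ' [', line(ibeg(j):iend(j)), ']'
  end do
end program demo_scan_fields
```

The loop compares line(k:k) directly rather than copying it into a one-character temporary first.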

What you must do before writing code to do this, however, is this: write down (or obtain from your C programmers) a specification of what is allowed (and, therefore, may be expected) in all the text lines that your program will have to read. For example: will some input lines have more or less than the nine fields in your sample record? Will some lines have comments following the numbers? Will some of the numbers have exponents specified?

Without such a specification, you will have a program that will work fine on this specific input line, but may malfunction (without crashing or any sign of error) with other input lines that are superficially similar to this one. And, as someone famous said, "premature optimization is the root of all evil".

Another simple possibility is to have the C program that writes the strings use a repeated %17.9e or some such format, instead of the %.9f format that it is now using.
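With a uniform field width the Fortran side needs no scanning at all, as the following sketch suggests. The record is only a stand-in: it is produced here with ES17.9, on the assumption that this gives the same 17-column layout as a C "%17.9e" field, and the three-field record length is arbitrary.

```fortran
program fixed_width_read
  implicit none
  integer, parameter :: nf = 3, w = 17      ! fields per record, field width
  character(len=nf*w) :: line
  real    :: x(nf), y(nf), z
  integer :: j, ios

  x = (/ 0.000777203, -0.365450860, 1355.429879314 /)

  ! Stand-in for a record written by the C program with "%17.9e" fields
  write(line, '(3ES17.9)') x

  ! All fields in one formatted read
  read(line, '(3E17.9)', iostat=ios) y
  print *, ios, y

  ! Or pick out one field by column arithmetic
  j = 3
  read(line((j-1)*w+1 : j*w), '(E17.9)', iostat=ios) z
  print *, ios, z
end program fixed_width_read
```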

JohnCampbell · Joined: 16 Feb 2006 Posts: 2560 Location: Sydney

DanRRight · Posted: Thu Nov 17, 2016 12:24 am

Mecej4,
Using E format will be almost twice as slow according to John's test, so for now I'd use F with a space delimiter, exactly as in your example. Your current code is almost OK, it already scans for spaces; the only time it hiccups is when numbers are too long, as in my example above. I don't have time to dig deeper into your code (it is harder to change code that is not your own), and unfortunately I don't currently have time to write and experiment with my own either. I can barely keep up with what you guys publish here to see in which direction the thread moves.

As another direction to investigate: do you know if C has a G-like format that keeps the field width and number of digits constant and just moves the decimal point? I really don't need more than 7-8 leading digits, but in future the numbers may grow beyond a billion, and the code must handle that by switching either to longer F numbers with real*8 or to E format. But that's for later exercises. The F format currently used by my C guys is absurd; it outputs numbers like this.

0.000000458 54321567.846764678

3 useful digits is way too few, 17 is way too many. If Fortran-using scientists or engineers did that, they would have to be fired immediately with no possibility of return. The only excuse is speed.

John, Intel Fortran is not allowing comma as a delimiter? You've made my day. Poor Intel.

mecej4 · Joined: 31 Oct 2006 Posts: 1892

Dan, I recommend that you ask that the C code that produces the data file use 'e' format. There is no problem with printing large or small numbers with that format, as long as the numbers lie between -10^37 and +10^37, and processing lines in your Fortran code will be faster if the field width is uniform. Reading numbers with E format does not have to involve calls to log() -- you do not need to compute the log to 8 or 15 decimal digits when all you need is the integer part of the logarithm.
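One way to obtain just the integer part of the base-10 logarithm without calling LOG10 is sketched below. This is an assumption about what such code could look like, not the method used in anyone's routines in this thread; the names dec_exp_mod and dec_exponent are hypothetical. It estimates the decimal exponent from the binary exponent returned by the EXPONENT intrinsic and then corrects the estimate with a single comparison.

```fortran
module dec_exp_mod
  implicit none
contains
  ! Returns floor(log10(abs(x))) for nonzero x without calling LOG10.
  ! Returns 0 for x == 0 (an arbitrary convention for this sketch).
  integer function dec_exponent(x) result(e10)
    real, intent(in) :: x
    double precision, parameter :: log10_2 = 0.30102999566398120d0
    double precision :: ax

    ax = abs(dble(x))
    if (ax == 0.0d0) then
      e10 = 0
      return
    end if
    ! exponent(ax) = e2, where ax = f * 2**e2 with 0.5 <= f < 1,
    ! so (e2-1)*log10(2) <= log10(ax) < e2*log10(2).
    e10 = floor((exponent(ax) - 1) * log10_2)
    ! The estimate is either exact or one too small.
    if (ax >= 10.0d0**(e10 + 1)) e10 = e10 + 1
  end function dec_exponent
end module dec_exp_mod

program demo_dec_exponent
  use dec_exp_mod
  implicit none
  print *, dec_exponent(1355.429879314)
  print *, dec_exponent(0.000777203)
end program demo_dec_exponent
```

The mantissa for an E-format field could then be obtained by scaling with 10.0d0**(-e10), again without any logarithm call.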

C also has 'g' format, but that format should only be used when the output is printed on paper to be read by humans.

Next, note that it should not matter that in our toy programs it takes twice as long to process input data with E format compared to F format. That is not a reason to exclude using E format. Your adult program probably does an obscene amount of number crunching; a typical run may be many minutes long, in which case an extra half second spent reading E-format data is negligible.

Finally, I do not understand why you use text files instead of binary files, unformatted Fortran files, or even a memory buffer, for exchanging data between C and Fortran when the file sizes are so large that no human is going to print and read those files.
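As an illustration of the binary alternative, here is a minimal sketch using unformatted stream access. It assumes a compiler that supports Fortran 2003 ACCESS='STREAM'; the file name exchange.bin and the layout (one default integer count followed by that many default reals) are made up for this example. A C program could produce the same bytes with fwrite, provided the integer and real sizes match on both sides.

```fortran
program stream_demo
  implicit none
  integer, parameter :: n = 5
  real    :: a(n), b(n)
  integer :: i, nread

  do i = 1, n
    a(i) = 0.5 * real(i)
  end do

  ! Write: a count followed by the raw values, with no record markers
  open(unit=10, file='exchange.bin', access='stream', form='unformatted', &
       status='replace')
  write(10) n
  write(10) a
  close(10)

  ! Read the data back the same way
  open(unit=10, file='exchange.bin', access='stream', form='unformatted', &
       status='old')
  read(10) nread
  read(10) b(1:nread)
  close(10, status='delete')

  print *, 'count read:', nread
  print *, 'round trip exact:', all(a == b)
end program stream_demo
```

No text conversion happens in either direction, so the values come back bit for bit.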

JohnCampbell · Joined: 16 Feb 2006 Posts: 2560 Location: Sydney

Paul,

I have been looking at Dan's latest free format input and have got side-tracked with a /64 problem. A simple DO loop with a character*1 temporary variable C is slowing things down dramatically.

If I replace lines 93:95 with lines 96:97 the loop speeds up dramatically from 22.7 seconds to 0.4 seconds, i.e.
replacing

    do k = f,n
      c = str(k:k)
      if ( c == ' ' ) exit
      if ( c == ',' ) exit

with

    do k = f,n
      if ( str(k:k) == ' ' ) exit
      if ( str(k:k) == ',' ) exit

I compile in PLATO with FTN95 Ver 8.05.0 dated 17/06/2016 and salflibc64.dll is 5/11/2016

BUILDLOG is:
FTN95.EXE "C:\temp\forum\format\paul_c.f90" /64 /NO_BANNER /VS7 /DELETE_OBJ_ON_ERROR /ERROR_NUMBERS /UNLIMITED_ERRORS /LINK

Compiling from the command line with "ftn95 paul_c.f90 /64 /link" also demonstrates the problem.

JohnCampbell · Joined: 16 Feb 2006 Posts: 2560 Location: Sydney

ctd

DanRRight · Posted: Thu Nov 17, 2016 1:31 am

Mecej4, well, the real output is actually in HDF5 format, which is better than that for potential read speed (since decompression is parallelized and the files are smaller); the data was extracted from HDF5 as text just for my reading. I have not yet adopted HDF5 with Fortran. Text is used because there are still very rare bugs (C is an unstable beast), and they also show up in the output, which made debugging hell because they were initially hard to catch. ASCII allowed us to catch these NaNs and other garbage, but this is still happening. I want you guys to check unformatted read too, but I hope there is still speedup potential for formatted read, because currently we use just 1% of the I/O bandwidth.

mecej4 · Joined: 31 Oct 2006 Posts: 1892

JohnCampbell · Joined: 16 Feb 2006 Posts: 2560 Location: Sydney

mecej4,

Why is it that FTN95 /64 does the change for "if ( c == ' ' ) exit" but not for "if ( str(k:k) == ' ' ) exit" ?
The performance change in comparison to what else is being done is very dramatic.

Also you stated "Reading numbers with E format does not have to involve calls to log() -- you do not need to compute the log to 8 or 15 decimal digits when all you need is the integer part of the logarithm." I actually use LOG10 twice, which was a quick fix, so I'd like to know your recommended alternative.

John

mecej4 · Joined: 31 Oct 2006 Posts: 1892