forums.silverfrost.com

Kenneth_Smith

By strange coincidence, I too have been surprised by the poor performance when writing to binary direct access files. I've not used this approach before, but I was aware that my programming style sometimes does not lend itself to speed (lots of writing and then reading of the data by different program units), so I thought I'd look at this some wet winter evening.

On my machine the code below takes 47 ms to allocate and populate the R4 1000x1000 array A, 17.2 s to write it to the direct access file, and 296 ms to recover the same data into array B.

Is this typical performance for writing, or am I doing something silly here?

Cheers Ken

davidb · Joined: 17 Jul 2009 Posts: 560 Location: UK

For me with your example, I get 1.6 seconds for the write section and 0.16 for the read section. So there is certainly a difference.

Kenneth_Smith · Posted: Mon Jan 26, 2015 9:58 pm Post subject:

Interesting David, with that write speed I would not have raised the query!
I realise it will be machine/hardware dependent, but it's intriguing to see how much difference there can be. Out of curiosity I've dug out the old machines I have sitting at home, listed below in ascending order of performance spec, along with their Windows version and the time to perform the write operation:-

JohnCampbell · Joined: 16 Feb 2006 Posts: 2554 Location: Sydney

Ken,

I have not seen "INQUIRE(IOLENGTH=RECLEN)A(1,1)" used in this way before. Is this legal?

I tried to improve the performance, by reducing the number of records from 1,000,000 to 1,000. This does have a significant improvement in speed, even though your original code wrote the records out sequentially. I would have expected better buffering.

When I changed this, "INQUIRE(IOLENGTH=RECLEN)A(:,1)" did not work, so a different approach is needed for this. The intrinsic SIZEOF would be a great addition to FTN95!

I would also have expected an improvement with Windows 7xx over XP, but you are not finding this. Perhaps the improvement is more noticeable with larger files.
Anyway, the problem is that 1,000,000 records of 4 bytes are much slower to write than 1,000 records of 4,000 bytes. Reading is faster, as it is being buffered. Note that clock@ is only accurate to 1/64 second, so the read from buffer can be quicker than this, hence 0 seconds for the read. Use system_clock for better accuracy.
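
A minimal sketch of this comparison, assuming nothing beyond standard Fortran 95: it writes the same 1000x1000 array once as one element per record and once as one column per record, timing each with system_clock. The program name, file names, unit numbers and the variable elem_len are illustrative only and are not taken from the code posted in this thread. The column record length is formed as the IOLENGTH of a single element times the column length, so no INQUIRE on an array section is needed.

```fortran
program da_record_size_timing
  ! Illustrative benchmark: names, units and sizes are assumptions.
  implicit none
  integer, parameter :: n = 1000
  real, allocatable :: a(:,:), b(:,:)
  integer :: elem_len, i, j, t0, t1, rate

  allocate (a(n,n), b(n,n))
  call random_number(a)

  ! Length of one element in the units RECL= expects (processor dependent).
  inquire (iolength=elem_len) a(1,1)

  ! Case 1: n*n records, one element each.
  open (unit=11, file='elem.bin', access='direct', form='unformatted', &
        recl=elem_len, status='replace')
  call system_clock(t0, rate)
  do j = 1, n
    do i = 1, n
      write (11, rec=(j-1)*n + i) a(i,j)
    end do
  end do
  call system_clock(t1)
  print *, 'one element per record: ', real(t1 - t0)/real(rate), ' s'
  close (11, status='delete')

  ! Case 2: n records, one whole column each.
  open (unit=12, file='col.bin', access='direct', form='unformatted', &
        recl=elem_len*n, status='replace')
  call system_clock(t0)
  do j = 1, n
    write (12, rec=j) a(:,j)
  end do
  call system_clock(t1)
  print *, 'one column per record:  ', real(t1 - t0)/real(rate), ' s'

  ! Read the columns back and check the round trip.
  do j = 1, n
    read (12, rec=j) b(:,j)
  end do
  print *, 'round trip ok: ', all(a == b)
  close (12, status='delete')
end program da_record_size_timing
```

The absolute times will depend on the compiler, operating system and disk; only the ratio between the two cases is of interest.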

My modified code to see the change is:

davidb · Joined: 17 Jul 2009 Posts: 560 Location: UK

davidb · Joined: 17 Jul 2009 Posts: 560 Location: UK

JohnCampbell · Joined: 16 Feb 2006 Posts: 2554 Location: Sydney

INQUIRE (IOLENGTH=RECLEN) A(:,1) and
INQUIRE (IOLENGTH=RECLEN) A(1:NROW,1)
both produce a compiler error or stack overflow.

The following is an F95 standard-conforming substitute for SIZEOF:

reclen = size ( transfer ( a(1,1), (/'A'/) ) )
if ( big_blocks) reclen = reclen*size (A(:,1))
write(6,*)'RECL=',reclen

John
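
As a hedged illustration of the record-length calculation above: the TRANSFER expression counts the default characters needed to hold one element, while INQUIRE(IOLENGTH=) returns the length in whatever unit the compiler uses for RECL=, which the standard leaves processor dependent. The sketch below prints both so they can be compared on a given compiler; the program and variable names are made up for the example.

```fortran
program reclen_demo
  ! Illustrative only: compares two ways of sizing a direct access record.
  implicit none
  integer, parameter :: nrow = 1000, ncol = 2
  real :: a(nrow,ncol)
  integer :: nchars, iolen

  a = 0.0

  ! Number of default characters needed to hold one element of A.
  nchars = size(transfer(a(1,1), (/'A'/)))

  ! Length of one element in the unit that RECL= expects.
  inquire (iolength=iolen) a(1,1)

  print *, 'characters per element =', nchars
  print *, 'IOLENGTH per element   =', iolen
  ! Record length for a whole column, by either measure.
  print *, 'column length (chars)    =', nchars*size(a, 1)
  print *, 'column length (IOLENGTH) =', iolen*size(a, 1)
end program reclen_demo
```

If the two per-element values differ on a particular compiler, the IOLENGTH-based figure is the one that matches RECL= by definition.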

Kenneth_Smith · Posted: Mon Jan 26, 2015 11:40 pm Post subject:

Thanks John, that is super fast!

As I understand your code, B(:,J) writes a whole column of data as a single record. I must admit that construction does not jump out at me as the obvious way to do things. I guess I still think too much in the F77 world!

I did try writing rows of data via an implied do loop, without much success, and with hindsight that would still be the same number of writes as my serial approach. I did think that one possible advantage of the serial approach might be that there is no need to recover the whole array into memory: simply access the required element i,j by its record number when required. But clearly it may well be easier to recover the whole column that contains i,j.
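
The two ways of recovering a single element can be sketched as follows, assuming the array was written in column order. With one element per record, element (i,j) sits at record (j-1)*nrow + i; with one column per record, record j is read into a buffer and entry i is picked out. The small array size, file names and unit numbers are illustrative assumptions, not taken from the thread.

```fortran
program da_element_lookup
  ! Illustrative only: two record layouts for fetching element (i,j).
  implicit none
  integer, parameter :: nrow = 4, ncol = 3
  real :: a(nrow,ncol), col(nrow), x
  integer :: elem_len, i, j

  ! Fill A so that each value encodes its own indices.
  do j = 1, ncol
    do i = 1, nrow
      a(i,j) = real(10*i + j)
    end do
  end do
  inquire (iolength=elem_len) a(1,1)

  ! Layout 1: one element per record, stored column by column.
  open (unit=21, file='elem.bin', access='direct', form='unformatted', &
        recl=elem_len, status='replace')
  do j = 1, ncol
    do i = 1, nrow
      write (21, rec=(j-1)*nrow + i) a(i,j)
    end do
  end do

  ! Layout 2: one column per record.
  open (unit=22, file='col.bin', access='direct', form='unformatted', &
        recl=elem_len*nrow, status='replace')
  do j = 1, ncol
    write (22, rec=j) a(:,j)
  end do

  ! Fetch element (3,2) from each file.
  i = 3
  j = 2
  read (21, rec=(j-1)*nrow + i) x
  read (22, rec=j) col
  print *, 'from element records:', x
  print *, 'from column record:  ', col(i)
  print *, 'both match A(i,j):   ', (x == a(i,j)) .and. (col(i) == a(i,j))

  close (21, status='delete')
  close (22, status='delete')
end program da_element_lookup
```

The column layout reads nrow values to obtain one, but it costs a single I/O call, which is the quantity that dominated the timings reported earlier in the thread.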

I see I have much to learn. I would have written

JohnCampbell · Joined: 16 Feb 2006 Posts: 2554 Location: Sydney

Ken,

I was just looking at IOLENGTH and noticed a problem with your example.
The following code is in error :

John-Silver · Joined: 30 Jul 2013 Posts: 1520 Location: Aerospace Valley

Reading this post, I observe a typical non-Anglo-Saxon usage of the word INQUIRE.
Maybe all programmers should be forced to read the Oxford ENGLISH Dictionary by heart instead of that woman's version over the pond ;-)
....

wahorger · Joined: 13 Oct 2014 Posts: 1217 Location: Morrison, CO, USA

What I found a long time ago (http://forums.silverfrost.com/viewtopic.php?t=2992) was that the record length has an effect, but it's really the total number of I/O calls that determines the performance. Short records mean lots of I/O calls and poor performance compared to doing the identical kinds of I/O in "C" (but without the flexibility [and overhead] that FTN95 I/O lists give).

I have abandoned almost all of the direct access file I/O in my software because the performance is so poor, and for reasons that are still unknown. Running my benchmarks (also written long ago) on a solid state drive that is blindingly fast still shows the lack of performance discovered a couple of years ago. Running the same benchmark on a RAM disk also showed the same poor performance (albeit less poor than on a hard drive, but the percentages still held true).

For me, that which used to be done in a direct access temporary file is now done in memory. Which limits the size of the data sets that can be processed. Not good, right?

Still modifying code to remove the file limitations of direct access and placing all the files in memory......

JohnCampbell · Joined: 16 Feb 2006 Posts: 2554 Location: Sydney

Bill,

I read the link you posted. I must admit I don't use Win 8.1 except for 1 program on that PC.
I run most analysis runs on Win 7 desktops as single user, i.e. no change to SHARE. Apart from your test example from Jan-15, my experience of Direct Access has always been of very good performance and I would always recommend that approach for substantial disk I/O. Did you come up with any fixes to the problem?

Perhaps the combination of file buffers and some SHARE options is a problem, but reading the post from 18 months ago that may have been dismissed as a major cause. The other problem could have been the virus checker, as it can clash with I/O performance. Perhaps you should split the files, with a small token file with SHARE status, while the large database files do not require share write to work. (Share read should not be a problem, although I don't remember where the previous thread finished up)

My experience is direct access works very well and combines well with the file cache/buffers. All my programs that use it perform well for disk I/O. I would always recommend this approach.

John

wahorger · Joined: 13 Oct 2014 Posts: 1217 Location: Morrison, CO, USA

John, it works well, but..... The performance hit is too great to ignore. And, it was the same hit whether running the same benchmark under Windows 8 or Windows 2000.

I think, when I get some time, I'll repeat the benchmarking, this time linking in an equivalent "C" implementation and getting the numbers for that to help in the comparisons. The source of the performance penalty, should we be able to find it, would be of interest to many, I think.

Just FYI, under Windows 3.1 up through XP, I used a different FORTRAN compiler to create the operational code. None of these effects were seen. The same code also harkens back to CP/M days using an early Microsoft compiler. The same FORTRAN code was ported to a VAX and a PDP-11. In all these cases, there was not a performance hit due to the direct-access unformatted I/O.

I may have a platform that can support an old compiler to compile and run the same benchmark. I'll look at adding that to the mix. That would at least eliminate the OS as a culprit.