|
forums.silverfrost.com Welcome to the Silverfrost forums
|
View previous topic :: View next topic |
Author |
Message |
Kenneth_Smith
Joined: 18 May 2012 Posts: 697 Location: Hamilton, Lanarkshire, Scotland.
|
Posted: Mon Jan 26, 2015 7:39 pm Post subject: Direct access binary speed - again! |
|
|
By strange coincidence, I too have been surprised with the poor performance writing to binary direct access files. I've not used this approach before, but I was aware that sometimes my programming style does not led itself to speed - lots of writing and then reading the data by different program units - so I thought I look at this some wet winter evening.
On my machine the code below, takes 47 ms to allocate and populate the R4 1000x1000 array A, 17.2 s to write it to the direct access file, and 296 ms to recover same the data into array B.
Is this typical performance for writing, or am I doing something silly here?
Cheers Ken
Code: |
PROGRAM TEST
IMPLICIT NONE
INTEGER RECLEN, NROW, NCOL, I, J, RECORD
REAL, ALLOCATABLE:: A(:,:), B(:,:)
REAL T1,T2,T3,T4
CALL CLOCK@(T1)
NROW = 1000
NCOL = 1000
ALLOCATE(A(NROW,NCOL))
DO J = 1, NCOL, 1
DO I = 1, NROW, 1
A(I,J) = (J-1)*NCOL + I
END DO
END DO
CALL CLOCK@(T2)
WRITE(6,*)'TIME ALLOCATE AND POLULATE ARRAY IN MEMORY', T2-T1
INQUIRE(IOLENGTH=RECLEN)A(1,1)
OPEN(UNIT=10,FILE='data1.bin',FORM='UNFORMATTED',ACCESS='DIRECT',
& RECL=reclen)
DO J = 1, NCOL, 1
DO I = 1, NROW, 1
RECORD = (J-1)*NCOL + I
WRITE(10,REC=RECORD) A(I,J)
END DO
END DO
CALL CLOCK@(T3)
WRITE(6,*)'TIME TO WRITE ARRAY TO DIRECT ACCESS FILE',T3-T2
ALLOCATE(B(NROW,NCOL))
DO J = 1, NCOL, 1
DO I = 1, NROW, 1
RECORD = (J-1)*NCOL + I
READ(10,REC=RECORD) B(I,J)
END DO
END DO
CALL CLOCK@(T4)
WRITE(6,*)'TIME TO READ ARRAY FROM DIRECT ACCESS FILE',T4-T3
CLOSE(UNIT=10,STATUS='DELETE')
END
|
|
|
Back to top |
|
|
davidb
Joined: 17 Jul 2009 Posts: 560 Location: UK
|
Posted: Mon Jan 26, 2015 9:03 pm Post subject: |
|
|
For me with your example, I get 1.6 seconds for the write section and 0.16 for the read section. So there is certainly a difference. _________________ Programmer in: Fortran 77/95/2003/2008, C, C++ (& OpenMP), java, Python, Perl |
|
Back to top |
|
|
Kenneth_Smith
Joined: 18 May 2012 Posts: 697 Location: Hamilton, Lanarkshire, Scotland.
|
Posted: Mon Jan 26, 2015 9:58 pm Post subject: |
|
|
Interesting David, with that write speed I would not have raised the query!
I realise it will be machine/hardware dependant, but it's intriguing to see how much difference there can be. Out of curiosity I've bug out the old machines I have sitting at home, listed below in ascending order of increasing performance spec, along with their windows version and the time to perform the write operation:-
Code: |
Windows XP 7.2s
Windows 7 Home Premium 17.2s
Windows 7 Enterprise 8.5s
|
Suddenly my old XP machine looks good - but it's not good when there's a lot of number crunching to be carried out!
Ken |
|
Back to top |
|
|
JohnCampbell
Joined: 16 Feb 2006 Posts: 2554 Location: Sydney
|
Posted: Mon Jan 26, 2015 10:16 pm Post subject: |
|
|
Ken,
I have not seen "INQUIRE(IOLENGTH=RECLEN)A(1,1)" used in this way before. Is this legal ?
I tried to improve the performance, by reducing the number of records from 1,000,000 to 1,000. This does have a significant improvement in speed, even though your original code wrote the records out sequentially. I would have expected better buffering.
When I changed this, "INQUIRE(IOLENGTH=RECLEN)A(:,1)" did not work, so a different approach is needed for this. The intrinsic SIZEOF would be a great addition to FTN95!
I would also have expected an improvement with Windows 7xx over XP, but you are not finding this. Perhaps the improvement is more noticeable with larger files.
Anyway, the problem is 1,000,0000 records of 4 bytes is much slower than 1,000 records of 4,000 bytes. Reading is faster, as it is being buffered. Note that clock@ is only accurate to 1/64 second, so the read from buffer can be quicker than this. Hence 0 seconds for read. Use system_clock for better accuracy.
My modified code to see the change is:
Code: | PROGRAM TEST
IMPLICIT NONE
INTEGER RECLEN, NROW, NCOL, I, J, RECORD, K
REAL, ALLOCATABLE:: A(:,:), B(:,:)
REAL T1,T2,T3,T4
LOGICAL :: big_blocks = .true.
do k = 1,2
big_blocks = k > 1
CALL CLOCK@(T1)
NROW = 1000
NCOL = 1000
ALLOCATE(A(NROW,NCOL))
DO J = 1, NCOL, 1
DO I = 1, NROW, 1
A(I,J) = (J-1)*NCOL + I
END DO
END DO
CALL CLOCK@(T2)
WRITE(6,*)'TIME ALLOCATE AND POLULATE ARRAY IN MEMORY', T2-T1
! INQUIRE(IOLENGTH=RECLEN)A(:,1)
reclen = 4
if ( big_blocks) reclen = 4*nrow
write(6,*)'RECL=',reclen
OPEN (UNIT=10,FILE='data1.bin',FORM='UNFORMATTED',ACCESS='DIRECT', &
& RECL=reclen)
DO J = 1, NCOL, 1
if ( big_blocks) then
RECORD = J
WRITE(10,REC=RECORD) A(:,J)
else
DO I = 1, NROW, 1
RECORD = (J-1)*NCOL + I
WRITE(10,REC=RECORD) A(I,J)
END DO
end if
END DO
CALL CLOCK@(T3)
WRITE(6,*)'TIME TO WRITE ARRAY TO DIRECT ACCESS FILE',T3-T2
ALLOCATE(B(NROW,NCOL))
DO J = 1, NCOL, 1
if ( big_blocks) then
RECORD = j
READ(10,REC=RECORD) B(:,J)
else
DO I = 1, NROW, 1
RECORD = (J-1)*NCOL + I
READ(10,REC=RECORD) B(I,J)
END DO
end if
END DO
CALL CLOCK@(T4)
WRITE(6,*)'TIME TO READ ARRAY FROM DIRECT ACCESS FILE',T4-T3
CLOSE(UNIT=10,STATUS='DELETE')
deallocate (a,b)
end do
END |
|
|
Back to top |
|
|
davidb
Joined: 17 Jul 2009 Posts: 560 Location: UK
|
Posted: Mon Jan 26, 2015 10:52 pm Post subject: Re: |
|
|
JohnCampbell wrote: |
I have not seen "INQUIRE(IOLENGTH=RECLEN)A(1,1)" used in this way before. Is this legal ?
|
I think you can use a list of variables which matches the list of variables written or read, so A(1,1), which is a real scalar value is OK. _________________ Programmer in: Fortran 77/95/2003/2008, C, C++ (& OpenMP), java, Python, Perl |
|
Back to top |
|
|
davidb
Joined: 17 Jul 2009 Posts: 560 Location: UK
|
Posted: Mon Jan 26, 2015 10:54 pm Post subject: Re: |
|
|
Kenneth_Smith wrote: |
I realise it will be machine/hardware dependant, but it's intriguing to see how much difference there can be.
|
Perhaps this depends on the hard disc you have fitted. Mine is pretty zippy.
Perhaps, also, writes are slower than reads? _________________ Programmer in: Fortran 77/95/2003/2008, C, C++ (& OpenMP), java, Python, Perl |
|
Back to top |
|
|
JohnCampbell
Joined: 16 Feb 2006 Posts: 2554 Location: Sydney
|
Posted: Mon Jan 26, 2015 11:03 pm Post subject: |
|
|
INQUIRE (IOLENGTH=RECLEN) A(:,1) and
INQUIRE (IOLENGTH=RECLEN) A(1:NROW,1)
both produces a compiler error or stack overflow.
The following is F95 standard conforming for SIZEOF
reclen = size ( transfer ( a(1,1), (/'A'/) ) )
if ( big_blocks) reclen = reclen*size (A(:,1))
write(6,*)'RECL=',reclen
John |
|
Back to top |
|
|
Kenneth_Smith
Joined: 18 May 2012 Posts: 697 Location: Hamilton, Lanarkshire, Scotland.
|
Posted: Mon Jan 26, 2015 11:40 pm Post subject: |
|
|
Thanks John, that is super fast!
As I understand you code B(:,J) is writing a whole column of data as a single record - I must admit that construction does not jump out to me as the obvious way to do things - guess I still think too much in the F77 world!
I did try writing rows of data via an implied do loop - without much success and with hindsight that would still be the same number of writes and my serial approach. I did think that, one possible advantage of the serial approach might be that there is no need to recover the whole array into memory, simply access the required element i,j when required by it's record number, but clearly it may well be easier to recover the whole column that contains i,j.
I see I have much to learn, i would have written
as
Code: | IF (K.GT.1) THEN
BIG_BLOCKS = .TRUE.
ELSE
BIG_BLOCKS = .FALSE.
END IF
|
Thanks again!
Ken |
|
Back to top |
|
|
JohnCampbell
Joined: 16 Feb 2006 Posts: 2554 Location: Sydney
|
Posted: Thu Jun 30, 2016 6:56 am Post subject: |
|
|
Ken,
I was just looking at IOLENGTH and noticed a problem with your example.
The following code is in error :
Code: | DO J = 1, NCOL, 1
DO I = 1, NROW, 1
RECORD = (J-1)*NCOL + I
WRITE(10,REC=RECORD) A(I,J)
END DO
END DO
|
It should be : RECORD = (J-1)*NROW + I
If NROW = NCOL all would be ok !
John |
|
Back to top |
|
|
John-Silver
Joined: 30 Jul 2013 Posts: 1520 Location: Aerospace Valley
|
Posted: Thu Jun 30, 2016 5:57 pm Post subject: |
|
|
reading this post I observe a typical non-anglo-saxon useage of the word INQUIRE.
Maybe all programmers should be forced to read the Oxford ENGLISH Dictionary by heart instead of that woman's version over the pond.
....
Quote: | Enquire’ or ‘inquire’?
The traditional distinction between the verbs enquire and inquire is that enquire is to be used for general senses of ‘ask’, while inquire is reserved for uses meaning ‘make a formal investigation’.
In practice, however, enquire, and the associated noun enquiry, are more common in British English while inquire (and the noun inquiry) are more common in American English, but otherwise there is little discernible distinction in the way the words are used. Some style guides require that only inquire or only enquire be used. |
Clearly it should be called ENQUIRE !!!!! ... no bobbies required to discover the lngth ! LOL
"I was proceding in my investigations concerning said array length and made some enquiries during the course of my inquiry " LOL |
|
Back to top |
|
|
wahorger
Joined: 13 Oct 2014 Posts: 1217 Location: Morrison, CO, USA
|
Posted: Fri Jul 01, 2016 3:25 am Post subject: |
|
|
What I found a long time ago (http://forums.silverfrost.com/viewtopic.php?t=2992) was the record length has an effect, but it's really the total number of I/O calls that determines the performance. Short record, lots of I/O, poor performance as compared to doing the identical kinds of I/O in "C" (but without the flexibility [and overhead] that FTN95 I/O lists gives).
I have abandoned almost all of the direct access file I/O in my software because the performance is so poor. And, for still unknown reasons. Running my benchmarks (written long ago also) on a solid state drive that is blindingly fast still shows the lack of performance discovered a couple of years ago. Running the same benchmark on a RAM disk also showed the same poor performance (albeit less poor than on a hard drive, but the percentages still held true).
For me, that which used to be done in a direct access temporary file is now done in memory. Which limits the size of the data sets that can be processed. Not good, right?
Still modifying code to remove the file limitations of direct access and placing all the files in memory...... |
|
Back to top |
|
|
JohnCampbell
Joined: 16 Feb 2006 Posts: 2554 Location: Sydney
|
Posted: Fri Jul 01, 2016 6:28 am Post subject: |
|
|
Bill,
I read the link you posted. I must admit I don't use Win 8.1 except for 1 program on that PC.
I run most analysis runs on Win 7 desktops as single user, ie no change to SHARE. Apart from your test example from Jan-15, my experience of Direct Access has always been of very good performance and I would always recommend that approach for substantial disk I/O. Did you come up with any fixes to the problem ?
Perhaps the combination of file buffers and some SHARE options is a problem, but reading the post from 18 months ago that may have been dismissed as a major cause. The other problem could have been the virus checker, as it can clash with I/O performance. Perhaps you should split the files, with a small token file with SHARE status, while the large database files do not require share write to work. (Share read should not be a problem, although I don't remember where the previous thread finished up)
My experience is direct access works very well and combines well with the file cache/buffers. All my programs that use it perform well for disk I/O. I would always recommend this approach.
John |
|
Back to top |
|
|
wahorger
Joined: 13 Oct 2014 Posts: 1217 Location: Morrison, CO, USA
|
Posted: Fri Jul 01, 2016 2:15 pm Post subject: |
|
|
John, it works well, but..... The performance hit is too great to ignore. And, it was the same hit whether running the same benchmark under Windows 8 or Windows 2000.
I think, when I get some time, I'll repeat the benchmarking, this time linking in an equivalent "C" implementation and getting the number for that to help in the comparisons. The source of the performance penalty, should we be able to find it, would be of interest to many, I think.
Just FYI, under Windows 3.1 up through XP, I used a different FORTRAN compiler to create the operational code. None of these effects were seen. The same code also harkens back to CP/M days using an early Microsoft compiler. The same FORTRAN code was ported to a VAX and a PDP-11. In all these cases, there was not a performance hit due to the direct-access unformatted I/O.
I may have a platform that can support an old compiler to compile and run the same benchmark. I'll look at adding that to the mix. That would at least eliminate the OS as a culprit. |
|
Back to top |
|
|
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
Powered by phpBB © 2001, 2005 phpBB Group
|