forums.silverfrost.com Welcome to the Silverfrost forums
wahorger
Joined: 13 Oct 2014 Posts: 1217 Location: Morrison, CO, USA
Posted: Tue Jan 27, 2015 6:00 pm
Just to be clear, using a RAMDISK, while exceptionally fast, is not what is being measured. The measurement has to reflect the real-world situation that seems to affect the speed, namely using a hard drive. If I were writing a huge number cruncher with vast arrays stored on disk, I'd certainly use a RAMDISK!
As John and I have found (perhaps others too?), there is something going on. It's not a reflection on FTN95; some folks don't appear to have this issue. Rather, it is an interaction between good/decent code and the OS (so far, Win7, XP and Win2K on my side). As I also pointed out, if I use my local network to perform these simple benchmarks, I see no major speed issues (there is a little something there, though).
mecej4
Joined: 31 Oct 2006 Posts: 1886
Posted: Tue Jan 27, 2015 6:40 pm
Quote: "Just to be clear, using RAMDISK, while exceptionally fast, is not what is being measured."
Agreed, and I don't think that I suggested that. Rather, using a ramdisk removes disk I/O as a factor (buffering, shared access, other processes using other parts of the disk), so my results measure the I/O routines of the compiler runtime and the ramdisk driver.
Those who have seen dramatic slowdowns have not diagnosed the cause, and have not recorded enough of the circumstances to allow a guess to be made. As of now, we could attribute the slowdowns to butterflies in Tasmania flapping their wings.
JohnCampbell
Joined: 16 Feb 2006 Posts: 2554 Location: Sydney
Posted: Tue Jan 27, 2015 11:37 pm
mecej4,
I agree with your changes 1 and 2, but I disagree with change 3 (I'd recommend elapsed rather than CPU time).
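I also changed the timer I posted to correct the real*4 precision problems, as:
Code:
real*8 function elapse_time ()
!  Wall-clock seconds since the first call. The REAL*8 result keeps the
!  double-precision quotient from being rounded back to REAL*4.
!  Callers must also declare:  real*8 elapse_time
integer*8 clock, clock_rate
integer*8 :: start = -1            ! initialised, so SAVEd between calls
call system_clock ( clock, clock_rate )
if ( start == -1 ) start = clock   ! first call fixes the time origin
elapse_time = dble (clock-start) / dble (clock_rate)
end function elapse_time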
For ACCESS=DIRECT, what I am finding from the results is that SHARE=COMPAT, DENYNONE or DENYRD (about 3 Mb/sec on Win 7, but 0.34 Mb/sec on Win 8.1) are much slower than no SHARE=, DENYRW or DENYWR (about 30 Mb/sec on Win 7, 25 Mb/sec on Win 8.1).
For ACCESS=SEQUENTIAL this does not occur, as multiple writers would not make sense(?)
(I could post the pivot tables if required.)
It looks like allowing multiple writers with ACCESS=DIRECT is the consistent performance problem, and it is noticeably worse on my Win 8.1 "device".
Mecej4, do you see this in your Win 8.1 performance? Interesting that you have a 4200U also.
My PCs are:
Win 8.1: an i5-4200U Dell notebook with 8 GB memory and McAfee anti-virus.
Win 7: an i5-2300 Acer desktop with 8 GB memory and Microsoft anti-virus.
Looking at my Win 8.1 notebook, another main difference between my Win 7 and Win 8.1 machines is the type of virus checker. Could this be a problem?
Any ideas?
John
As an aside, on delays on my Win 8.1 notebook: in Excel, when selecting the number format of an accumulated value in a pivot table, there is a significant delay before the pop-up appears. It is another of the annoying delays I don't like on the Win 8.1 device. It would be good to know the cause.
mecej4
Joined: 31 Oct 2006 Posts: 1886
Posted: Wed Jan 28, 2015 12:34 am
John, the main reason that I switched from SYSTEM_CLOCK to CPU_TIME was that, with your old ELAPSED_TIME function, Intel Fortran gave zero elapsed time for many of the runs, indicating that the 23 bits of a REAL*4 were not enough to hold the elapsed time. I reran with your new code for ELAPSED_TIME, and the results are hardly different (comparing FTN95 run to FTN95 run and IFort run to IFort run).
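A minimal sketch of the precision problem described above, using only standard Fortran 2008 and not checked against FTN95 or Intel Fortran; the module and function names, the tick counts and the 10**7 ticks/second rate are hypothetical illustration values, not measurements from this thread. If each raw count is converted to REAL*4 before the difference is taken, two counts 5000 ticks apart near 10**12 round to the same single-precision value, so the elapsed time comes out as exactly zero. Subtracting the integer counts first, as in John's revised timer, or working in double precision keeps the 0.5 ms interval.

```fortran
module clock_precision_demo
  ! Hypothetical illustration: how the order of conversion affects an
  ! elapsed time computed from SYSTEM_CLOCK-style integer counts.
  use iso_fortran_env, only: int64, real32, real64
  implicit none
contains
  ! Convert each count to single precision first, then subtract.
  ! Large counts lose their low-order ticks, so the difference can be 0.
  real(real32) function elapsed_sp_naive(c1, c2, rate)
    integer(int64), intent(in) :: c1, c2, rate
    elapsed_sp_naive = real(c2, real32) / real(rate, real32) &
                     - real(c1, real32) / real(rate, real32)
  end function elapsed_sp_naive

  ! Subtract the integer counts first; the small difference converts exactly.
  real(real32) function elapsed_sp_diff_first(c1, c2, rate)
    integer(int64), intent(in) :: c1, c2, rate
    elapsed_sp_diff_first = real(c2 - c1, real32) / real(rate, real32)
  end function elapsed_sp_diff_first

  ! Double precision represents integer counts up to 2**53 exactly.
  real(real64) function elapsed_dp(c1, c2, rate)
    integer(int64), intent(in) :: c1, c2, rate
    elapsed_dp = real(c2 - c1, real64) / real(rate, real64)
  end function elapsed_dp
end module clock_precision_demo
```

With c1 = 10**12, c2 = c1 + 5000 and rate = 10**7, the naive version returns 0.0 while the other two return 5.0e-4 seconds.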
The laptop on which I run these tests is an Ultrabook with 4 GB of RAM and Norton Security Suite; it has an internal 128 GB SSD, with the power plan set to Balanced. The only way to connect a conventional HD is through USB 2/USB 3 or Wi-Fi, and results from such a HD would probably not be useful to most people, who have other hardware.
If you suspect that your anti-virus software is affecting the results, you could disconnect from the network, disable the anti-virus, run the tests, and re-enable the anti-virus before reconnecting to the network.
JohnCampbell
Joined: 16 Feb 2006 Posts: 2554 Location: Sydney
Posted: Wed Jan 28, 2015 1:06 am
Mecej4,
Quote: "I reran with your new code for ELAPSED_TIME, and the results are hardly different (comparing FTN95 run to FTN95 run and IFort run to IFort run)."
When comparing SYSTEM_CLOCK to CPU_TIME, there should be a difference. Based on my observation of Task Manager, the CPU time should be much less, although CPU time is not always recorded correctly. I will run again on my Win 8.1 machine and check the difference.
I notice quite a few delays when using this PC, so when there is no clear reason for a delay there are always the old chestnuts of incompatible drivers etc.
John
ps: I have now run for both CPU and elapsed time. For the cases with no significant delay, CPU time ~ elapsed time. For the cases with significant delays, the CPU time is much less than the elapsed time, and about the same as for the share= tests without significant delays. This could show that the delays are waiting delays, but waiting on what?
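John's comparison can be turned into a simple diagnostic: time the same operation with both SYSTEM_CLOCK and CPU_TIME, and treat the gap as time the process spent waiting rather than computing. The sketch below is standard Fortran with hypothetical module and routine names; it only measures how much of the elapsed time is spent waiting, not what is being waited on, and CPU_TIME resolution varies between compilers and operating systems.

```fortran
module wait_probe
  use iso_fortran_env, only: int64, real64
  implicit none
  integer(int64), private :: clock0 = 0
  real(real64),   private :: cpu0 = 0.0_real64
contains
  ! Record the starting wall-clock count and CPU time.
  subroutine probe_start()
    call system_clock(clock0)
    call cpu_time(cpu0)
  end subroutine probe_start

  ! Return elapsed (wall-clock) and CPU seconds since probe_start.
  subroutine probe_stop(elapsed, cpu)
    real(real64), intent(out) :: elapsed, cpu
    integer(int64) :: clock1, rate
    real(real64) :: cpu1
    call cpu_time(cpu1)
    call system_clock(clock1, rate)
    elapsed = real(clock1 - clock0, real64) / real(rate, real64)
    cpu = cpu1 - cpu0
  end subroutine probe_stop

  ! Fraction of elapsed time not accounted for by CPU time
  ! (0 = all compute). Clamped at 0 because timer granularity can
  ! make CPU time slightly exceed elapsed time.
  real(real64) function wait_fraction(elapsed, cpu)
    real(real64), intent(in) :: elapsed, cpu
    if (elapsed <= 0.0_real64) then
      wait_fraction = 0.0_real64
    else
      wait_fraction = max(0.0_real64, (elapsed - cpu) / elapsed)
    end if
  end function wait_fraction
end module wait_probe
```

Wrapping each OPEN/WRITE/CLOSE test in probe_start and probe_stop would show a wait fraction near 0 for the fast SHARE= settings and close to 1 for the slow ones, if the delays are indeed waits.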
Last edited by JohnCampbell on Wed Jan 28, 2015 2:02 am; edited 1 time in total
mecej4
Joined: 31 Oct 2006 Posts: 1886
Posted: Wed Jan 28, 2015 1:53 am
Quote: "there are always the old chestnuts of incompatible drivers etc, when there is no clear reason for the delay"
Maybe so these days, but I remember that older 386/486 PCs came with a plug-in VGA card, and until the card-specific drivers were installed, scrolling on the CMD screen with the generic VGA driver was so slow that the machine was nearly unusable.
Have you checked the list in the Device Manager control panel to see whether any devices are lacking proper drivers?
wahorger
Joined: 13 Oct 2014 Posts: 1217 Location: Morrison, CO, USA
Posted: Wed Jan 28, 2015 2:24 am
Regarding use of RAM Disk vs. HD:
Here is some data taken with my old test code, without the generation of random data (not your new code, John), run on the HD vs. the RAM disk. Clearly, the elapsed time is greater on the physical drive, but not by much, except for the DIRECT UNFORMATTED write operations with SHARE=COMPAT, DENYRD and DENYNONE on the HD. In those cases, the HD access time stands out (roughly 85 to 98 s, versus about 0.02 to 0.04 s for the same writes with DENYRW or DENYWR)!
Notice that, although the elapsed times on the RAM disk are short, the same SHARE= settings are still several times slower there than DENYRW or DENYWR (for example, 0.259 s vs. 0.032 s for the first DIRECT write with DENYNONE vs. DENYRW). This would indicate that additional time is being consumed doing an electronic version of what the HD is doing physically; it just doesn't take as long.
Code:
op    records  RAM disk (s)  HD (s)    options
write 5000     0.25928       84.62793  ac=dir fo=unf st=rep sh=denynone
write 5000     0.31299       84.89453  ac=dir fo=unf st=old sh=denynone
read  5000     0.08643       0.59277   ac=dir fo=unf st=old sh=denynone
write 5000     0.02734       0.02930   ac=seq fo=unf st=rep sh=denynone
write 5000     0.05469       0.05420   ac=seq fo=unf st=old sh=denynone
read  5000     0.08789       0.07422   ac=seq fo=unf st=old sh=denynone
write 5000     0.03174       0.03613   ac=dir fo=unf st=rep sh=denyrw
write 5000     0.02051       0.02783   ac=dir fo=unf st=old sh=denyrw
read  5000     0.00391       0.00635   ac=dir fo=unf st=old sh=denyrw
write 5000     0.00586       0.00537   ac=seq fo=unf st=rep sh=denyrw
write 5000     0.00586       0.00635   ac=seq fo=unf st=old sh=denyrw
read  5000     0.00537       0.00537   ac=seq fo=unf st=old sh=denyrw
write 5000     0.03711       0.03662   ac=dir fo=unf st=rep sh=denywr
write 5000     0.02002       0.02100   ac=dir fo=unf st=old sh=denywr
read  5000     0.00635       0.00635   ac=dir fo=unf st=old sh=denywr
write 5000     0.00195       0.00586   ac=seq fo=unf st=rep sh=denywr
write 5000     0.00586       0.00635   ac=seq fo=unf st=old sh=denywr
read  5000     0.00586       0.00586   ac=seq fo=unf st=old sh=denywr
write 5000     0.28809       97.95752  ac=dir fo=unf st=rep sh=denyrd
write 5000     0.27490       95.17139  ac=dir fo=unf st=old sh=denyrd
read  5000     0.08057       0.56641   ac=dir fo=unf st=old sh=denyrd
write 5000     0.02686       0.03857   ac=seq fo=unf st=rep sh=denyrd
write 5000     0.01904       0.01563   ac=seq fo=unf st=old sh=denyrd
read  5000     0.07275       0.07227   ac=seq fo=unf st=old sh=denyrd
write 5000     0.26709       88.37158  ac=dir fo=unf st=rep sh=compat
write 5000     0.26563       87.88721  ac=dir fo=unf st=old sh=compat
read  5000     0.05664       0.58643   ac=dir fo=unf st=old sh=compat
write 5000     0.01074       0.03564   ac=seq fo=unf st=rep sh=compat
write 5000     0.01074       0.04004   ac=seq fo=unf st=old sh=compat
read  5000     0.06104       0.08252   ac=seq fo=unf st=old sh=compat

Something is there, and it is certainly more substantial than a butterfly!
JohnCampbell
Joined: 16 Feb 2006 Posts: 2554 Location: Sydney
Posted: Wed Jan 28, 2015 4:52 am
These are the results, taken from a pivot table summary, from my latest program, which reports both elapsed and CPU time.
Code:
              DIRECT            SEQUENTIAL
              elapse   cpu      elapse   cpu
Row Labels    Mb/sec   Mb/sec   Mb/sec   Mb/sec
WRITE_1       15.33    18.35    30.38    31.10
  sh=         30.09    30.06    30.24    30.10
  sh=COMPAT   0.45     6.25     27.95    30.67
  sh=DENYNONE 0.45     6.13     30.42    30.88
  sh=DENYRD   0.45     5.94     30.58    30.88
  sh=DENYRW   31.36    32.13    31.99    32.06
  sh=DENYWR   29.19    29.59    31.09    32.03
WRITE_2       15.63    18.73    29.96    30.72
  sh=         30.99    31.20    30.87    30.92
  sh=COMPAT   0.44     6.28     28.16    29.75
  sh=DENYNONE 0.46     6.22     29.51    29.89
  sh=DENYRD   0.46     6.23     29.26    30.81
  sh=DENYRW   31.08    31.24    30.54    30.92
  sh=DENYWR   30.36    31.20    31.43    32.03
READ          24.44    27.93    30.43    30.98
  sh=         30.72    31.27    31.31    32.42
  sh=COMPAT   15.37    22.98    29.46    29.96
  sh=DENYNONE 20.12    25.37    30.04    30.06
  sh=DENYRD   17.94    25.26    30.73    31.02
  sh=DENYRW   30.68    31.35    30.32    31.46
  sh=DENYWR   31.83    31.35    30.72    30.99
Grand Total   18.47    21.67    30.26    30.94
These results are the average of the 4 do_test results, expressed as Mb per second based on either elapsed or CPU seconds.
e.g. the WRITE_1 row sh=COMPAT 0.45 6.25 27.95 30.67
has an average DIRECT performance of 0.45 Mb per second based on the reported elapsed time, and 6.25 Mb per second based on the reported CPU time.
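The conversion described above is rate = data volume / time, so two rates for the same test also give the ratio of the two times: CPU seconds / elapsed seconds = elapsed rate / CPU rate. For the WRITE_1 COMPAT row, 0.45 / 6.25 = 0.072, i.e. CPU time is only about 7% of the elapsed time (approximately, since the table values are averages over four runs), which fits the waiting interpretation in John's earlier post. A minimal sketch, assuming 1 Mb = 2**20 bytes (the thread does not say which megabyte is meant); the module and function names are hypothetical.

```fortran
module rate_math
  use iso_fortran_env, only: int64, real64
  implicit none
  ! Assumption: 1 Mb = 2**20 bytes; use 1.0e6 if decimal megabytes are meant.
  real(real64), parameter :: bytes_per_mb = 1048576.0_real64
contains
  ! Throughput in Mb/sec for nbytes moved in the given number of seconds.
  real(real64) function mb_per_sec(nbytes, seconds)
    integer(int64), intent(in) :: nbytes
    real(real64), intent(in) :: seconds
    mb_per_sec = (real(nbytes, real64) / bytes_per_mb) / seconds
  end function mb_per_sec

  ! For the same data volume, time is inversely proportional to rate, so
  ! cpu_seconds / elapsed_seconds = elapsed_rate / cpu_rate.
  real(real64) function cpu_share_of_elapsed(elapsed_rate, cpu_rate)
    real(real64), intent(in) :: elapsed_rate, cpu_rate
    cpu_share_of_elapsed = elapsed_rate / cpu_rate
  end function cpu_share_of_elapsed
end module rate_math
```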
On my Win 7 PC, both these times and rates are similar, as there is no noticeable wait delay.
Bill,
While your results show different performance between the RAM disk and the HDD, the delay event times are different, so I am still wondering about the cause of these delay events.
I still need to understand the nature of the delays being observed.
A solution is to avoid COMPAT, DENYNONE and DENYRD, which may not be a problem.
John
wahorger
Joined: 13 Oct 2014 Posts: 1217 Location: Morrison, CO, USA
Posted: Wed Jan 28, 2015 5:42 am
John, yes, I am avoiding the "bad boys". Right now, I'm still performing conversion tasks on the software, bringing it in line with FTN95, and testing it thoroughly.
And, having said that, I'm putting this issue on the back burner. While I think this is important to understand, there's a limit to how far down the rabbit hole I'm willing to go; there's only so much time in which to put out a product.
I am hoping that, sometime, there will be a root-cause explanation, either one that offers a solution (like a policy setting) or, perhaps, something in the run-time system that needed a tweak and whose fix eliminates the problem.
If that doesn't happen, at least there is a workaround, and for that I am grateful to the group here for answering my posting, asking more questions, offering things to try, and taking the time to "play" with settings.
Bill
JohnCampbell
Joined: 16 Feb 2006 Posts: 2554 Location: Sydney
Posted: Fri Jul 01, 2016 12:40 pm
Bill,
Following your link to this thread, I reviewed what I had done and re-ran the tests.
I get the following elapsed times on a 1 TB HDD:
Code:
SUMMARY of 19 Mb write, write, read tests: elapsed time (s)
SHARE             SEQUENTIAL   DIRECT
(none)            0.684200     0.722100
DENYNONE          2.06150      18.2767
DENYWR            0.681900     0.714300
DENYRD            2.33870      18.4115
DENYRW            0.670002     0.705799
COMPAT            2.57771      17.8660
These results show that SHARE=DENYNONE, DENYRD and COMPAT all have performance problems with DIRECT access files (about 18 s instead of about 0.7 s).
Based on this, COMPAT is not performing as I would expect from the documentation.
If you do not use SHARE=, there is no performance penalty, so the best plan is to be selective about sharing direct access files in a way that allows multiple writers. I presume the consequence of such sharing is that the file buffers are constantly being flushed; if so, the performance penalty is not surprising. Allowing other programs to write to this file while this program is creating and writing it is a significant problem for managing file integrity. Why would you consider the performance penalty a problem? Why would you do it?
For SEQUENTIAL access, I don't think multiple writers would be an allowed possibility for that type of file, which may be why its penalty is much smaller (about 2.1 to 2.6 s instead of about 0.7 s, compared with about 18 s for DIRECT).
Dismissing DIRECT access performance on the basis of multiple-write sharing is not a reasonable test.
John
wahorger
Joined: 13 Oct 2014 Posts: 1217 Location: Morrison, CO, USA
Posted: Fri Jul 01, 2016 2:25 pm
Thanks, John, for the re-run and modification of the sharing.
SHARE= is important to my code, as multiple users can access the same file; they are just not allowed to access it at the same time, so using SHARE= is a logical consequence of this. As I said in my previous message in this thread, the older system(s) did not have this performance penalty.
So the question is whether the SHARE option is the culprit, or the OS, or a combination of both.
Thanks for running this benchmark comparison and continuing the thread toward, hopefully, a resolution.
wahorger
Joined: 13 Oct 2014 Posts: 1217 Location: Morrison, CO, USA
Posted: Fri Jul 01, 2016 9:57 pm
Perhaps someone can say whether the SHARE option uses _fsopen() to accomplish this, while fopen() is used when SHARE is not present?
JohnCampbell
Joined: 16 Feb 2006 Posts: 2554 Location: Sydney
Posted: Sat Jul 02, 2016 8:33 am
Bill,
I am interested to know why you need multiple write access to a direct access file. I suspect any multi-user database system would require this, but it is not a typical use of direct access files, and it does come with a performance penalty.
As to why sequential access is faster, isn't this because you can't have multiple write access to a sequential file? That is, multiple write is not available for that type of file, or only a single user could append to the end of the file.
I think the reason multiple write is slower on more recent versions of Windows is that the file buffers are larger, so continually flushing the buffers after every write involves more disk I/O. Allowing multiple write would be equivalent to closing the file after every write operation.
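The flushing hypothesis above can be tested independently of SHARE= by timing direct-access writes with and without a flush after each record. A minimal sketch using standard Fortran 2003/2008 features (NEWUNIT=, INQUIRE(IOLENGTH=), the FLUSH statement); whether FTN95 accepts all of these, and whether its SHARE= implementation really flushes after each write, are assumptions that nothing in this thread confirms. The module name, record size and file names are hypothetical.

```fortran
module flush_bench
  use iso_fortran_env, only: int64, real64
  implicit none
contains
  ! Write n direct-access records and return the elapsed wall-clock seconds.
  ! If flush_each is .true., a FLUSH statement follows every WRITE, asking the
  ! runtime to empty its buffer after each record (what that does at the OS
  ! level is processor dependent). This stands in, hypothetically, for what a
  ! shared-for-writing open might force the runtime to do.
  function time_writes(path, n, flush_each) result(seconds)
    character(*), intent(in) :: path
    integer, intent(in) :: n
    logical, intent(in) :: flush_each
    real(real64) :: seconds
    real :: buf(512)                 ! one record: 512 default reals
    integer :: u, i, reclen
    integer(int64) :: c0, c1, rate

    buf = 1.0
    inquire(iolength=reclen) buf     ! RECL in the processor's own units
    open(newunit=u, file=path, access='direct', form='unformatted', &
         recl=reclen, status='replace', action='write')
    call system_clock(c0, rate)
    do i = 1, n
      write(u, rec=i) buf
      if (flush_each) flush(u)
    end do
    call system_clock(c1)
    close(u, status='delete')        ! scratch data only; remove the file
    seconds = real(c1 - c0, real64) / real(rate, real64)
  end function time_writes
end module flush_bench
```

Comparing time_writes(..., .false.) with time_writes(..., .true.) on the same drive would indicate how much of the DIRECT-access slowdown a per-record flush alone could account for.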
John
wahorger
Joined: 13 Oct 2014 Posts: 1217 Location: Morrison, CO, USA
Posted: Sat Jul 02, 2016 2:37 pm
John,
Actually, I never allow multiple WRITE access, but I do allow multiple READ access (SHARE='DENYWR'). When I need exclusive access, I deny read and write (SHARE='DENYRW'). I had to experiment with these a lot during the transition to FTN95.
My first attempt to stop other users from accessing the file was SHARE='COMPAT'. What in the old system took no more than the blink of an eye became a significant pause in the operation of the code, which prompted the investigation 18 months ago.
I find it interesting that SHARE='DENYNONE' has the performance penalty. Logically, it would seem to be the same as SHARE=' '. Perhaps not...
In any event (and for me specifically), taking 99+% of the temporary/working files out and doing the job in memory means I don't have to worry about this any more, but that's just me, and it isn't a significant limitation to the program anyway.
Whether this is a "problem" or "just how it is", it is important for current and future users to be aware of it.
JohnCampbell
Joined: 16 Feb 2006 Posts: 2554 Location: Sydney
Posted: Tue Jul 05, 2016 4:41 am
Bill,
Based on my testing example above, my interpretation is that COMPAT should be the same as DENYRW, which is the same as SHARE=' '.
The only cases that allow multiple writers are DENYNONE and DENYRD.
Either COMPAT is wrong, as it appears to be allowing multiple writers, or my interpretation is wrong. If I understand this correctly, then COMPAT needs to be fixed.
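One way to check the interpretation above is to write each SHARE= value down as the permissions it grants to other openers of the file, using the Win32 CreateFile share bits FILE_SHARE_READ (1) and FILE_SHARE_WRITE (2) as the vocabulary. The mapping for DENYNONE, DENYRD, DENYWR, DENYRW and an omitted SHARE= follows the keyword names and John's test; treating COMPAT as exclusive is John's interpretation, not documented FTN95 behaviour, and nothing in this thread confirms that FTN95 calls CreateFile or _fsopen with these values. The module and function names are hypothetical, and keywords are expected in upper case.

```fortran
module share_modes
  implicit none
  ! Win32 CreateFile dwShareMode bits (values from the Windows SDK).
  integer, parameter :: file_share_read  = 1
  integer, parameter :: file_share_write = 2
contains
  ! Hypothetical mapping from an upper-case SHARE= keyword to the share
  ! bits granted to OTHER openers of the same file (-1 = unknown keyword).
  integer function share_bits(mode)
    character(*), intent(in) :: mode
    select case (trim(adjustl(mode)))
    case ('DENYNONE')                 ! others may read and write
      share_bits = ior(file_share_read, file_share_write)
    case ('DENYRD')                   ! others may write but not read
      share_bits = file_share_write
    case ('DENYWR')                   ! others may read but not write
      share_bits = file_share_read
    case ('DENYRW', '')               ! exclusive; '' = SHARE= omitted
      share_bits = 0
    case ('COMPAT')                   ! assumption: John's reading, as DENYRW
      share_bits = 0
    case default
      share_bits = -1
    end select
  end function share_bits

  ! True when the mode would let another process write the file.
  logical function others_may_write(mode)
    character(*), intent(in) :: mode
    integer :: bits
    bits = share_bits(mode)
    others_may_write = bits > 0 .and. iand(bits, file_share_write) /= 0
  end function others_may_write
end module share_modes
```

Under this mapping, the slow DIRECT cases in John's summary are exactly the modes for which others_may_write is true, plus COMPAT; COMPAT falling outside that pattern is the anomaly being asked about.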
Do you or anyone else know the correct interpretation of this problem?
John