forums.silverfrost.com Forum Index -> Support
Slow performance with DIRECT ACCESS unformatted files
wahorger



Joined: 13 Oct 2014
Posts: 1217
Location: Morrison, CO, USA

Posted: Tue Jan 27, 2015 6:00 pm

Just to be clear, using RAMDISK, while exceptionally fast, is not what is being measured. The measurement has to reflect the real-world situation that seems to affect the speed, namely using a hard drive. If I were writing a huge number cruncher with vast arrays stored on disk, I'd certainly use RAMDISK!

As John and I (and perhaps others?) have found, something is going on. It's not a reflection on FTN95, since some folks don't appear to have this issue. Rather, it is an interaction between good/decent code and the OS (so far Win7, XP and Win2K on my side). As I also pointed out, if I use my local network to perform these simple benchmarks, I see no major speed issues (though there is a small effect there).
mecej4



Joined: 31 Oct 2006
Posts: 1886

Posted: Tue Jan 27, 2015 6:40 pm

Quote: "Just to be clear, using RAMDISK, while exceptionally fast, is not what is being measured."
Agreed, and I don't think I suggested that. Rather, using a RAM disk removes physical disk I/O as a factor (buffering, shared access, other processes using other parts of the disk), so my results measure the I/O routines of the compiler runtime and the RAM disk driver.

Those who have seen dramatic slowdowns have not diagnosed the cause, and have not recorded enough of the circumstances to enable a guess to be made. As of now, we could attribute the slowdowns to butterflies in Tasmania flapping their wings.
JohnCampbell



Joined: 16 Feb 2006
Posts: 2554
Location: Sydney

Posted: Tue Jan 27, 2015 11:37 pm

mecej4,

I agree with your changes 1 and 2, but I disagree with change 3 (I'd recommend elapsed rather than CPU time).

I also changed the timer I posted earlier to correct the REAL*4 precision problems:
Code:
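 real function elapse_time ()
!  Wall-clock seconds since the first call (the first call returns 0).
!  START is subtracted in INTEGER*8 before converting, so the REAL*4
!  result only has to hold the elapsed interval, not the raw clock count.
   implicit none
   integer*8 clock, clock_rate
   integer*8, save :: start = -1         ! -1 marks "not yet initialised"
   call system_clock ( clock, clock_rate )
   if ( start == -1 ) start = clock
   elapse_time = dble (clock-start) / dble (clock_rate)
 end function elapse_time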


For ACCESS=DIRECT, the results show that SHARE=COMPAT, DENYNONE or DENYRD (about 3 Mb/sec on Win 7, but 0.34 Mb/sec on Win 8.1)
are much slower than no SHARE=, DENYRW or DENYWR (about 30 Mb/sec on Win 7, 25 Mb/sec on Win 8.1).
For ACCESS=SEQUENTIAL this does not occur, as multiple writers would not make sense(?).
(I could post the pivot tables if required.)

It looks like allowing multiple writers with ACCESS=DIRECT is the consistent performance problem, and it is noticeably worse on my Win 8.1 "device".

mecej4, do you see this in your Win 8.1 performance? Interesting that you also have a 4200U.

My PCs are:
Win 8.1: an i5-4200U Dell notebook with 8 GB memory and McAfee anti-virus
Win 7: an i5-2300 Acer desktop with 8 GB memory and Microsoft anti-virus.
Looking at my Win 8.1 notebook, another main difference between my Win 7 and Win 8.1 machines is the anti-virus checker. Could this be a problem?

Any ideas?

John

As an aside on delays on my Win 8.1 notebook: in Excel, when selecting the number format of an accumulated value in a pivot table, there is a significant delay before the pop-up appears. It is another of the annoying delays on the Win 8.1 device, and it would be good to know the cause.
mecej4



Joined: 31 Oct 2006
Posts: 1886

Posted: Wed Jan 28, 2015 12:34 am

John, the main reason I switched from SYSTEM_CLOCK to CPU_TIME was that, with your old ELAPSED_TIME function, Intel Fortran gave zero elapsed time for many of the runs, indicating that the 23 bits of a REAL*4 mantissa were not enough to hold the elapsed time. I reran with your new code for ELAPSED_TIME, and the results are hardly different (comparing FTN95 run to FTN95 run and IFort run to IFort run).
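The REAL*4 limit is easy to demonstrate. The following is a minimal sketch in standard Fortran; the module and function names are hypothetical, and the clock value 2**30 is only illustrative (actual SYSTEM_CLOCK counts and rates depend on the compiler). It compares converting two raw clock counts to default REAL before subtracting with subtracting in INTEGER*8 first, as the revised timer does.

```fortran
module clock_precision_demo
  implicit none
  integer, parameter :: i8 = selected_int_kind(18)
contains
  ! Ticks between two SYSTEM_CLOCK counts, converting each raw count to
  ! default REAL (REAL*4, 24-bit significand) before subtracting.
  real function ticks_via_real(t0, t1)
    integer(i8), intent(in) :: t0, t1
    ticks_via_real = real(t1) - real(t0)
  end function ticks_via_real

  ! Same interval, but subtracting in INTEGER*8 first, so the REAL only
  ! has to hold the small difference.
  real function ticks_via_int(t0, t1)
    integer(i8), intent(in) :: t0, t1
    ticks_via_int = real(t1 - t0)
  end function ticks_via_int
end module clock_precision_demo
```

With counts near 2**30, adjacent default REAL values are 128 ticks apart, so TICKS_VIA_REAL reports 128 ticks for a 100-tick interval, while TICKS_VIA_INT reports 100.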

The laptop on which I run these tests is an Ultrabook with 4 GB of RAM, Norton Security Suite and an internal 128 GB SSD, with the power plan set to Balanced. The only way to connect a conventional HD is through USB 2/USB 3 or Wi-Fi, and results from such an HD would probably not be useful to most people, who have other hardware.

If you suspect that your anti-virus software is affecting the results, you could disconnect from the network, disable the anti-virus, run the tests, and re-enable the anti-virus before reconnecting to the network.
JohnCampbell



Joined: 16 Feb 2006
Posts: 2554
Location: Sydney

Posted: Wed Jan 28, 2015 1:06 am

Mecej4,

Quote: "I reran with your new code for ELAPSED_TIME, and the results are hardly different (comparing FTN95 run to FTN95 run and IFort run to IFort run)."

When comparing SYSTEM_CLOCK to CPU_TIME, there should be a difference. Based on my observation of Task Manager, CPU time should be much less, although CPU_TIME is not always recorded correctly. I will run again on my Win 8.1 machine and check the difference.
I notice quite a few delays when using this PC, so there are always the old chestnuts of incompatible drivers etc, when there is no clear reason for the delay.

John

PS: I have now run with both CPU and elapsed time. For the cases with no significant delay, CPU time ~ elapsed time. For the cases with significant delays, CPU time is much less than elapsed time, and about the same as in the SHARE= tests without significant delays. This could show that the delays are time spent waiting, but waiting on what?
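This observation can be reduced to one number per run: the share of elapsed time not covered by CPU time. A minimal sketch, assuming a single-threaded program and a caller that brackets the I/O loop with CPU_TIME and an elapsed-time clock such as ELAPSE_TIME; the function name WAIT_FRACTION is hypothetical:

```fortran
module wait_estimate
  implicit none
contains
  ! Fraction of wall-clock (elapsed) time not accounted for by CPU time,
  ! i.e. time the process spent waiting (I/O, locks, scheduling, ...).
  ! Assumes a single-threaded process, so CPU time should not exceed
  ! elapsed time except through timer granularity.
  real function wait_fraction(cpu_seconds, elapsed_seconds)
    real, intent(in) :: cpu_seconds, elapsed_seconds
    if (elapsed_seconds <= 0.0) then
      wait_fraction = 0.0            ! interval too short to measure
    else
      ! Clamp to [0, 1] so coarse CPU_TIME ticks cannot give a negative value
      wait_fraction = max(0.0, min(1.0, 1.0 - cpu_seconds / elapsed_seconds))
    end if
  end function wait_fraction
end module wait_estimate
```

A run that takes 4 s of elapsed time but only 1 s of CPU time gives 0.75, i.e. three quarters of the run was spent waiting on something outside the program; values near 0 mean the run was CPU-bound.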


Last edited by JohnCampbell on Wed Jan 28, 2015 2:02 am; edited 1 time in total
mecej4



Joined: 31 Oct 2006
Posts: 1886

Posted: Wed Jan 28, 2015 1:53 am

Quote: "there are always the old chestnuts of incompatible drivers etc, when there is no clear reason for the delay"
Maybe so these days, but I remember that older 386/486 PCs came with a plug-in VGA card and, until the card-specific drivers were installed, scrolling in the CMD window with the generic VGA driver was so slow that the machine was nearly unusable.

Have you checked the list in the Device Manager control panel to see if any devices are lacking proper drivers?
wahorger



Joined: 13 Oct 2014
Posts: 1217
Location: Morrison, CO, USA

Posted: Wed Jan 28, 2015 2:24 am

Regarding use of RAM Disk vs. HD:

Here are data from my old test code, without the random-data generation (not your new code, John), run on the HD and on the RAM disk. The elapsed time is greater on the physical drive, but not by much, except for the DIRECT UNFORMATTED write operations with SHARE=COMPAT, DENYRD and DENYNONE on the HD. In those cases the HD time stands out!

Notice that even on the RAM disk, where all the times are short, the same conditions that are slow on the HD take noticeably longer than the others (for example, a DIRECT write with st=rep takes 0.25928 with DENYNONE vs 0.03174 with DENYRW). This would indicate that additional time is being consumed doing an electronic version of what the HD is doing physically; it just doesn't take as long.
Code:
                        RAM Disk        HD                             
write   5000    rec      0.25928        84.62793        ac=dir  fo=unf  st=rep  sh=denynone
write   5000    rec      0.31299        84.89453        ac=dir  fo=unf  st=old  sh=denynone
read    5000    rec      0.08643         0.59277        ac=dir  fo=unf  st=old  sh=denynone
write   5000    rec      0.02734         0.02930        ac=seq  fo=unf  st=rep  sh=denynone
write   5000    rec      0.05469         0.05420        ac=seq  fo=unf  st=old  sh=denynone
read    5000    rec      0.08789         0.07422        ac=seq  fo=unf  st=old  sh=denynone
write   5000    rec      0.03174         0.03613        ac=dir  fo=unf  st=rep  sh=denyrw
write   5000    rec      0.02051         0.02783        ac=dir  fo=unf  st=old  sh=denyrw
read    5000    rec      0.00391         0.00635        ac=dir  fo=unf  st=old  sh=denyrw
write   5000    rec      0.00586         0.00537        ac=seq  fo=unf  st=rep  sh=denyrw
write   5000    rec      0.00586         0.00635        ac=seq  fo=unf  st=old  sh=denyrw
read    5000    rec      0.00537         0.00537        ac=seq  fo=unf  st=old  sh=denyrw
write   5000    rec      0.03711         0.03662        ac=dir  fo=unf  st=rep  sh=denywr
write   5000    rec      0.02002         0.02100        ac=dir  fo=unf  st=old  sh=denywr
read    5000    rec      0.00635         0.00635        ac=dir  fo=unf  st=old  sh=denywr
write   5000    rec      0.00195         0.00586        ac=seq  fo=unf  st=rep  sh=denywr
write   5000    rec      0.00586         0.00635        ac=seq  fo=unf  st=old  sh=denywr
read    5000    rec      0.00586         0.00586        ac=seq  fo=unf  st=old  sh=denywr
write   5000    rec      0.28809        97.95752        ac=dir  fo=unf  st=rep  sh=denyrd
write   5000    rec      0.27490        95.17139        ac=dir  fo=unf  st=old  sh=denyrd
read    5000    rec      0.08057         0.56641        ac=dir  fo=unf  st=old  sh=denyrd
write   5000    rec      0.02686         0.03857        ac=seq  fo=unf  st=rep  sh=denyrd
write   5000    rec      0.01904         0.01563        ac=seq  fo=unf  st=old  sh=denyrd
read    5000    rec      0.07275         0.07227        ac=seq  fo=unf  st=old  sh=denyrd
write   5000    rec      0.26709        88.37158        ac=dir  fo=unf  st=rep  sh=compat
write   5000    rec      0.26563        87.88721        ac=dir  fo=unf  st=old  sh=compat
read    5000    rec      0.05664         0.58643        ac=dir  fo=unf  st=old  sh=compat
write   5000    rec      0.01074         0.03564        ac=seq  fo=unf  st=rep  sh=compat
write   5000    rec      0.01074         0.04004        ac=seq  fo=unf  st=old  sh=compat
read    5000    rec      0.06104         0.08252        ac=seq  fo=unf  st=old  sh=compat

Something is there, and it is certainly more substantial than a butterfly!
JohnCampbell



Joined: 16 Feb 2006
Posts: 2554
Location: Sydney

Posted: Wed Jan 28, 2015 4:52 am

These are the results from a pivot-table summary of my latest program, which reports both elapsed and CPU time.
Code:
                DIRECT          SEQUENTIAL
                elapse  cpu     elapse  cpu
Row Labels      Mb/sec  Mb/sec  Mb/sec  Mb/sec
WRITE_1         15.33   18.35   30.38   31.10
sh=             30.09   30.06   30.24   30.10
sh=COMPAT        0.45    6.25   27.95   30.67
sh=DENYNONE      0.45    6.13   30.42   30.88
sh=DENYRD        0.45    5.94   30.58   30.88
sh=DENYRW       31.36   32.13   31.99   32.06
sh=DENYWR       29.19   29.59   31.09   32.03
 
WRITE_2         15.63   18.73   29.96   30.72
sh=             30.99   31.20   30.87   30.92
sh=COMPAT        0.44    6.28   28.16   29.75
sh=DENYNONE      0.46    6.22   29.51   29.89
sh=DENYRD        0.46    6.23   29.26   30.81
sh=DENYRW       31.08   31.24   30.54   30.92
sh=DENYWR       30.36   31.20   31.43   32.03
 
READ            24.44   27.93   30.43   30.98
sh=             30.72   31.27   31.31   32.42
sh=COMPAT       15.37   22.98   29.46   29.96
sh=DENYNONE     20.12   25.37   30.04   30.06
sh=DENYRD       17.94   25.26   30.73   31.02
sh=DENYRW       30.68   31.35   30.32   31.46
sh=DENYWR       31.83   31.35   30.72   30.99
Grand Total     18.47   21.67   30.26   30.94


These results are the average of the 4 do_test results, in Mb per second based on either elapsed or CPU seconds.
For example, the WRITE_1 row   sh=COMPAT   0.45   6.25   27.95   30.67
shows, for DIRECT access, an average of 0.45 Mb per second based on the reported elapsed time and 6.25 Mb per second based on the reported CPU time.
On my Win 7 PC the elapsed-time and CPU-time rates are similar, as there is no noticeable wait delay.
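Since each rate is the same byte count divided by a different time, the two columns can be compared directly. A minimal sketch of the conversion; the function name MB_PER_SEC is hypothetical, and taking 1 Mb as 1024*1024 bytes is an assumption, since the thread does not define the unit:

```fortran
module io_rate
  implicit none
  integer, parameter :: i8 = selected_int_kind(18)
contains
  ! Transfer rate in MB per second for nbytes moved in the given time.
  ! ASSUMPTION: 1 MB = 1024*1024 bytes; the tables do not say which
  ! definition they use.
  real function mb_per_sec(nbytes, seconds)
    integer(i8), intent(in) :: nbytes
    real, intent(in) :: seconds
    if (seconds <= 0.0) then
      mb_per_sec = huge(1.0)         ! timer could not resolve the transfer
    else
      mb_per_sec = real(nbytes) / (1024.0 * 1024.0) / seconds
    end if
  end function mb_per_sec
end module io_rate
```

Because both rates use the same byte count, their ratio equals elapsed time / CPU time. For sh=COMPAT, 6.25 / 0.45 is about 13.9, so the elapsed time was roughly 14 times the CPU time.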

Bill,
Your results show different performance between the RAM disk and the HDD, and the delay times differ, so I am still wondering about the cause of these delay events.

We still need to understand the nature of the delays being observed.

A workaround is to avoid COMPAT, DENYNONE and DENYRD, which may not be a problem in practice.

John
wahorger



Joined: 13 Oct 2014
Posts: 1217
Location: Morrison, CO, USA

Posted: Wed Jan 28, 2015 5:42 am

John, yes, I am avoiding the "bad boys". Right now, I'm still performing conversion tasks on the software, bringing it in line with FTN95, and testing it thoroughly.

And, having said that, I'm putting this issue on my back burner. While I think this is important to understand, there's a limit to how deep down the rabbit hole I'm willing to go; there's only so much time with which to put out a product.

I am hoping that, sometime, there will be a root-cause explanation, either one that offers a solution (like a policy setting) or, perhaps, a tweak to the run-time system that eliminates the problem.

If that doesn't happen, at least there is a workaround, and for that I am grateful to the group here for answering my posting, asking more questions, offering things to try, and taking the time to "play" with settings.

Bill
JohnCampbell



Joined: 16 Feb 2006
Posts: 2554
Location: Sydney

Posted: Fri Jul 01, 2016 12:40 pm

Bill,

Following your link to this thread, I reviewed what I had done and re-ran the tests.
I get the following elapsed times on a 1 TB HDD:

Code:
 SUMMARY of 19mb write, write, read tests elapsed time

         SHARE            SEQUENTIAL          DIRECT
 SHARE =          test =     0.684200         0.722100   
 SHARE = DENYNONE test =     2.06150         18.2767   
 SHARE = DENYWR   test =     0.681900         0.714300   
 SHARE = DENYRD   test =     2.33870         18.4115   
 SHARE = DENYRW   test =     0.670002         0.705799   
 SHARE = COMPAT   test =     2.57771         17.8660   


These results show that SHARE=DENYNONE, DENYRD and COMPAT all have performance problems with DIRECT access files.
Based on this, COMPAT is not performing as I would expect from the documentation.

If you do not use SHARE=, there is no performance penalty, so the best plan is to be selective about sharing direct access files in a way that allows multiple writers. I presume the consequence of allowing them is that the file buffers are constantly flushed; if so, the performance penalty is not surprising. Allowing other programs to write to this file while this program is creating and writing it is a significant problem for file integrity. Why would you consider the performance penalty a problem? Why would you do it?

For SEQUENTIAL access, I don't think multiple writers would be allowed for that type of file, which is why the penalty there is much smaller (roughly 3 to 4 times in the table above, against about 25 times for DIRECT).

Judging DIRECT access performance by tests that allow multiple writers is not reasonable.

John
wahorger



Joined: 13 Oct 2014
Posts: 1217
Location: Morrison, CO, USA

Posted: Fri Jul 01, 2016 2:25 pm

Thanks, John, for the re-run and modification of the sharing.

SHARE= is important to my code, as multiple users can be accessing the same file; they are just not allowed to access it at the same time, so using SHARE= is a logical consequence of this. As I said in my previous message in this thread, the older system(s) did not have this performance penalty.

So, the question is whether the SHARE option is the culprit, or the OS, or a combination of both.

Thanks for running this benchmark comparison and continuing the thread toward, hopefully, a resolution.
wahorger



Joined: 13 Oct 2014
Posts: 1217
Location: Morrison, CO, USA

Posted: Fri Jul 01, 2016 9:57 pm

Perhaps someone can say whether the SHARE option is implemented with _fsopen(), and whether plain fopen() is used when SHARE is not present?
JohnCampbell



Joined: 16 Feb 2006
Posts: 2554
Location: Sydney

Posted: Sat Jul 02, 2016 8:33 am

Bill,

I am interested to know why you need multiple write access to a direct access file. I suspect any multi-user database system would require this. This is not a typical use of direct access files, and it does come with a performance penalty.

As to why sequential access is faster, isn't it because you can't have multiple write access to a sequential file, i.e. only a single user could append to the end of the file?

I think multiple write is slower on more recent versions of Windows because the file buffers are larger, so flushing the buffers after every write involves more disk I/O. Allowing multiple writers would be equivalent to closing the file after every write operation.
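One way to probe this hypothesis without the SHARE= extension is to time identical direct-access writes with and without a FLUSH after every record. The sketch below is an assumed experiment, not a description of what the FTN95 runtime actually does: the names are hypothetical, it uses Fortran 2003/2008 features (NEWUNIT=, FLUSH) that an older compiler may lack, and the standard only says FLUSH makes written data available to other processes; whether it also forces data to the disk is processor dependent.

```fortran
module flush_probe
  implicit none
  integer, parameter :: i8 = selected_int_kind(18)
contains
  ! Write nrec direct-access unformatted records of nwords default INTEGERs
  ! to file fname; return the elapsed wall-clock seconds (including CLOSE).
  ! flush_each = .true. issues a FLUSH after every WRITE, a rough stand-in
  ! for a runtime that must not keep written data in its own buffers.
  ! SHARE= is deliberately omitted: it is a compiler extension.
  real function timed_direct_write(fname, nrec, nwords, flush_each) result(secs)
    character(*), intent(in) :: fname
    integer, intent(in) :: nrec, nwords
    logical, intent(in) :: flush_each
    integer :: buf(nwords), reclen, u, i
    integer(i8) :: c0, c1, rate

    buf = 0
    inquire(iolength=reclen) buf          ! RECL units are processor dependent
    open(newunit=u, file=fname, access='direct', form='unformatted', &
         recl=reclen, status='replace', action='write')
    call system_clock(c0, rate)
    do i = 1, nrec
      buf(1) = i                          ! tag each record with its number
      write(u, rec=i) buf
      if (flush_each) flush(u)
    end do
    close(u)
    call system_clock(c1)
    secs = real(c1 - c0) / real(rate)     ! subtract in INTEGER*8 first
  end function timed_direct_write
end module flush_probe
```

If per-write flushing is the mechanism, calling TIMED_DIRECT_WRITE with FLUSH_EACH=.true. should be markedly slower than with .false. on the machines that show the SHARE= penalty, and similar on those that don't.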

John
wahorger



Joined: 13 Oct 2014
Posts: 1217
Location: Morrison, CO, USA

Posted: Sat Jul 02, 2016 2:37 pm

John,

Actually, I never allow multiple WRITE access, but I do allow multiple READ access (SHARE='DENYWR'). When I need exclusive access, I deny read and write (SHARE='DENYRW'). I had to play around with these a lot during the transition to FTN95.

My first attempt to keep other users from accessing the file was SHARE='COMPAT'. What took no more than the blink of an eye on the old system became a significant pause in the operation of the code, which prompted this investigation 18 months ago.

I find it interesting that SHARE='DENYNONE' has the performance penalty. It would logically seem to be the same as SHARE=' '. Perhaps not...

In any event (and for me specifically), taking 99+% of the temporary/working files out and doing the job in memory means I don't have to worry about this any more; but that's just me, and it isn't a significant limitation to the program anyway.

Whether this is a "problem" or "just how it is", it is important for current and future users to be aware.
JohnCampbell



Joined: 16 Feb 2006
Posts: 2554
Location: Sydney

Posted: Tue Jul 05, 2016 4:41 am

Bill,

Based on my test example above, my interpretation is that COMPAT should be the same as DENYRW, which is the same as SHARE=' '.
The only cases that allow multiple writers are DENYNONE and DENYRD.
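This reading relies on the conventional meaning of the DOS/Windows sharing modes (_SH_DENYRW, _SH_DENYWR, _SH_DENYRD and _SH_DENYNO in the Microsoft C runtime's _fsopen/_sopen), where each DENY flag names the access refused to other processes. Assuming FTN95's SHARE= values follow that convention (the thread does not confirm it), the classification can be written as a small lookup. The module and function names are hypothetical, and COMPAT and a blank SHARE= are deliberately left unclassified, because that is the open question here:

```fortran
module share_modes
  implicit none
  integer, parameter :: SHARE_UNKNOWN = -1, SHARE_NO = 0, SHARE_YES = 1
contains
  ! May another process open the file for writing under this SHARE= mode?
  ! Based on the conventional DOS/Windows meaning: DENYxx lists what OTHER
  ! openers are refused. Expects upper-case mode names.
  integer function others_may_write(mode)
    character(*), intent(in) :: mode
    select case (trim(adjustl(mode)))
    case ('DENYNONE', 'DENYRD')          ! write access not denied to others
      others_may_write = SHARE_YES
    case ('DENYRW', 'DENYWR')            ! write access denied to others
      others_may_write = SHARE_NO
    case default                         ! 'COMPAT', blank, anything else
      others_may_write = SHARE_UNKNOWN
    end select
  end function others_may_write
end module share_modes
```

Under this reading, DENYNONE and DENYRD are the only modes that let another process write, and those two, together with COMPAT, are exactly the slow DIRECT cases in the timings above.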

Either COMPAT is wrong, as it appears to allow multiple writers, or my interpretation is wrong. If I understand this correctly, COMPAT needs to be fixed.

Do you, or anyone else, know the correct interpretation of this problem?

John
All times are GMT + 1 Hour