forums.silverfrost.com Forum Index forums.silverfrost.com
Welcome to the Silverfrost forums
 

Update: Sporadic error when opening files

 
wahorger



Joined: 13 Oct 2014
Posts: 1217
Location: Morrison, CO, USA

Posted: Fri Jan 23, 2015 6:50 pm    Post subject: Update: Sporadic error when opening files

While I have made progress on a work-around, I have yet to understand the actual cause. It appears to be an interaction between the operating system (Windows 7 Pro, 64-bit, SP1) and an added McAfee Virus Scan component. But the virus scanner is not the whole story, and that is the most unusual part.

The original problem was an unexpected IOSTAT= error code (10005, or 10002) returned from an OPEN. Something was sporadically interfering with the OPEN, and one of the suspects would be McAfee. The files being written were in a folder underneath the Documents folder with a .DAT extension. The cases studied are with McAfee Real Time Scanning (RT) enabled or disabled, and the file being written is either on a MAPPED drive (using SUBST to perform the logical mapping) or on a physical partition. The MAPPED drive is as described, a folder within the Documents folder. The partition drive is the logical F: drive on my system, a partition of the main drive.

If the McAfee Real-Time scanning is enabled, then these errors will occur. The suspicion is that the RT scanning feature gets in the way of the OPEN, possibly intercepting the call prior to the OS getting it. In any case, the occurrence of 10005 errors is about 30% of the time on a MAPPED drive and about 0.5% on an un-Mapped drive. In the cases shown below, there are no CLOSE errors. However, in other testing over a period of a few hours, there is an occasional error in the CLOSE operation, with a 10002 or 10005 error. Occasionally, the error code at CLOSE was seen to be a 32 (from the HELP file: "Second argument to MOD is zero").

Code:
Mapped Drive
           Event Name   IOSTAT  Cur. Time Open  Del  Wrt Clos Count
+    Just checking in        0 090914.852    0    0    0    0    0
+   open (new) failed    10005 090920.862    1    0    0    0    8
+   open (new) failed    10005 090923.738    2    0    0    0    9
+   open (new) failed    10005 090928.402    3    0    0    0   12
+   open (new) failed    10005 090933.127    4    0    0    0   15
+   open (new) failed    10005 090937.801    5    0    0    0   18
+   open (new) failed    10005 090944.177    6    0    0    0   23
+   open (new) failed    10005 090947.035    7    0    0    0   24
+   open (new) failed    10005 090949.870    8    0    0    0   25
+   open (new) failed    10005 090952.762    9    0    0    0   26
+   open (new) failed    10005 090956.436   10    0    0    0   28
+   open (new) failed    10005 091003.523   11    0    0    0   34
+   open (new) failed    10005 091007.143   12    0    0    0   36
+   open (new) failed    10005 091010.751   13    0    0    0   38
+   open (new) failed    10005 091019.889   14    0    0    0   46

14 failures/retries in 46 successful opens


Code:
Unmapped Drive
           Event Name   IOSTAT  Cur. Time Open  Del  Wrt Clos  Count
+   open (new) failed    10005 094152.523    1    0    0    0  15746  Transition from RT Scan Off to ON
+    Just checking in        0 094202.403    1    0    0    0  15872
+    Just checking in        0 094219.090    1    0    0    0  16128
+    Just checking in        0 094235.840    1    0    0    0  16384
+   open (new) failed    10005 094240.040    2    0    0    0  16440
+   open (new) failed    10005 094243.089    3    0    0    0  16458
+   open (new) failed    10005 094254.081    4    0    0    0  16598
+   open (new) failed    10005 094256.198    5    0    0    0  16600
+   open (new) failed    10005 094258.690    6    0    0    0  16608
+    Just checking in        0 094302.822    6    0    0    0  16640
+   open (new) failed    10005 094304.614    7    0    0    0  16665
+    Just checking in        0 094322.231    7    0    0    0  16896
+    Just checking in        0 094338.897    7    0    0    0  17152
+   open (new) failed    10005 094348.565    8
wahorger



Joined: 13 Oct 2014
Posts: 1217
Location: Morrison, CO, USA

Posted: Fri Jan 23, 2015 7:15 pm

In Safe Mode, McAfee loads but is not enabled for RT scanning. In 1280 attempts on both mapped and un-mapped drives, no errors were detected.

So, whether it is McAfee or not, SOMETHING is getting in the way. But it would appear that a simple retry scheme will work just fine.
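
A minimal sketch of such a retry scheme, written in Python for illustration rather than FTN95. The name `open_with_retry`, the retry count, and the fixed delay are assumptions for the example, not values taken from the actual program; the `opener` parameter exists only so the behaviour can be exercised without a real sharing conflict.

```python
import time


def open_with_retry(path, mode="r", retries=5, delay=0.05, opener=open):
    """Open `path`, pausing `delay` seconds after each failed attempt.

    Returns (file_object, failed_attempts). If the open still fails
    after `retries` retries, the last OSError is re-raised.
    """
    failures = 0
    while True:
        try:
            return opener(path, mode), failures
        except OSError:
            failures += 1
            if failures > retries:
                raise  # give up: every attempt failed
            time.sleep(delay)  # wait before trying again
```

Returning the failure count alongside the file makes it easy to keep the same kind of tally as in the tables above.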
JohnCampbell



Joined: 16 Feb 2006
Posts: 2554
Location: Sydney

Posted: Sat Jan 24, 2015 11:22 am

These types of errors can be due to the Microsoft OS.
At times, Windows file handles are not released immediately and you can get a file-in-use error. As Paul has suggested, using a SLEEP@ delay and trying again will often fix the problem.

It would be good to find out what 10005 and 10002 mean and where they are coming from.

Your tracking and reporting these errors is a good approach.

John
wahorger



Joined: 13 Oct 2014
Posts: 1217
Location: Morrison, CO, USA

Posted: Tue Apr 28, 2015 2:34 pm

I have had even more occurrences of odd file open/close behavior that I thought I'd share.

A very old section of the code opens a file, reads several records from it, then closes the file. Usually, this happens when the user interacts with the program. However, it is a "generic" routine that can be used widely. Its purpose is to prevent a conflict with another user who might need access to this file, so keeping the file open for only a short period of time makes sense. And this is where the problem occurs.

I call the routine multiple times, sequentially, as I step automatically through a set of inputs to the routine. So what I have, in essence, is a rapid sequence of open/read/close/open/read/close/... And, sometimes, I'll get a "File in Use" error (134). I am the only user on the system, and I'm only running this one process that accesses the file.

My suspicion is that once the request to the system to close the file is made, it is possible (perhaps likely?) that a different processor thread is invoked to accomplish this task. While this is proceeding, my process then requests that this file be re-opened. And, this is where the conflict arises, and I get the "File in Use" error.

I went back to the common file open/close routine and verified that it waits and then retries the operation when an error is detected. I did notice that I was using SLEEP1@, and changed that to SLEEP@. I also increased the number of retries.

Now, I can see when the program stutters (almost a pause) while it is going into retry mode. But it is ultimately successful each time. I've run a number of cases in an attempt to have it fail and have not had an issue.

Some had suggested that a delay be put in place to prevent this, and that had been in place, but either not enough delay or not enough retry occurrences. Just like most Windows software, there appears to be a non-deterministic interaction between the applications and the underlying OS. Frustrating, but it can be managed.

I also realize that I should probably have two versions of this function. One to use in a "one-off" situation, and the other should leave the file open until it is no longer needed. That is the better technique anyway.
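
Roughly, the two versions might look like the sketch below (Python for illustration, not the actual FTN95 routine). The names `read_records_once` and `RecordReader` are hypothetical, and a "record" is assumed here to be one line of text.

```python
def read_records_once(path, count):
    """One-off use: open, read `count` records, close straight away."""
    with open(path, "r") as f:
        return [f.readline().rstrip("\n") for _ in range(count)]


class RecordReader:
    """Repeated use: open once, serve many reads, close when finished."""

    def __init__(self, path):
        self._f = open(path, "r")

    def read(self, count):
        self._f.seek(0)  # start again from the first record
        return [self._f.readline().rstrip("\n") for _ in range(count)]

    def close(self):
        self._f.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()
```

The second form avoids the rapid open/read/close/open sequence entirely, at the cost of holding the file for longer, which matters if another user really does need it.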

Sigh. One more piece of code to rewrite!!

Hope this helps anyone facing the same situation.
LitusSaxonicum



Joined: 23 Aug 2005
Posts: 2388
Location: Yateley, Hants, UK

Posted: Wed Apr 29, 2015 2:11 pm

It will take far longer, and be much more obstructive to other users (if there are any), to follow your strategy - just get on and read the whole file in one go. Windows will cope with the other users' requests.

If the file is in plain text, it is possible for a user to edit it, and then some checking on file validity is required as you read it just in case they banjaxed it. If the file is only created in your program, and there is never any question of user-editing, then you could write it when you last updated it and re-read it when you want it again as unformatted. In such a case, it will read faster anyway (but who cares about milliseconds?) and doesn't need any checking on the fly - if at all. You can read thousands of lines in the blink of an eye.
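
As a rough illustration of the "write it unformatted, read it back in one go" idea, here is a Python sketch. It is only an analogue: a raw binary dump of doubles is not the same layout as a Fortran unformatted file, and the names `save_unformatted` and `load_unformatted` are made up for the example.

```python
from array import array


def save_unformatted(path, values):
    """Write the values as raw 8-byte floats, with no text formatting."""
    with open(path, "wb") as f:
        array("d", values).tofile(f)


def load_unformatted(path):
    """Read the whole file with a single read, then decode it in memory."""
    with open(path, "rb") as f:
        data = f.read()  # one open, one read, one close
    values = array("d")
    values.frombytes(data)
    return list(values)
```

The file is touched exactly once per load, so there is no repeated OPEN - READ - CLOSE cycle to collide with anything.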

If you absolutely must go through repeated OPEN - READ - CLOSE cycles, you have to remember that flushing the disk cache is asynchronous with your program, and if your file is being copied out of the disk cache onto the hard drive you will get those file in use errors. With the greatest of respect, expecting a system like Windows to do everything sequentially and in time with your Fortran code is to believe that there have been no advances in operating systems since around 1957! (Or perhaps even earlier).

If you respond to an error by retrying immediately afterwards you will probably catch the same issue - until eventually it works (as you say it does). Sure, you can go to sleep and try again later, but if your experience is that this produces an appreciable delay in your program's responsiveness, you ought to let the user know via a dialog box.
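
One way to sketch that (in Python, purely as an illustration) is to lengthen the sleep after each failure and to call a notification hook once the total wait becomes noticeable. Everything here is an assumption for the example: the name `retry_with_backoff`, the delays, the threshold, and the `notify` callback, which stands in for the dialog box.

```python
import time


def retry_with_backoff(action, retries=6, first_delay=0.01, factor=2.0,
                       notify_after=0.5, notify=print, sleep=time.sleep):
    """Call `action()` until it succeeds, sleeping longer after each failure.

    `notify` is called once, as soon as the accumulated wait reaches
    `notify_after` seconds. The last OSError is re-raised if all
    attempts fail.
    """
    delay = first_delay
    waited = 0.0
    notified = False
    for attempt in range(retries + 1):
        try:
            return action()
        except OSError:
            if attempt == retries:
                raise  # out of retries
            sleep(delay)
            waited += delay
            if not notified and waited >= notify_after:
                notify("File is busy; still retrying...")
                notified = True
            delay *= factor  # back off: wait longer next time
```

With the defaults, the waits run 0.01, 0.02, 0.04, ... seconds, so an immediate retry is never fired straight back at the same busy file.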

Meanwhile, you have irritated the little imp in the computer who copies the bytes longhand into a ledger ... Hang on, that's the spirit of Terry Pratchett being channelled in my reply!

Eddie
wahorger



Joined: 13 Oct 2014
Posts: 1217
Location: Morrison, CO, USA

Posted: Wed Apr 29, 2015 3:03 pm

Yes, Eddie, the program does expect things to be done by the time the system returns control to it, and that is indeed the essence of the problem.

While not from 1957, this code in particular dates from the early 80's, when things were more "predictable".

Even so, why does the "C" version code (run from the command line, and as a 32-bit program), which does this process in the same way, not have a problem? Makes me more than wonder.

And, I still get the sporadic errors with 5 digit long error numbers even when opening a file just once.

Still trying to get a real answer to this class of problem that occurs in my development environment as well as the user machine(s).
LitusSaxonicum



Joined: 23 Aug 2005
Posts: 2388
Location: Yateley, Hants, UK

Posted: Wed Apr 29, 2015 3:43 pm

You might have more luck with an SSD. You'll definitely have less luck with a NAS or external hard drive.

Most computers are now so fast that we forget the timing errors of early Windows versions.

If C (+, -, #, ++ or any other variety) works for you, then why don't you link the C routine? Does it work with Salford/Silverfrost C - which is a facility that few people using FTN95 even know is there?

Eddie
All times are GMT + 1 Hour