Silverfrost Forums

BACKSPACE on wide files

10 Feb 2014 11:09 #13688

I am trying to read an external formatted input file. The first column of data in the file contains a representation of the date using a combination of numbers, '-' and '/', followed by a set of columns with data in an unknown, and possibly variable, format. For example, a line in the file may look something like:

2010-10/12 12.3 45678.90

It is possible that the first field (the date) can change in width in the file.

Unfortunately, because of the '/' in the date, the line does not seem to be easily read in using FMT='*' in a READ statement. Therefore, the approach I have been taking is:

  1. read each line in using FMT='(A)'
  2. work out how many characters (n) there are in the date
  3. BACKSPACE
  4. read only the date using FMT='(An)', and ADVANCE='no'
  5. read the data in the rest of the line using another READ statement with FMT='*', and ADVANCE='yes'

Assume that I have a subroutine width_date that takes a character input, determines how wide the date is, and outputs a format specification. The code is then as follows:

READ (UNIT=iin,FMT='(A)',ERR=1,END=2) c
CALL width_date (c,cfmt)
BACKSPACE (UNIT=iin)
READ (UNIT=iin,FMT=cfmt,ADVANCE='no',ERR=1,END=2) cdate
READ (UNIT=iin,FMT=*,ADVANCE='yes',ERR=1) 

The above works fine except that when I have a very wide input file, the BACKSPACE does not seem to take me back to the beginning of the line. I'm not quite sure how wide the file needs to be in order for the procedure to stop working, but I have a file that is >23000 columns wide that fails to backspace properly.

Of course, if I made LEN(c) in the first line large enough I would not need to backspace in the file and could change the units in the last two read statements as UNIT=c. But if I don't know how wide the file is, there is a possibility that I may set LEN(c) too small; hence the rather complicated procedure above.
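One way around the "LEN(c) may be too small" concern is to read each record in chunks with non-advancing input and grow the string as needed. The following is only a minimal sketch, assuming a compiler that supports Fortran 2003 deferred-length allocatable character variables and the IS_IOSTAT_END/IS_IOSTAT_EOR intrinsics (this may not hold for every FTN95 release). The names long_line_io and read_long_line and the chunk size of 1024 are hypothetical choices, not part of the original code.

```fortran
module long_line_io
  implicit none
contains
  ! Read one complete record from unit u into line, whatever its length.
  ! ios is 0 on success; otherwise it holds the IOSTAT value (for
  ! example the end-of-file code).
  subroutine read_long_line(u, line, ios)
    integer, intent(in) :: u
    character(len=:), allocatable, intent(out) :: line
    integer, intent(out) :: ios
    character(len=1024) :: chunk      ! chunk size is an arbitrary choice
    integer :: nread

    line = ''
    do
      read (unit=u, fmt='(A)', advance='no', size=nread, iostat=ios) chunk
      if (ios > 0) return                 ! error condition
      if (is_iostat_end(ios)) return      ! end of file
      line = line // chunk(1:nread)       ! keep only what was transferred
      if (is_iostat_eor(ios)) then        ! end of record: line is complete
        ios = 0
        return
      end if
    end do
  end subroutine read_long_line
end module long_line_io
```

Once the whole record is held in line, the date and the remaining data can be read from the string with internal READ statements, so no BACKSPACE is needed.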

So, my question is: is there any reason why BACKSPACE may not be working as expected if the input file is exceedingly wide?

I can provide an example short program and problem input file to illustrate the problem if anyone needs, but you would have to let me know how best to make these available.

11 Feb 2014 2:11 #13690

Simon,

If, in step 1, you have read the record into a character string C, is there a reason that you can't skip step 3 and apply steps 4 and 5 to the character string C?

From your sample code, it appears that steps 4 and 5 are operating on a single line, which is fully contained in string C.

You could even replace step 2 with 'Parse string C and return 2 strings, being the date string and the rest of the data'.
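The parsing idea can be sketched as follows. This is an illustration only: the names split_date and split_first_field are hypothetical, and it assumes the date is the first blank-delimited field on the line (tabs are not treated as separators).

```fortran
module split_date
  implicit none
contains
  ! Split line into its first blank-delimited field (cdate) and the
  ! position irest at which the remaining data begin.
  subroutine split_first_field(line, cdate, irest)
    character(len=*), intent(in) :: line
    character(len=*), intent(out) :: cdate
    integer, intent(out) :: irest
    integer :: i1, n

    i1 = verify(line, ' ')              ! first non-blank character
    if (i1 == 0) then                   ! blank line: nothing to split
      cdate = ''
      irest = len(line) + 1
      return
    end if
    n = scan(line(i1:), ' ')            ! first blank after the date
    if (n == 0) then                    ! no blank: the date fills the line
      cdate = line(i1:)
      irest = len(line) + 1
    else
      cdate = line(i1:i1+n-2)
      irest = i1 + n - 1
    end if
  end subroutine split_first_field
end module split_date
```

With c holding a whole record, the rest of the line could then be read with an internal list-directed READ such as READ (c(irest:),FMT=*,IOSTAT=ios) x, y. Keeping the date out of that READ matters because a '/' terminates list-directed input.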

I must admit that I go back to when you could only backspace on a binary file. Even then, variable-length binary files were trouble to backspace, which led me to write a library for variable-length binary record files that was based on a fixed-length direct access file. I'm always cautious of using approaches that were considered inefficient 30 years ago.

John

11 Feb 2014 12:44 #13697

Hi John,

I cannot go back to reading the data from c because I don't know whether c has been long enough to contain all the data.

11 Feb 2014 1:04 #13699

There have been a few examples of stream input lately. Could you open the file as:

OPEN (unit=22, file='file_name', access='transparent', form='unformatted', iostat=iostat)

With this you could write a few access routines:

subroutine get_next_character
subroutine get_next_n_characters
subroutine get_rest_of_line
subroutine get_next_line
subroutine get_start_date
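As a rough illustration of one such access routine, here is a sketch of get_next_line. It assumes the Fortran 2003 standard spelling ACCESS='stream' in place of FTN95's ACCESS='transparent', and a compiler with deferred-length allocatable characters. The module name stream_lines is hypothetical, and the routine body is a guess at what such a routine might look like, not John's actual library.

```fortran
module stream_lines
  implicit none
contains
  ! Read characters from an unformatted stream unit u up to the next
  ! line feed and return them in line (without the terminator).
  subroutine get_next_line(u, line, ios)
    integer, intent(in) :: u
    character(len=:), allocatable, intent(out) :: line
    integer, intent(out) :: ios
    character(len=1) :: ch

    line = ''
    do
      read (unit=u, iostat=ios) ch
      if (ios /= 0) exit
      if (ch == achar(10)) return               ! LF ends the line
      if (ch /= achar(13)) line = line // ch    ! drop the CR of CR/LF
    end do
    ! A final line with no terminator still counts as a line.
    if (is_iostat_end(ios) .and. len(line) > 0) ios = 0
  end subroutine get_next_line
end module stream_lines
```

The file would be connected with something like OPEN (unit=22, file='file_name', access='stream', form='unformatted', action='read', status='old', iostat=ios). Appending one character at a time is simple but slow for very wide lines; reading into a larger buffer and scanning it for the line feed would be the obvious refinement.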

Alternatively, how big could a single record be? 2k, 20k...? I don't like data files with excessively long records (I don't think formatted I/O does either). You could redefine the record structure to allow for & continuation.

Some hopefully helpful ideas?

John

12 Feb 2014 11:52 #13701

John,

You are right about formatted I/O not being happy with long lines. In the past, when lines got quite long, I used RECL= with a large number, which seemed to allocate a longer buffer than normal. I know that is really only for direct access, but it seemed to work. I'm not sure which compiler it was; it might have been VAX-11 Fortran, but I think it also worked on the FTN77/FTN95 series of compilers. Of course, all that was a long time ago when I was younger!

Ian

12 Feb 2014 5:48 #13703

RECL may be used legitimately with sequential files in Fortran 95 (Section 9.3.4.5 of the draft standard).

With sequential files, it specifies the maximum record size in characters.

The default maximum record length is processor/compiler dependent.

What is the default maximum record length with FTN95? (Paul?)

If this limit is being exceeded, it could be causing Simon's problem with wide files.
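For reference, a sketch of what specifying RECL on a formatted sequential file looks like, together with an INQUIRE to see what maximum the processor reports. The names wide_open, open_wide and max_record_length are hypothetical. Whether a larger RECL changes how BACKSPACE behaves is processor dependent, so this shows the syntax rather than a confirmed fix.

```fortran
module wide_open
  implicit none
contains
  ! Open a formatted sequential file for reading, asking the processor
  ! to allow records of up to maxlen characters.
  subroutine open_wide(u, fname, maxlen, ios)
    integer, intent(in) :: u, maxlen
    character(len=*), intent(in) :: fname
    integer, intent(out) :: ios
    open (unit=u, file=fname, action='read', form='formatted', &
          access='sequential', status='old', recl=maxlen, iostat=ios)
  end subroutine open_wide

  ! Report the maximum record length in effect for a connected unit.
  function max_record_length(u) result(maxrec)
    integer, intent(in) :: u
    integer :: maxrec
    inquire (unit=u, recl=maxrec)
  end function max_record_length
end module wide_open
```

A call such as CALL open_wide (iin,'test.txt',1000000,ios) followed by PRINT *, max_record_length(iin) would show the value the compiler is actually using.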

24 Feb 2014 8:56 #13752

The simple program below can be used to reproduce the BACKSPACE problem. It seems that the program fails as soon as the length of the line exceeds 2^13 (8192) characters. If the line is wider than 2^13, BACKSPACE moves back only 8192 (2^13) characters in the file.

If I open the file (line 21 of the program, the OPEN on unit iin) with RECL specified, BACKSPACE seems unaffected: the program still moves back only 8192 characters in the file.

! This program identifies maximum width of file for which BACKSPACE works.
PROGRAM p
  IMPLICIT NONE
  INTEGER, PARAMETER :: iout=21
  INTEGER, PARAMETER :: iin=11
  INTEGER :: i,n
  INTEGER, DIMENSION(:), ALLOCATABLE :: r
  CHARACTER(LEN=8) :: c0,c1,c2
  n=1
  DO
    ALLOCATE (r(n))
    DO i=1,n
       r(i)=NINT(RANDOM@()*1.0d4)-1
    END DO
    OPEN (UNIT=iout,FILE='test.txt',ACTION='write',FORM='formatted',STATUS='unknown')
    WRITE (UNIT=iout,FMT='(A)') 'A'
    WRITE (UNIT=iout,FMT='(A,32768I5)') 'B',(r(i),i=1,n)
    WRITE (UNIT=iout,FMT='(A)') 'C'
    CLOSE (UNIT=iout)
!
    OPEN (UNIT=iin,FILE='test.txt',ACTION='read',FORM='formatted',STATUS='old')
    READ (UNIT=iin,FMT=*) c0
    READ (UNIT=iin,FMT=*) c1,(r(i),i=1,n)
    BACKSPACE (UNIT=iin)
    READ (UNIT=iin,FMT=*) c2
    CLOSE (UNIT=iin)
    DEALLOCATE (r)
    IF (c2/=c1) EXIT
    n=n+1
  END DO
  PRINT *, 'C1 ',c1
  PRINT *, 'C2 ',c2
  PRINT *, n,1+5*n
END PROGRAM p

24 Feb 2014 11:18 #13757

Simon,

I have probably said this before, but why would you require such long text records?

I'd change:

WRITE (UNIT=iout,FMT='(A,32768I5)') 'B',(r(i),i=1,n)

to:

WRITE (UNIT=iout,FMT='(A,2I5)') 'B',n
WRITE (UNIT=iout,FMT='(A,2I5)') ('B',i,r(i),i=1,n)

or:

WRITE (UNIT=iout,FMT='(A,2I5)') 'B',n
WRITE (UNIT=iout,FMT='(1x,10I5)') (r(i),i=1,n)

When you go to read this, there would be no need for backspace and the overhead for file size or I/O time is not significant. Provide a file data structure that is easy to manage. I work with survey points files with up to 100 million points and a simple text format that is easy to review saves a lot of time.

There are alternatives to Notepad for large files. I even have my own line editor that displays the first 1 GB of a text file. You might claim to have no control of the input file format, but if I received a file as you show, the first thing I would do (and have actually done) is write a conversion to a more manageable format and archive the originals. Putting in a few extra <CR><LF> won't cost you much.

John

25 Feb 2014 1:10 #13758

Thanks John,

The basic principle I am applying here is that the software I am trying to create should be able to read somebody else's file without me imposing limits on the user. I don't want to have to say to the user 'sorry, if your file is wider than 8192 characters then you will have to reformat it somehow.' The point is not so much that I want to be able to read or create wide files, the point is IF one has a wide file, how do you read it?

I agree that there is a strong case for using a more sensible file format. But it would be helpful to at least get an error message from BACKSPACE if it has not worked. Anyway, perhaps Paul could include in the manual somewhere a comment that there is a limit of 8192.

One bonus is that FTN95 can at least read wider files than NAGWare, for example. NAGWare baulks beyond 1024, but it complains at the stage of trying to read the line whereas FTN95 keeps going and you don't immediately realise that there has been an error.

25 Feb 2014 3:03 #13759

Simon,

You could look at FORM='UNFORMATTED',ACCESS='TRANSPARENT'. I have found this to provide a good solution, as you can manage whatever character buffer you require. It is easy to write subroutines to read or write characters and build up strings and records. I am not sure if you can use an internal formatted read on a very long character variable, such as character buffer*20000.
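Whether an internal read copes with a very long character variable is easy to check on a given compiler. The sketch below is only a test harness under assumed names (internal_read_check and round_trip_ok are hypothetical): it writes n integers into one buffer of 5*n characters and reads them back with an internal list-directed READ.

```fortran
module internal_read_check
  implicit none
contains
  ! Write n integers into one long character buffer, read them back with
  ! an internal list-directed READ, and report whether they all match.
  function round_trip_ok(n) result(ok)
    integer, intent(in) :: n
    logical :: ok
    character(len=5*n) :: buffer        ! 5 characters per value
    character(len=32) :: cfmt
    integer :: vals(n), back(n), i, ios

    do i = 1, n
      vals(i) = mod(i, 10000)           ! at most 4 digits, so I5 leaves a blank
    end do
    write (cfmt, '(A,I0,A)') '(', n, 'I5)'   ! e.g. (6000I5)
    write (buffer, fmt=cfmt) vals
    read (buffer, fmt=*, iostat=ios) back
    ok = (ios == 0)
    if (ok) ok = all(back == vals)
  end function round_trip_ok
end module internal_read_check
```

Calling round_trip_ok(6000) exercises a 30000-character buffer; increasing n probes for any limit the compiler may impose.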

I read the next record into a large character array (1 character at a time) and do my own parsing for numbers etc. It works very well. The line editor I mentioned has the following declarations which provide for large records and files:

      INTEGER*4, PARAMETER :: milion =   1000000  ! 1 million
      INTEGER*4, PARAMETER :: MAXLIN = 20*milion  ! max lines in file       20m
      INTEGER*4, PARAMETER :: MAXSTR =950*milion  ! max characters in file 950mb
!
      COMMON /FILCOM/ CSTOR(MAXSTR)
      COMMON /FILIND/ START(MAXLIN), LENGTH(MAXLIN), LINE_ORDER(MAXLIN)
!
      CHARACTER*1   CSTOR
      INTEGER       START, LENGTH, LINE_ORDER

25 Feb 2014 7:44 #13760

At a quick glance the limit looks like 32K.
