Silverfrost Forums

Reading MAC format ASCII files?

25 Feb 2010 8:32 #6037

A normal PC format ASCII file uses CRLF (ASCII 13,10) as a line separator. Unix uses LF (ASCII 10). Both of these formats read quite happily using standard IO calls (if opened with CARRIAGECONTROL=LIST).

However, occasionally, we receive files whose lines are separated by CR (ASCII 13) alone. These don't read in. The CRs are stripped out and the entire file gets read in by a single READ(LUN,'(A)') call, which then causes a crash because of a buffer overwrite (I think!).

Browsing the web seems to indicate that the files may have originally come from a Mac? I can convert them using Wordpad (which replaces CR with CRLF) but I'd rather my program didn't crash on reading the original file!

So, to cut to the chase, is there a setting on opening a FORMATTED ASCII file that will accept CR, CRLF or LF as a line separator?

TIA

K

25 Feb 2010 12:50 #6040

You could use OPEN (... ACCESS='TRANSPARENT'...) then write your own routine for GET_NEXT_LINE that reads one character at a time to cope with all mixes of <CR> and <LF>. I have had to do this in the past to also cope with any other non-printable character that can be found in files from other devices.

25 Feb 2010 2:34 #6041

Thanks, John,

I was kinda hoping for something that wouldn't need me to rewrite large chunks of code!

Perhaps I'll use your technique to at least flag that there is a problem and recommend the 'Wordpad' solution.

But if anyone's got a neater solution, don't be shy!

Tks

K

25 Feb 2010 4:00 #6043

Kenny,

Not necessary to rewrite large chunks of code.

You just need to write one routine only, which first checks whether you have a Mac style (CR only) text file or not and then converts it into a DOS type format (using John's method) if it is.
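A rough sketch of what such a check might look like, counting the three kinds of terminator so the caller can decide whether conversion is needed. The routine name `count_line_endings`, the unit number 71 and the use of standard Fortran 2003 `access='stream'` are assumptions for illustration, not something from the posts here; on a compiler without stream access, the ACCESS='TRANSPARENT' option mentioned above would be the thing to try instead.

```fortran
! Sketch only: classify the line terminators in a file by counting them.
! Assumes Fortran 2003 stream access and free-form source.
      subroutine count_line_endings (file_name, n_crlf, n_cr, n_lf, iostat)
      implicit none
      character(len=*), intent(in) :: file_name
      integer, intent(out) :: n_crlf, n_cr, n_lf   ! CRLF pairs, lone CR, lone LF
      integer, intent(out) :: iostat               ! non-zero if the open failed
      integer, parameter :: lun = 71               ! hypothetical free unit number
      character(len=1) :: c
      logical :: prev_cr                           ! previous byte was an unpaired CR
      integer :: ios
!
      n_crlf = 0
      n_cr   = 0
      n_lf   = 0
      open (unit=lun, file=file_name, status='old', access='stream', &
            form='unformatted', action='read', iostat=iostat)
      if (iostat /= 0) return
!
      prev_cr = .false.
      do
         read (lun, iostat=ios) c
         if (ios /= 0) exit                        ! end of file (or read error)
         if (ichar(c) == 10) then
            if (prev_cr) then
               n_crlf = n_crlf + 1                 ! CR followed by LF
            else
               n_lf = n_lf + 1                     ! LF on its own
            end if
            prev_cr = .false.
         else
            if (prev_cr) n_cr = n_cr + 1           ! CR not followed by LF
            prev_cr = (ichar(c) == 13)
         end if
      end do
      if (prev_cr) n_cr = n_cr + 1                 ! file ended on a lone CR
      close (lun)
      end subroutine count_line_endings
```

If `n_cr` comes back greater than zero, the file has CR-only lines and would need converting before the existing READ statements see it.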

Subsequently all your existing code will read it okay. I have also had to do this myself.

regards, John

25 Feb 2010 4:14 #6044

OK, is there a guide to what to do? I can't see anything in the manual.

Tks

K

25 Feb 2010 11:18 #6047

Kenny,

John Campbell has given the basics of how to do this.

As an alternative to OPEN with ACCESS='TRANSPARENT' you could use OPENRW@ with READF@ to read one byte at a time.

Assuming you have scanned a file to find that it is in CR-only (Mac) format, then perhaps something like this:-

  1. rename the file using CISSUE or START_PROCESS@

  2. open the file (after renaming) with OPENRW@

  3. open a new file using the original file name with OPEN

  4. read the original file one byte (one text character) at a time, storing every character that is not CR (ASCII 13) in a string whose length you increment with each character.

  5. when you hit a CR, write the string contents as a single line to the new file, discard the CR, set the length of the string back to zero and repeat step 4.

  6. when you get to the end of the file just write the string contents to the new file.

  7. close the original file and delete it.

  8. close the new file which has the same original name as the old file and is now in DOS ascii text format.

that's it !
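The steps above can be sketched roughly as follows. This is a minimal illustration under some assumptions, not the exact method described: it uses standard Fortran 2003 `access='stream'` rather than OPENRW@/READF@, it writes to a second file name instead of renaming and deleting the original (steps 1 and 7 are left to the caller), and it also treats a bare LF as a terminator so that unix files pass through unchanged. The name `convert_line_endings`, the unit numbers and the 1024 character line limit are all made up for the example.

```fortran
! Sketch only: copy in_name to out_name, rewriting every CR, LF or CRLF
! terminator as the platform's own line ending (CRLF on a PC).
      subroutine convert_line_endings (in_name, out_name, n_lines, iostat)
      implicit none
      character(len=*), intent(in) :: in_name, out_name
      integer, intent(out) :: n_lines                   ! number of lines written
      integer, intent(out) :: iostat                    ! non-zero if an open failed
      integer, parameter :: lun_in = 73, lun_out = 74   ! hypothetical free units
      integer, parameter :: max_len = 1024              ! assumed longest line
      character(len=max_len) :: line
      character(len=1) :: c
      integer :: num, ios
      logical :: prev_cr                                ! previous byte was a CR
!
      n_lines = 0
      open (unit=lun_in, file=in_name, status='old', access='stream', &
            form='unformatted', action='read', iostat=iostat)
      if (iostat /= 0) return
      open (unit=lun_out, file=out_name, status='replace', iostat=iostat)
      if (iostat /= 0) then
         close (lun_in)
         return
      end if
!
      num     = 0
      prev_cr = .false.
      do
         read (lun_in, iostat=ios) c
         if (ios /= 0) exit                        ! end of file (or read error)
         if (ichar(c) == 13) then                  ! CR always ends a line
            write (lun_out,'(a)') line(1:num)
            n_lines = n_lines + 1
            num     = 0
            prev_cr = .true.
         else if (ichar(c) == 10) then
            if (.not. prev_cr) then                ! lone LF ends a line
               write (lun_out,'(a)') line(1:num)
               n_lines = n_lines + 1
               num     = 0
            end if                                 ! LF after CR: line already written
            prev_cr = .false.
         else
            if (num < max_len) then
               num = num + 1
               line(num:num) = c
            end if                                 ! characters past max_len are dropped
            prev_cr = .false.
         end if
      end do
      if (num > 0) then                            ! last line had no terminator
         write (lun_out,'(a)') line(1:num)
         n_lines = n_lines + 1
      end if
      close (lun_in)
      close (lun_out)
      end subroutine convert_line_endings
```

Because the output goes through an ordinary formatted WRITE, the run-time library supplies whatever terminator the platform normally uses, so the existing reading code should then cope with the new file.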

cheers, John

25 Feb 2010 11:22 (Edited: 26 Feb 2010 7:05) #6048

The following 'RE-'tested code should go part of the way. You may have to improve it to cope with:

  • other non-printable characters, such as TAB, should not be ignored (see ASCII in wikipedia)

  • multiple CR CR or LF LF may imply valid multiple blank lines

The concept should be able to be improved. As John H suggested, you could use this at the start of the program to clean out the file before using generally, or simply replace the read statement(s) by the subroutine call.

      subroutine get_next_line (unit, buffer, nc)
!
!   routine to read the file and get the next valid line
!   Possible Line termination formats are
!     CRLF
!     CR
!     LF
!
      integer*4 unit            ! file unit number
      character buffer*(*)      ! character string
      integer*4 nc              ! number of characters found; 0 = blank line, -1 = end of file
!
      integer*4 num             ! number of characters recovered
      integer*4 last_char       ! last character recovered
      integer*4 end_char        ! character to end last line
      character this_line*132   ! line to store characters
      character c*1
      integer*4 ic, i
      integer*4 iostat
!
      data last_char / -1 /     ! -1 indicates file has not been opened
      data end_char  / -1 /     ! last character to terminate a line
!
!   Find the next line
      num       = 0
      this_line = ' '
      last_char = 0
      do
         call get_next_character (unit, c, iostat)
!
!      Termination of file
         if (iostat /= 0) then    ! end of file
            if (num > 0) then
               buffer = this_line
               nc     = num
               return
            else
               nc = -abs (iostat)
               return
            end if
         else
!
            ic = mod (ichar (c),128)
!
!         save normal character to line buffer
            if (ic >= 32) then    ! normal character
               if (num < len(this_line) ) then
                  num = num + 1
                  this_line(num:num) = char (ic)
               else
                  write (*,*) 'Buffer_overflow'
               end if
               last_char = ic
!
!         detect LF or CR at end of line
            else if (ic == 10 .or. ic == 13) then    ! LF or CR
!
               if (num > 0) then                     ! at end of line of characters
                  if (end_char /= ic) write (*,1001) 'End of line character set to <',ic,'>'
                  buffer   = this_line
                  nc       = num
                  end_char = ic                      ! remember normal termination
                  return
!
               else if (ic == end_char) then         ! repeated end of line character
                  buffer = ' '
                  nc     = 0
                  return
!
!            try to cope with CR LF at start of line - is this a blank line
               else if (last_char /= 0) then         ! not first
                  write (*,1001) '<',ic,'> second character to identify blank line'
                  buffer = ' '
                  nc     = 0
                  return
!
               else                                  ! ignore leading other character for this line
!z                write (*,1001) '<',ic,'> leading character ignored'
                  last_char = ic
               end if
!

26 Feb 2010 12:53 #6049

The rest of the updated test program:

! ctd..      subroutine get_next_line (unit, buffer, nc)
!
!         detect Tab (replace with 3 spaces)
            else if (ic == 9) then  ! Horizontal Tab
               do i = 1,3
                  if (num < len(this_line) ) then
                     c   = ' '
                     num = num+1
                     this_line(num:num) = c
                     last_char = ichar (c)
                  else
                     write (*,*) 'Buffer_overflow'
                     exit
                  end if
               end do
!
!         Other non-printable character - replace with a ~ ( could be other valid characters ??)
            else
               write (*,1001) '<',ic,'> non-printable character ignored at ',num
               if (num < len(this_line) ) then
                  c   = '~'
                  num = num+1
                  this_line(num:num) = c
                  last_char = ic
               else
                  write (*,*) 'Buffer_overflow'
               end if
            end if 
         end if 
      end do 
!
1001  format (a,i2.2,a,i0)
      end subroutine get_next_line 

!     Last change:  JDC  26 Feb 2010    6:01 pm
      integer*4 iostat, nc, nl, nb, mc
      character buffer*132, file_name*132
!
      call get_command_argument (1, file_name, nc, iostat)
      if (iostat /= 0) then
         write (*,*) 'No file provided'
         stop
      end if
!
      open (unit=11, file=file_name, status='old', access='transparent', form='unformatted', iostat=iostat)
      if (iostat /= 0) then
         write (*,*) 'Unable to open ', trim(file_name)
         stop
      else
         write (*,*) 'Scanning File ', trim(file_name)
      end if
!
      open (unit=98,file='get_line.log')
      write (98,*) 'Scanning File ', trim(file_name)
      nl = 0
      nb = 0
      mc = 0
      do
         call get_next_line (11, buffer, nc)
         if (nc < 0) exit
         write (98,'(a)') trim (buffer)
         nl = nl+1
         if (nc < 1) nb = nb+1
         if (nc > mc) mc = nc
      end do
      write (*,*) nl,' lines recovered'
      write (*,*) nb,' blank lines'
      write (*,*) mc,' max line length'
      end

      subroutine get_next_character (unit, c, iostat) 
! 
      integer*4 unit, iostat
      character c, MESSAGE*128
      INTEGER*2 ERROR_NUMBER
! 
      read (unit=unit, iostat=iostat) c 
!
      if (iostat /= 0) then
         error_number = iostat
         call FORTRAN_ERROR_MESSAGE@ (error_number, MESSAGE)
         write (*,*) 'IOSTAT =',iostat, ' ', TRIM (MESSAGE)
      end if
      end subroutine get_next_character

It has had some changes and may need more, but it does show most of the anticipated problems solved.

26 Feb 2010 7:19 #6050

Thanks guys! 😄

I hadn't realised you'd written so much code yourself, I thought 'GET_NEXT_LINE' was a system routine (in much the same way as TRAP_EXCEPTION@ works!)

K
