forums.silverfrost.com Forum Index forums.silverfrost.com
Welcome to the Silverfrost forums
 

Reading MAC format ASCII files?

 
KennyT



Joined: 02 Aug 2005
Posts: 317

Posted: Thu Feb 25, 2010 9:32 am    Post subject: Reading MAC format ASCII files?

A normal PC format ASCII file uses CRLF (ASCII 13,10) as a line separator. Unix uses LF (ASCII 10). Both of these formats read quite happily using standard IO calls (if opened with CARRIAGECONTROL=LIST).

However, occasionally we receive files whose lines are separated by CR (ASCII 13) alone. These don't read in: the CRs are stripped out and the entire file gets read in by a single READ(LUN,'(A)') call, which then causes a crash because of a buffer overwrite (I think!).

Browsing the web seems to indicate that the files may have originally come from a Mac? I can convert them using Wordpad (which replaces CR with CRLF) but I'd rather my program didn't crash on reading the original file!

So, to cut to the chase, is there a setting on opening a FORMATTED ASCII file that will accept CR, CRLF or LF as a line separator?

TIA

K
JohnCampbell



Joined: 16 Feb 2006
Posts: 2554
Location: Sydney

Posted: Thu Feb 25, 2010 1:50 pm

You could use OPEN (... ACCESS='TRANSPARENT'...) then write your own routine for GET_NEXT_LINE that reads one character at a time to cope with all mixes of <CR> and <LF>.
I have had to do this in the past to also cope with any other non-printable character that can be found in files from other devices.
KennyT



Joined: 02 Aug 2005
Posts: 317

Posted: Thu Feb 25, 2010 3:34 pm

Thanks, John,

I was kinda hoping for something that wouldn't need me to rewrite large chunks of code!

Perhaps I'll use your technique to at least flag that there is a problem and recommend the "Wordpad" solution.

But if anyone's got a neater solution, don't be shy!

Tks

K
JohnHorspool



Joined: 26 Sep 2005
Posts: 270
Location: Gloucestershire UK

Posted: Thu Feb 25, 2010 5:00 pm

Kenny,

Not necessary to rewrite large chunks of code.

You just need to write one routine, which first checks whether you have a Mac style (CR-only) text file and, if so, converts it into DOS format (using John's method).

Subsequently all your existing code will read it okay. I have also had to do this myself.
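
For the "check first" step, a minimal sketch of a scan that counts each kind of line ending might look like the following. The routine name count_line_endings and unit number 77 are made up for illustration (this is not an FTN95 library routine), and it assumes a compiler that accepts Fortran 2003 ACCESS='STREAM'; with FTN95 the ACCESS='TRANSPARENT' form suggested earlier in the thread plays the same role.

```fortran
      subroutine count_line_endings (file_name, n_crlf, n_cr, n_lf, ios)
!
!  Hypothetical helper (not from the thread, not a library routine):
!  counts CRLF pairs, lone CRs and lone LFs in a file.
!  Assumes Fortran 2003 stream access is available.
!
      character file_name*(*)
      integer n_crlf, n_cr, n_lf   ! counts returned
      integer ios                  ! 0 = ok, otherwise the OPEN error code
!
      integer, parameter :: lu = 77    ! arbitrary unit number, assumed free
      character c*1
      logical pending_cr           ! previous byte was a CR not yet classified
      integer rs
!
      n_crlf = 0
      n_cr   = 0
      n_lf   = 0
      open (unit=lu, file=file_name, status='old', access='stream', &
            form='unformatted', iostat=ios)
      if (ios /= 0) return
!
      pending_cr = .false.
      do
         read (lu, iostat=rs) c
         if (rs /= 0) exit                    ! end of file (or read error)
         if (ichar (c) == 10) then
            if (pending_cr) then
               n_crlf = n_crlf + 1            ! CR followed by LF
            else
               n_lf   = n_lf + 1              ! LF on its own
            end if
            pending_cr = .false.
         else
            if (pending_cr) n_cr = n_cr + 1   ! CR not followed by LF
            pending_cr = (ichar (c) == 13)
         end if
      end do
      if (pending_cr) n_cr = n_cr + 1         ! CR was the last byte in the file
      close (lu)
      end subroutine count_line_endings
```

If n_cr comes back non-zero while n_crlf and n_lf are zero, the file is very likely in the CR-only format and needs converting (or at least flagging) before the normal READ statements see it.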

regards,
John
KennyT



Joined: 02 Aug 2005
Posts: 317

Posted: Thu Feb 25, 2010 5:14 pm

OK, is there a guide to what to do? I can't see anything in the manual.

Tks

K
JohnHorspool



Joined: 26 Sep 2005
Posts: 270
Location: Gloucestershire UK

Posted: Fri Feb 26, 2010 12:18 am

Kenny,

John Campbell has given the basics of how to do this.

As an alternative to OPEN with ACCESS='TRANSPARENT' you could use OPENRW@ with READF@ to read one byte at a time.

Assuming you have scanned a file and found that it is in Mac (CR-only) format, then perhaps something like this:-

1. rename the file using CISSUE or START_PROCESS@

2. open the file (after renaming) with OPENRW@

3. open a new file using the original file name with OPEN

4. read the original file one byte (one text character) at a time, storing every character that is not a CR (ASCII 13) in a string whose length you increment with each character.

5. when you hit a CR, write the string contents as a single line to the new file, discard the CR, set the length of the string back to zero and repeat step 4.

6. when you get to the end of the file just write the string contents to the new file.

7. close the original file and delete it.

8. close the new file which has the same original name as the old file and is now in DOS ascii text format.

that's it !
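
The read-and-rewrite part of the steps above (2 to 6 and 8) could be sketched roughly as below. This is only an illustration under some assumptions: the routine name convert_line_endings, the unit numbers and the 1024 character line limit are invented for the example, it uses Fortran 2003 ACCESS='STREAM' rather than OPENRW@/READF@, and it writes to a separate output file instead of renaming and deleting the original. It also accepts LF and CRLF, so it can be run on any of the three formats.

```fortran
      subroutine convert_line_endings (in_name, out_name, n_lines, ios)
!
!  Hypothetical helper: copies in_name to out_name, treating CR, LF or CRLF
!  as a line terminator and letting formatted WRITE add the native one.
!
      character in_name*(*), out_name*(*)
      integer n_lines              ! number of lines written
      integer ios                  ! 0 = ok, otherwise an OPEN error code
!
      integer, parameter :: lu_in = 78, lu_out = 79   ! assumed free
      character line*1024          ! assumed maximum line length
      character c*1
      integer n, ic, rs
      logical last_was_cr
!
      n_lines = 0
      open (unit=lu_in, file=in_name, status='old', access='stream', &
            form='unformatted', iostat=ios)
      if (ios /= 0) return
      open (unit=lu_out, file=out_name, status='replace', iostat=ios)
      if (ios /= 0) then
         close (lu_in)
         return
      end if
!
      n = 0
      last_was_cr = .false.
      do
         read (lu_in, iostat=rs) c
         if (rs /= 0) exit                   ! end of file (or read error)
         ic = ichar (c)
         if (ic == 10 .and. last_was_cr) then
            last_was_cr = .false.            ! LF completing a CRLF: line already written
         else if (ic == 10 .or. ic == 13) then
            write (lu_out,'(a)') line(1:n)   ! n = 0 gives a blank line
            n_lines = n_lines + 1
            n = 0
            last_was_cr = (ic == 13)
         else
            if (n < len (line)) then         ! characters beyond the buffer are dropped
               n = n + 1
               line(n:n) = c
            end if
            last_was_cr = .false.
         end if
      end do
      if (n > 0) then                        ! last line had no terminator
         write (lu_out,'(a)') line(1:n)
         n_lines = n_lines + 1
      end if
      close (lu_in)
      close (lu_out)
      end subroutine convert_line_endings
```

Something like call convert_line_endings ('data.mac', 'data.txt', n, ios) (file names invented) would then leave data.txt readable by the existing READ(LUN,'(A)') code.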

cheers,
John
JohnCampbell



Joined: 16 Feb 2006
Posts: 2554
Location: Sydney

Posted: Fri Feb 26, 2010 12:22 am

The following re-tested code should go part of the way. You may have to improve it to cope with:
- other non-printable characters, such as TAB, which should not be ignored (see ASCII on Wikipedia)
- multiple CR CR or LF LF, which may imply valid multiple blank lines
There is room to improve the concept.
As John H suggested, you could use this at the start of the program to clean up the file before general use, or simply replace the read statement(s) with the subroutine call.

Code:
      subroutine get_next_line (unit, buffer, nc)
!
!  routine to read the file and get the next valid line
!   Possible Line termination formats are
!     CRLF
!     CR
!     LF
!
      integer*4 unit             ! file unit number
      character buffer*(*)       ! character string
      integer*4 nc               ! number of characters found; 0 = blank line, -1 = end of file
!
      integer*4 num              ! number of characters recovered
      integer*4 last_char        ! last character recovered
      integer*4 end_char         ! character to end last line
      character this_line*132    ! line to store characters
      character c*1
      integer*4 ic, i
      integer*4 iostat
!
      data last_char / -1 /      ! -1 indicates file has not been opened
      data end_char  / -1 /      ! last character to terminate a line
!
!   Find the next line
      num       = 0
      this_line = ' '
      last_char = 0
      do
         call get_next_character (unit, c, iostat)
!
!       Termination of file
         if (iostat /= 0) then   ! end of file
            if (num > 0) then
               buffer = this_line
               nc     = num
               return
            else
               nc     = -abs (iostat)
               return
            end if
         else
!
            ic = mod (ichar (c),128)
!
!         save normal character to line buffer
            if (ic >= 32) then   ! normal character
               if (num < len(this_line) ) then
                  num = num + 1
                  this_line(num:num) = char (ic)
               else
                  write (*,*) 'Buffer_overflow'
               end if
               last_char = ic
!
!         detect LF or CR at end of line
            else if (ic == 10 .or. ic == 13) then  ! LF or CR
!
               if (num > 0) then                   ! at end of line of characters
                  if (end_char /= ic) write (*,1001) 'End of line character set to <',ic,'>'
                  buffer    = this_line
                  nc        = num
                  end_char  = ic                   ! remember normal termination
                  return
!
               else if (ic == end_char) then       ! repeated end of line character
                  buffer = ' '
                  nc     = 0
                  return

!            try to cope with CR LF at start of line - is this a blank line
               else if (last_char /= 0) then       ! not first
                  write (*,1001) '<',ic,'> second character to identify blank line'
                  buffer = ' '
                  nc     = 0
                  return
!
               else                                ! ignore leading other character for this line
!z                  write (*,1001) '<',ic,'> leading character ignored'
                  last_char = ic
               end if
!


Last edited by JohnCampbell on Fri Feb 26, 2010 8:05 am; edited 3 times in total
JohnCampbell



Joined: 16 Feb 2006
Posts: 2554
Location: Sydney

Posted: Fri Feb 26, 2010 1:53 am

The rest of the updated test program:
Code:
! ctd..      subroutine get_next_line (unit, buffer, nc)
!
!         detect Tab (replace with 3 spaces)
            else if (ic == 9) then  ! Horizontal Tab
               do i = 1,3
                  if (num < len(this_line) ) then
                     c   = ' '
                     num = num+1
                     this_line(num:num) = c
                     last_char = ichar (c)
                  else
                     write (*,*) 'Buffer_overflow'
                     exit
                  end if
               end do
!
!         Other non-printable character - replace with a ~ ( could be other valid characters ??)
            else
               write (*,1001) '<',ic,'> non-printable character ignored at ',num
               if (num < len(this_line) ) then
                  c   = '~'
                  num = num+1
                  this_line(num:num) = c
                  last_char = ic
               else
                  write (*,*) 'Buffer_overflow'
               end if
            end if
         end if
      end do
!
1001  format (a,i2.2,a,i0)
      end subroutine get_next_line

!     Last change:  JDC  26 Feb 2010    6:01 pm
      integer*4 iostat, nc, nl, nb, mc
      character buffer*132, file_name*132
!
      call get_command_argument (1, file_name, nc, iostat)
      if (iostat /= 0) then
         write (*,*) 'No file provided'
         stop
      end if
!
      open (unit=11, file=file_name, status='old', access='transparent', form='unformatted', iostat=iostat)
      if (iostat /= 0) then
         write (*,*) 'Unable to open ', trim(file_name)
         stop
      else
         write (*,*) 'Scanning File ', trim(file_name)
      end if
!
      open (unit=98,file='get_line.log')
      write (98,*) 'Scanning File ', trim(file_name)
      nl = 0
      nb = 0
      mc = 0
      do
         call get_next_line (11, buffer, nc)
         if (nc < 0) exit
         write (98,'(a)') trim (buffer)
         nl = nl+1
         if (nc < 1) nb = nb+1
         if (nc > mc) mc = nc
      end do
      write (*,*) nl,' lines recovered'
      write (*,*) nb,' blank lines'
      write (*,*) mc,' max line length'
      end

      subroutine get_next_character (unit, c, iostat)
!
      integer*4 unit, iostat
      character c, MESSAGE*128
      INTEGER*2 ERROR_NUMBER
!
      read (unit=unit, iostat=iostat) c
!
      if (iostat /= 0) then
         error_number = iostat
         call FORTRAN_ERROR_MESSAGE@ (error_number, MESSAGE)
         write (*,*) 'IOSTAT =',iostat, ' ', TRIM (MESSAGE)
      end if
      end subroutine get_next_character


It has had some changes and may need more, but it does show most of the anticipated problems solved.
KennyT



Joined: 02 Aug 2005
Posts: 317

Posted: Fri Feb 26, 2010 8:19 am

Thanks guys!

I hadn't realised you'd written so much code yourself. I thought "GET_NEXT_LINE" was a system routine (in much the same way as TRAP_EXCEPTION@ works!)

K
All times are GMT + 1 Hour