forums.silverfrost.com Welcome to the Silverfrost forums
KennyT
Joined: 02 Aug 2005 Posts: 317
Posted: Thu Feb 25, 2010 9:32 am Post subject: Reading MAC format ASCII files?
A normal PC format ASCII file uses CRLF (ASCII 13,10) as a line separator. Unix uses LF (ASCII 10). Both of these formats read quite happily using standard IO calls (if opened with CARRIAGECONTROL=LIST).
However, occasionally, we receive files whose lines are separated by CR (ASCII 13) alone. These don't read in. The CRs are stripped out and the entire file gets read in by a single READ(LUN,'(A)') call, which then causes a crash because of a buffer overwrite (I think!).
Browsing the web seems to indicate that the files may have originally come from a Mac? I can convert them using Wordpad (which replaces CR with CRLF) but I'd rather my program didn't crash on reading the original file!
So, to cut to the chase, is there a setting on opening a FORMATTED ASCII file that will accept CR, CRLF or LF as a line separator?
TIA
K
JohnCampbell
Joined: 16 Feb 2006 Posts: 2554 Location: Sydney
Posted: Thu Feb 25, 2010 1:50 pm
You could use OPEN (... ACCESS='TRANSPARENT'...) then write your own routine for GET_NEXT_LINE that reads one character at a time to cope with all mixes of <CR> and <LF>.
I have had to do this in the past to also cope with any other non-printable character that can be found in files from other devices.
KennyT
Joined: 02 Aug 2005 Posts: 317
Posted: Thu Feb 25, 2010 3:34 pm
Thanks, John,
I was kinda hoping for something that wouldn't need me to rewrite large chunks of code!
Perhaps I'll use your technique to at least flag that there is a problem and recommend the "Wordpad" solution.
But if anyone's got a neater solution, don't be shy!
Tks
K
JohnHorspool
Joined: 26 Sep 2005 Posts: 270 Location: Gloucestershire UK
Posted: Thu Feb 25, 2010 5:00 pm
Kenny,
Not necessary to rewrite large chunks of code.
You just need to write one routine only, which first checks whether you have a CR-only (Mac style) text file or not and then converts it into a DOS type format (using John's method) if it is.
Subsequently all your existing code will read it okay. I have also had to do this myself.
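A check of that kind might look like the sketch below. It is only an illustration, not code from this thread: the module and routine names (`eol_detect`, `count_terminators`, `eol_style`) are made up, and it assumes the raw bytes of the file have already been read into a character string (for example through ACCESS='TRANSPARENT'). It counts CRLF pairs, lone CRs and lone LFs, then names the style.

```fortran
module eol_detect
  implicit none
  private
  public :: count_terminators, eol_style
contains

  ! Count CRLF pairs, lone CR and lone LF in a block of raw bytes.
  subroutine count_terminators (text, n_crlf, n_cr, n_lf)
    character(len=*), intent(in)  :: text
    integer,          intent(out) :: n_crlf, n_cr, n_lf
    integer :: i
    logical :: pair

    n_crlf = 0
    n_cr   = 0
    n_lf   = 0
    i = 1
    do while (i <= len(text))
      select case (iachar (text(i:i)))
      case (13)                              ! CR
        ! Test the bound first: Fortran does not promise short-circuit .and.
        pair = .false.
        if (i < len(text)) pair = iachar (text(i+1:i+1)) == 10
        if (pair) then
          n_crlf = n_crlf + 1
          i = i + 1                          ! skip the LF of the pair
        else
          n_cr = n_cr + 1
        end if
      case (10)                              ! LF not preceded by CR
        n_lf = n_lf + 1
      end select
      i = i + 1
    end do
  end subroutine count_terminators

  ! Name the line-ending style: 'crlf', 'cr', 'lf', 'mixed' or 'none'.
  function eol_style (text) result (style)
    character(len=*), intent(in) :: text
    character(len=5) :: style
    integer :: n_crlf, n_cr, n_lf

    call count_terminators (text, n_crlf, n_cr, n_lf)
    if (n_crlf + n_cr + n_lf == 0) then
      style = 'none'
    else if (n_cr == 0 .and. n_lf == 0) then
      style = 'crlf'
    else if (n_crlf == 0 .and. n_lf == 0) then
      style = 'cr'
    else if (n_crlf == 0 .and. n_cr == 0) then
      style = 'lf'
    else
      style = 'mixed'
    end if
  end function eol_style

end module eol_detect
```

Only a result of 'cr' (or 'mixed') would need the conversion step; 'crlf' and 'lf' files already read with the normal formatted READ, as noted in the first post.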
regards,
John
KennyT
Joined: 02 Aug 2005 Posts: 317
Posted: Thu Feb 25, 2010 5:14 pm
OK, is there a guide to what to do? I can't see anything in the manual.
Tks
K
JohnHorspool
Joined: 26 Sep 2005 Posts: 270 Location: Gloucestershire UK
Posted: Fri Feb 26, 2010 12:18 am
Kenny,
John Campbell has given the basics of how to do this.
As an alternative to OPEN with ACCESS='TRANSPARENT' you could use OPENRW@ with READF@ to read one byte at a time.
Assuming you have scanned a file to find that it uses CR alone as the line separator, then perhaps something like this:-
1. rename the file using CISSUE or START_PROCESS@
2. open the file (after renaming) with OPENRW@
3. open a new file using the original file name with OPEN
4. read the original file one byte (one text character) at a time, storing every character that is not CR (ASCII 13) in a string that grows by one with each character.
5. when you hit a CR, write the string contents to the new file as a single line, discard the CR, set the length of the string back to zero and repeat step 4.
6. when you get to the end of the file, just write the remaining string contents to the new file.
7. close the original file and delete it.
8. close the new file, which has the original file name and is now in DOS ASCII text format.
that's it !
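Steps 4 to 6 could be sketched roughly as below. This is an illustration under several assumptions, not tested FTN95 code: it uses standard Fortran 2003 ACCESS='STREAM' and NEWUNIT= rather than OPENRW@/READF@ (whether a given FTN95 version accepts these is not confirmed here), it takes separate input and output names instead of renaming and deleting, and the names `cr_convert` and `convert_to_crlf` are made up. As a small extension it also treats an LF that directly follows a CR as part of the same terminator, so a file that is already CRLF passes through unchanged.

```fortran
module cr_convert
  implicit none
  private
  public :: convert_to_crlf
contains

  ! Copy in_name to out_name, ending every line with CR LF.
  ! nlines returns the number of lines written; ios is 0 on success.
  subroutine convert_to_crlf (in_name, out_name, nlines, ios)
    character(len=*), intent(in)  :: in_name, out_name
    integer,          intent(out) :: nlines, ios
    integer, parameter :: cr = 13, lf = 10
    character(len=1) :: c
    character(len=:), allocatable :: line
    logical :: prev_cr
    integer :: uin, uout, rstat

    nlines = 0
    open (newunit=uin, file=in_name, status='old', access='stream', &
          form='unformatted', action='read', iostat=ios)
    if (ios /= 0) return
    open (newunit=uout, file=out_name, status='replace', access='stream', &
          form='unformatted', action='write', iostat=ios)
    if (ios /= 0) then
      close (uin)
      return
    end if

    line    = ''
    prev_cr = .false.
    do
      read (uin, iostat=rstat) c              ! one byte at a time (step 4)
      if (rstat /= 0) exit                    ! end of file, or a read error
      if (iachar (c) == cr) then
        call flush_line ()                    ! step 5: write line, drop the CR
        prev_cr = .true.
      else if (iachar (c) == lf) then
        if (.not. prev_cr) call flush_line () ! lone LF also ends a line
        prev_cr = .false.                     ! LF after CR is simply skipped
      else
        line = line // c                      ! string grows by one character
        prev_cr = .false.
      end if
    end do
    if (rstat > 0) ios = rstat                ! report a genuine read error
    if (len (line) > 0) call flush_line ()    ! step 6: last unterminated line
    close (uin)
    close (uout)

  contains

    subroutine flush_line ()
      write (uout) line // achar (cr) // achar (lf)
      line   = ''
      nlines = nlines + 1
    end subroutine flush_line

  end subroutine convert_to_crlf

end module cr_convert
```

Writing the CR LF pair explicitly through stream output keeps the result in DOS format regardless of the platform; a formatted WRITE, as in step 5, would produce whatever terminator the runtime uses.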
cheers,
John
JohnCampbell
Joined: 16 Feb 2006 Posts: 2554 Location: Sydney
Posted: Fri Feb 26, 2010 12:22 am
The following re-tested code should go part of the way. You may have to improve it to cope with:
- other non-printable characters, such as TAB, which should not be ignored (see ASCII on Wikipedia)
- repeated CR CR or LF LF, which may imply valid multiple blank lines
The concept should be able to be improved.
As John H suggested, you could use this at the start of the program to clean up the file before general use, or simply replace the read statement(s) with a call to the subroutine.
Code:
subroutine get_next_line (unit, buffer, nc)
!
!  routine to read the file and get the next valid line
!  Possible Line termination formats are
!    CRLF
!    CR
!    LF
!
  integer*4 unit               ! file unit number
  character buffer*(*)         ! character string
  integer*4 nc                 ! number of characters found; 0 = blank line, -1 = end of file
!
  integer*4 num                ! number of characters recovered
  integer*4 last_char          ! last character recovered
  integer*4 end_char           ! character to end last line
  character this_line*132      ! line to store characters
  character c*1
  integer*4 ic, i
  integer*4 iostat
!
  data last_char / -1 /        ! -1 indicates file has not been opened
  data end_char / -1 /         ! last character to terminate a line
!
!  Find the next line
  num = 0
  this_line = ' '
  last_char = 0
  do
    call get_next_character (unit, c, iostat)
!
!  Termination of file
    if (iostat /= 0) then                     ! end of file
      if (num > 0) then
        buffer = this_line
        nc = num
        return
      else
        nc = -abs (iostat)
        return
      end if
    else
!
      ic = mod (ichar (c),128)
!
!  save normal character to line buffer
      if (ic >= 32) then                      ! normal character
        if (num < len(this_line) ) then
          num = num + 1
          this_line(num:num) = char (ic)
        else
          write (*,*) 'Buffer_overflow'
        end if
        last_char = ic
!
!  detect LF or CR at end of line
      else if (ic == 10 .or. ic == 13) then   ! LF or CR
!
        if (num > 0) then                     ! at end of line of characters
          if (end_char /= ic) write (*,1001) 'End of line character set to <',ic,'>'
          buffer = this_line
          nc = num
          end_char = ic                       ! remember normal termination
          return
!
        else if (ic == end_char) then         ! repeated end of line character
          buffer = ' '
          nc = 0
          return
!  try to cope with CR LF at start of line - is this a blank line
        else if (last_char /= 0) then         ! not first
          write (*,1001) '<',ic,'> second character to identify blank line'
          buffer = ' '
          nc = 0
          return
!
        else                                  ! ignore leading other character for this line
!z        write (*,1001) '<',ic,'> leading character ignored'
          last_char = ic
        end if
!
Last edited by JohnCampbell on Fri Feb 26, 2010 8:05 am; edited 3 times in total
JohnCampbell
Joined: 16 Feb 2006 Posts: 2554 Location: Sydney
Posted: Fri Feb 26, 2010 1:53 am
The rest of the updated test program:
Code:
! ctd.. subroutine get_next_line (unit, buffer, nc)
!
!  detect Tab (replace with 3 spaces)
      else if (ic == 9) then                  ! Horizontal Tab
        do i = 1,3
          if (num < len(this_line) ) then
            c = ' '
            num = num+1
            this_line(num:num) = c
            last_char = ichar (c)
          else
            write (*,*) 'Buffer_overflow'
            exit
          end if
        end do
!
!  Other non-printable character - replace with a ~ ( could be other valid characters ??)
      else
        write (*,1001) '<',ic,'> non-printable character ignored at ',num
        if (num < len(this_line) ) then
          c = '~'
          num = num+1
          this_line(num:num) = c
          last_char = ic
        else
          write (*,*) 'Buffer_overflow'
        end if
      end if
    end if
  end do
!
1001 format (a,i2.2,a,i0)
end subroutine get_next_line

! Last change: JDC 26 Feb 2010 6:01 pm
  integer*4 iostat, nc, nl, nb, mc
  character buffer*132, file_name*132
!
  call get_command_argument (1, file_name, nc, iostat)
  if (iostat /= 0) then
    write (*,*) 'No file provided'
    stop
  end if
!
  open (unit=11, file=file_name, status='old', access='transparent', form='unformatted', iostat=iostat)
  if (iostat /= 0) then
    write (*,*) 'Unable to open ', trim(file_name)
    stop
  else
    write (*,*) 'Scanning File ', trim(file_name)
  end if
!
  open (unit=98,file='get_line.log')
  write (98,*) 'Scanning File ', trim(file_name)
  nl = 0
  nb = 0
  mc = 0
  do
    call get_next_line (11, buffer, nc)
    if (nc < 0) exit
    write (98,'(a)') trim (buffer)
    nl = nl+1
    if (nc < 1) nb = nb+1
    if (nc > mc) mc = nc
  end do
  write (*,*) nl,' lines recovered'
  write (*,*) nb,' blank lines'
  write (*,*) mc,' max line length'
end

subroutine get_next_character (unit, c, iostat)
!
  integer*4 unit, iostat
  character c, MESSAGE*128
  INTEGER*2 ERROR_NUMBER
!
  read (unit=unit, iostat=iostat) c
!
  if (iostat /= 0) then
    error_number = iostat
    call FORTRAN_ERROR_MESSAGE@ (error_number, MESSAGE)
    write (*,*) 'IOSTAT =',iostat, ' ', TRIM (MESSAGE)
  end if
end subroutine get_next_character
It has had some changes and may need more, but it does show most of the anticipated problems solved.
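One possible way to handle the "repeated CR CR or LF LF" case is to treat CR, LF and a CR LF pair each as exactly one terminator, so a blank line is simply two terminators in a row. The sketch below shows that idea on text already held in memory; it is a different approach from the routine above (no remembered end_char), the names `line_split` and `next_line` are made up for illustration, and reading the file into the string is left out.

```fortran
module line_split
  implicit none
  private
  public :: next_line
contains

  ! Find the next line in text, starting at position pos.
  ! On return text(first:last) is the line (last < first means blank),
  ! pos points just past its terminator, and more is .false. once
  ! the text is exhausted.
  subroutine next_line (text, pos, first, last, more)
    character(len=*), intent(in)    :: text
    integer,          intent(inout) :: pos
    integer,          intent(out)   :: first, last
    logical,          intent(out)   :: more
    integer :: k, code

    first = pos
    last  = pos - 1
    more  = pos <= len (text)
    if (.not. more) return

    ! Offset of the first CR or LF at or after pos; 0 if there is none.
    k = scan (text(pos:), achar (13)//achar (10))
    if (k == 0) then                   ! final line without a terminator
      last = len (text)
      pos  = len (text) + 1
      return
    end if

    last = pos + k - 2                 ! character before the terminator
    code = iachar (text(pos+k-1:pos+k-1))
    pos  = pos + k                     ! step past the CR or LF
    if (code == 13 .and. pos <= len (text)) then
      ! CR directly followed by LF counts as one terminator
      if (iachar (text(pos:pos)) == 10) pos = pos + 1
    end if
  end subroutine next_line

end module line_split
```

A caller would start with pos = 1 and loop until more comes back .false., counting a blank line whenever last < first. TAB and other control characters are left in the line untouched here, so the replacement rules from the routine above would still be needed on top.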
KennyT
Joined: 02 Aug 2005 Posts: 317
Posted: Fri Feb 26, 2010 8:19 am
Thanks guys!
I hadn't realised you'd written so much code yourself. I thought "GET_NEXT_LINE" was a system routine (in much the same way as TRAP_EXCEPTION@ works!)
K