Silverfrost Forums

Welcome to our forums

Maximum length of a character variable?

29 May 2012 8:02 #10225

Hi all, here an odd situation. There are files containing lines of very large length, up to 1 to 3 million characters until EOL. The format dates back to times with punched tapes, but did survive. When you edit such a file with notepad++, the length of the line is displayed - and it is a large number. I had no luck in reading the long line using this piece of code. It works for much smaller numbers but not for 2 millions. [color=green:cdd322eaa4] INTEGER :: nlen=2000000,iline=0 CHARACTER(LEN=nlen) :: cline DO iline=iline+1 READ (logunit,'(A)',iostat=istatus) cline print *,'iline,istatus=',iline,istatus,TRIM(cline) if (istatus/=0) EXIT ENDDO[/color:cdd322eaa4]

Question: is there a limit on the length of character variables? Who has any idea to split a long record into pieces? A test file could be provided.

29 May 2012 11:39 #10227

I would recommend using 'character1 contents(big_number)' rather than 'characterbig_number contents'. I do not expect that there is a problem with a maximum line length. I changed a program that reads any file to also store the contents. Hopefully this will work. Let me know how it goes.

!
      CHARACTER file_name*128, get_file_name*128, c, c_typ*4
      INTEGER*4 ic, iostat, jc
      integer*8 count_ic(0:256), n, nc, na, nz, nl
      integer*8 :: one = 1
      EXTERNAL get_file_name
!
      integer*4, parameter :: million       = 1000000
      integer*4, parameter :: max_file_size = 50*million
      integer*4 num_line, next_char, line_len, max_len 
      integer*4 line_index(million)
      character line_content(max_file_size)
!
      character ch_name(0:32)*34
      data  ch_name(  0) / ' 00 NUL  Null character ' /
      data  ch_name(  1) / ' 01 SOH  Start of Header ' /
      data  ch_name(  2) / ' 02 STX  Start of Text ' /
      data  ch_name(  3) / ' 03 ETX  End of Text ' /
      data  ch_name(  4) / ' 04 EOT  End of Transmission ' /
      data  ch_name(  5) / ' 05 ENQ  Enquiry ' /
      data  ch_name(  6) / ' 06 ACK  Acknowledgment ' /
      data  ch_name(  7) / ' 07 BEL  Bell ' /
      data  ch_name(  8) / ' 08 BS   Backspace ' /
      data  ch_name(  9) / ' 09 HT   Horizontal Tab ' /
      data  ch_name( 10) / ' 0A LF   Line feed ' /
      data  ch_name( 11) / ' 0B VT   Vertical Tab ' / 
      data  ch_name( 12) / ' 0C FF   Form feed ' /
      data  ch_name( 13) / ' 0D CR   Carriage return ' /
      data  ch_name( 14) / ' 0E SO   Shift Out ' /
      data  ch_name( 15) / ' 0F SI   Shift In ' /
      data  ch_name( 16) / ' 10 DLE  Data Link Escape ' /
      data  ch_name( 17) / ' 11 DC1  Device Control 1 (XON) ' /
      data  ch_name( 18) / ' 12 DC2  Device Control 2 ' /
      data  ch_name( 19) / ' 13 DC3  Device Control 3 (XOFF) ' /
      data  ch_name( 20) / ' 14 DC4  Device Control 4 ' /
      data  ch_name( 21) / ' 15 NAK  Negative Acknowledgement ' /
      data  ch_name( 22) / ' 16 SYN  Synchronous idle ' /
      data  ch_name( 23) / ' 17 ETB  End of Transmission Block' /
      data  ch_name( 24) / ' 18 CAN  Cancel ' /
      data  ch_name( 25) / ' 19 EM   End of Medium ' /
      data  ch_name( 26) / ' 1A SUB  Substitute ' /
      data  ch_name( 27) / ' 1B ESC  Escape ' /
      data  ch_name( 28) / ' 1C FS   File Separator ' / 
      data  ch_name( 29) / ' 1D GS   Group Separator ' /
      data  ch_name( 30) / ' 1E RS   Record Separator ' /
      data  ch_name( 31) / ' 1F US   Unit Separator ' /
      data  ch_name( 32) / ' 7F DEL  Delete ' /
!
!  Open the text file as transparent
!
      file_name = get_file_name ()
!
      open (unit   = 11,               &
            file   = file_name,        &
            status = 'OLD',            &
            form   = 'UNFORMATTED',    &
            access = 'TRANSPARENT',    &
            iostat = iostat)
      write (*,2000) 'Opening file ',trim(file_name),' iostat = ',iostat
2000  format (a,a,a,i0)
!
!  Read all characters and lines
!
      count_ic = 0
      n        = 0
      nl       = 0
      nc       = 0
      na       = 0
      nz       = 0
!
      next_char = 1
      num_line  = 1
      line_len  = 0
      max_len   = 0
      line_index(num_line) = next_char
!
      do
         read (unit=11, iostat=iostat) c
         if (iostat /= 0) exit
!
         n  = n + one
         ic = ichar(c)
         count_ic(ic) = count_ic(ic)+one
!
29 May 2012 11:41 (Edited: 29 May 2012 11:44) #10228
!
         select case (ic)
            case (10)
               nl = nl + one   ! line control characters ; 10:LF indicates end of line
               num_line = num_line + 1
               if (num_line < million) line_index(num_line) = next_char
               max_len  = max (max_len, line_len)
               line_len = 0
            case (13)
               nl = nl + one   ! line control characters ; 10:LF indicates end of line
            case (0:9,11,12,14:31)
               nc = nc + one   ! control characters
            case (32:126)
               na = na + one   ! normal character
               line_content(next_char) = c
               if (next_char < max_file_size) next_char = next_char + 1
               line_len  = line_len  + 1
            case (127:256)
               nz = nz + one   ! other character
            case default
               write (*,*) 'Unrecognised character : ichar =',ic
         end select
      end do
!
      write (*,2001) trim(file_name)
2001  format (/'Count of active characters in file ',a// 'Ichar    C    Char_Count')
2002  format (i5,a5,b'zz,zzz,zzz,zz#',2x,a)       
      do ic = 256,0,-1
         if (count_ic(ic) < one) cycle
         if (ic >= 32 .AND. ic < 128) then
            write (*,2002) ic, char(ic), count_ic(ic)
         else if (ic < 32 .or. ic == 128) then
            jc = min (ic,32)
            c_typ = ch_name(jc)(4:7)
            write (*,2002) ic, c_typ, count_ic(ic), trim (ch_name(jc)(10:))
         else
            c_typ = '???'
            if (ic==10) c_typ = '<LF>'
            if (ic==13) c_typ = '<CR>'
            write (*,2002) ic, c_typ, count_ic(ic)
         end if
      end do
!
      write (*,*) ' '
      if (iostat == -1) then
         write (*,2003) n,' characters read from file ',trim(file_name)
      else
         write (*,2004) 'end of file after',n,' characters : IOSTAT = ',iostat
      end if
2003  format (b'zz,zzz,zzz,zz#',a,a)       
2004  format (a,b'zz,zzz,zzz,zz#',a,i0)       
      write (*,2003) na, ' normal text characters 32-126'
      write (*,2003) nl, ' line control characters 10,13'
      write (*,2003) nc, ' other control characters 0-31'
      write (*,2003) nz, ' non-printable characters 127-255'
      nl = max (count_ic(10), count_ic(13))
      write (*,2003) nl, ' lines identified if text file'
!
      if (count_ic(10) == nl .and. count_ic(13) == 0) then
         write (*,2000) ' NOTE : No <CR> in file : not typical DOS text file'
      end if
!
      if ( count_ic(10) /= count_ic(13)) then
         if ( count_ic(10) == 0) then
            write (*,2000) '        Only <CR> used for end of line'
         else if ( count_ic(13) == 0) then
            write (*,2000) '        Only <LF> used for end of line'
         else
            write (*,2000) '        Inconsistent <LF> and <CR> count found'
         end if
      end if
!
      write (*,*) ' '
      write (*,2003) num_line-1,  ' lines stored'
      write (*,2003) max_len,     ' maximum line length identified'
      write (*,2003) next_char-1, ' text characters identified in file'
!
      end
29 May 2012 11:43 #10229

We realy need a better way of attaching code examples !

      character*128 function get_file_name ()
      character*132 fname
      integer*4 i,n
!
      call command_line (fname)
!
      n = 0
      do i = 1,len(fname)
         if (fname(i:i) /= ''') then
            n = n+1
            if (n==i) cycle
            fname(n:n) = fname(i:i)
            fname(i:i) = ' '
         else
            fname(i:i) = ' '
         end if
      end do
!
      get_file_name = fname
      end function get_file_name

You might want to change the code to include control characters and ic>=127 in the line if that suits. I think the important issues are use a character*1 array and open the file for access = 'TRANSPARENT'

John

29 May 2012 2:35 #10230

Hello John, thank you for the brilliant idea. Unformatted reading is the key, Transparent seems not to be necessary. This is my final piece of code in a subroutine. Regards, Johannes

SUBROUTINE READ_longline(lugerb,cline,n,istatus)
! read very long lines character by character  unformatted
INTEGER, INTENT(IN) :: lugerb
CHARACTER(LEN=*), INTENT(OUT) :: cline		! line in a long character variable
INTEGER, INTENT(OUT) :: n					! number of good characters in cline

INTEGER :: iline=0,istatus,ic,ncmax
CHARACTER :: c								! aux character
SAVE iline

! Reading gerbfile quasi line by line as Unformatted character by character
iline=iline+1
ncmax=LEN(cline)
cline=' '
n=0 								! number of characters in line
do
  read (unit=lugerb, iostat=istatus) c			! does not need a counter
  if (istatus /= 0) THEN			! last line does not end with LF
    !print *,cline(1:n)
    EXIT
  endif
  n=n+1			       				! n'th character read successfully
  ic = ichar(c)
  if (ic>=32.and.ic<=126) THEN		! normal character
    IF (n<ncmax) THEN
      cline(n:n)=c
    ELSE
      print *,'READ_longline: number of characters in cline exceeded'
      istatus=-333
      RETURN
    ENDIF
  else if (ic==10) THEN    			! line control characters ; 10:LF indicates end of line
    n=n-1
    !print *,cline(1:n)  			! don't use the LF Character itself
    EXIT 
  else
    print *,'READ_longline: unrecognized character:',c,' in line',iline		! should not happen
    EXIT    
  endif
enddo
            
RETURN
END
29 May 2012 10:33 #10233

I'm surprised that you do not need TRANSPARENT. I've always assumed you would. To check, report the number of characters read from the file and compare this to the file size. John

30 May 2012 7:56 #10236

Hi John, you are right, access=transparent is the only correct way:

OPEN (unit=lugerb,file=TRIM(file),form='unformatted',access='transparent',iostat=ierror) .

BTW for later readers: The Subject 'maximum length of character variables?' was not correct. Better: 'maximum length of formatted records?' This seems to be around some 10.000

Regards johannes

Please login to reply.