Silverfrost Forums


Reading and writing Tabs

29 May 2010 8:56 (Edited: 29 May 2010 9:03) #6453

When I read a line of text that contains tabs and then write it back out, the tabs are converted into spaces. How can I preserve tabs in the output? Here is an example: if the file a.ini contains tabs, they disappear in b.out.

   CHARACTER*128 text

   open(111,file='a.ini')
   open(112,file='b.out')

   do i=1,1000
     read (111,'(a)',end=1000) TEXT
     write(112,'(a)') TRIM(TEXT)
   enddo
1000 close(111)
     close(112)
   end    	
30 May 2010 12:06 #6455

Dan,

The problem is with reading tabs. You can control how tabs are read so that they are not converted to spaces. See the documentation for OPEN:

Reading tab characters: In a Fortran READ statement, by default, tab characters read from a file are converted to spaces. To avoid this conversion you should make a call to the subroutine READ_TABS@(unitno) immediately after the OPEN statement (unitno is the unit number of the stream).

I think no conversion is applied when writing tabs.

John

30 May 2010 5:13 #6456

Holy @#$... I did not know that. This compiler library can do literally anything.

Thanks, John. I had lost a whole day stuck on this problem with no good solution in mind, and was already thinking of asking on comp.lang.fortran (and losing more time), because it looks like there is no general portable solution in Fortran if a compiler-specific routine with @ was invented for this purpose.

31 May 2010 11:49 #6462

Well... I have a new problem with the tabs.

When I try to find the position of a tab in the text string using INDEX, I get the wrong result.

To reproduce this, use the same demo code as above, modified slightly to call READ_TABS@. The code just reads a line of text from one file and writes it to another, and with READ_TABS@ the files a.ini and b.out come out identical.

What goes wrong is text manipulation of strings that contain tabs. If a.ini consists of arbitrary text with one or more tabs, INDEX, which should return the position of the first tab (iPosOfTab), returns something else that depends on the rest of the line. Is this a 'gray area' of the Fortran standard, or just a bug?

On the other hand, if we remove READ_TABS@ (and lose the ability to write exactly the same line back to b.out), INDEX works fine and finds the position of the tab correctly. The strange thing is that the tab is supposedly destroyed (replaced with spaces), yet apparently it is not until the text is written to the file. Very hidden mechanics...

   CHARACTER*128 text 

   open(111,file='a.ini') 
   call READ_TABS@(111) 
   open(112,file='b.out') 

     read (111,'(a)',end=1000) TEXT 
     write(112,'(a)') TRIM(TEXT) 

1000 close(111) 
     close(112) 

     iPosOfTab = index(text,'	')
     print*, iPosOfTab

   end  
1 Jun 2010 5:02 #6464

Dan,

I would question your 'wrong' assumption.

Try adding the following code to check what is in the line:

character c, tab
integer i, ic
do i = 1,len_trim(text)
  c = text(i:i)
  ic = ichar (c)
  write (*,*) i, ic   ! position and character code; a tab shows as 9
end do

Make sure the parity bit is not set, as this can happen with some files (a tab would then be char(9+128)). Then check the value of the substring you are searching for. You should be able to find a tab in the text by using:

     tab = char (9) ! note tab is a character variable
     ipos = index (text,tab)

This should work!
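To act on the parity-bit warning, a search can compare only the low seven bits of each character, so that both char(9) and char(9+128) count as a tab. This is a minimal sketch in standard Fortran; the module and function names (tab_parity, find_tab_any_parity) are hypothetical and not part of any Silverfrost library.

```fortran
module tab_parity
  implicit none
contains
  ! Position of the first character whose low 7 bits equal 9 (tab), so a
  ! plain tab char(9) and a tab with the parity bit set, char(9+128), both
  ! match. Returns 0 if no such character is found.
  integer function find_tab_any_parity(text)
    character(len=*), intent(in) :: text
    integer :: i
    find_tab_any_parity = 0
    do i = 1, len(text)
      ! iand(..., 127) masks off bit 7 (value 128), the old parity bit
      if (iand(ichar(text(i:i)), 127) == 9) then
        find_tab_any_parity = i
        return
      end if
    end do
  end function find_tab_any_parity
end module tab_parity
```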

John

1 Jun 2010 7:02 #6473

OK, in summary, it looks like this damn problem was in the way the tab was defined. It seems we cannot use a tab typed from the keyboard, as in my example above:

     iPosOfTab = index(text,'	') ! here is Tab from keyboard

This does not work, for some reason that is not yet clear to me. What works is defining the tab via CHAR, as you suggested:

     iPosOfTab = index(text,char(9))
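The CHAR(9) fix can be wrapped in a named constant so the tab is never typed into a string literal, where the editor may turn it into spaces. This is a minimal sketch assuming a Fortran 2003 compiler (for the deferred-length character result); the names tab_search, TAB, first_tab_pos and first_field are hypothetical. It does not replace READ_TABS@, which is still needed so that FTN95 keeps the tabs when reading.

```fortran
module tab_search
  implicit none
  ! Horizontal tab (ASCII 9) as a named constant, independent of how an
  ! editor stores a tab typed between quotes
  character(len=1), parameter :: TAB = char(9)
contains
  ! Position of the first tab in TEXT, or 0 if there is none
  integer function first_tab_pos(text)
    character(len=*), intent(in) :: text
    first_tab_pos = index(text, TAB)
  end function first_tab_pos

  ! Text before the first tab; the whole (trimmed) string if there is no tab
  function first_field(text) result(field)
    character(len=*), intent(in) :: text
    character(len=:), allocatable :: field
    integer :: ipos
    ipos = index(text, TAB)
    if (ipos == 0) then
      field = trim(text)
    else
      field = text(1:ipos-1)
    end if
  end function first_field
end module tab_search
```

In Dan's program, iPosOfTab = first_tab_pos(text) would then replace the INDEX call with the keyboard tab.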

THANKS John

2 Jun 2010 12:09 #6476

I think your code

     iPosOfTab = index(text,' ') ! here is Tab from keyboard

would suffer from whatever interpretation your IDE/editor placed on the tab key when it was pressed. You could assign this text string to a variable, say character*10, and print out the character values.

I sometimes write out numerical results in tab-delimited format for Excel, again using char(9). It is unfortunate that tabs give such unpredictable results (which is probably why .csv files are more common and usable than .tsv files. How do Europeans cope with writing 123,2 instead of 123.2?)

John
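Tab-delimited output for Excel can be built the same way, joining values with char(9). This is a minimal sketch assuming Fortran 2003 (allocatable deferred-length result); the names tsv_line and tab_join and the F12.3 format are hypothetical choices, not taken from the thread.

```fortran
module tsv_line
  implicit none
contains
  ! Join real values into one tab-delimited line (e.g. for pasting into Excel).
  ! Each value is written with F12.3, then left-adjusted and trimmed.
  function tab_join(vals) result(line)
    real, intent(in) :: vals(:)
    character(len=:), allocatable :: line
    character(len=12) :: buf
    integer :: i
    line = ''
    do i = 1, size(vals)
      write (buf, '(f12.3)') vals(i)
      if (i > 1) line = line // char(9)   ! tab between fields, none at the end
      line = line // trim(adjustl(buf))
    end do
  end function tab_join
end module tsv_line
```

With results a real array, write (112,'(a)') tab_join(results) would write one tab-separated row.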

2 Jun 2010 3:01 #6477

The tab in my example should not depend on its editor representation. A tab is a tab: CHAR(9), one character. Period.

It should be, but I am not sure it always is. What did you mean about the parity bit, and how can it be set differently?

2 Jun 2010 11:24 #6482

John, when I worked in Norway, .csv files used the comma as the decimal point and the semicolon as the separator. For thousands separators they used the full stop (.). You would need to write a routine to translate these characters appropriately, taking quoted strings into account.

For example:

Norway/Europe: 3.005,32;'hello, Goodbye';250,33

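This needs to be translated to 3,005.32,'hello, Goodbye',250.33 and then handled like a British/American csv file. In practice the thousands separator is best dropped altogether (3005.32,'hello, Goodbye',250.33), since an unquoted comma inside a number would be read as a field delimiter; the routine below does exactly that.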

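program euro_demo
character*100 line_in
line_in = "3.005,32;'hello, Goodbye';250,33"
call swap_euro_csv(line_in)
print *, trim(line_in)   ! gives 3005.32,'hello, Goodbye',250.33
end

subroutine swap_euro_csv(line_in)
! Translate a European CSV line (',' decimal, ';' delimiter, '.' thousands)
! into British/American form; text inside single quotes is left untouched.
character*(*) line_in
character*200 line_out
character*3 swaps(2)
logical in_quote
integer i, iswap, iout
! swaps(1) holds the Euro characters, swaps(2) their replacements;
! the 'z' is only a placeholder, because the Euro dot is dropped below
data swaps/',;.' , '.,z'/
in_quote = .false.
iout = 0
do i = 1, len_trim(line_in)
  if (line_in(i:i) .eq. "'") then
    in_quote = .not. in_quote
  endif
  if (.not. in_quote) then
    iswap = index(swaps(1), line_in(i:i))
  else
    iswap = 0
  endif
! we don't want to swap a Euro dot for a comma as this will confuse the
! delimiter in Brit mode, so the thousands dot (iswap = 3) is simply dropped
  if (iswap .le. 2) then
    iout = iout + 1
    if (iswap .eq. 0) then
      line_out(iout:iout) = line_in(i:i)
    else
      line_out(iout:iout) = swaps(2)(iswap:iswap)
    endif
  endif
enddo
line_in = line_out(:iout)
end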

Greetings and farewell, Ian

2 Jun 2010 1:36 #6484

Dan,

What did you mention about parity bit and how to set it differently?

Some other operating systems set the 8th bit (what used to be called the parity bit) on characters, so their numeric values were in the range 128-255. I think you can still get this with files from some other operating systems, but it is probably not the problem here. In Windows and DOS these codes are special characters.

John
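If a file really does arrive with the 8th bit set, that bit can be cleared character by character. This is a minimal sketch using the standard IBCLR intrinsic; the names parity_strip and clear_bit7 are hypothetical. Apply it only to files known to come from such a system: on Windows, codes 128-255 are legitimate characters (accented letters, for example), and this would destroy them.

```fortran
module parity_strip
  implicit none
contains
  ! Return TEXT with bit 7 (value 128, the old parity bit) cleared in every
  ! character, mapping codes 128-255 back to 0-127; char(9+128) becomes a tab.
  function clear_bit7(text) result(out)
    character(len=*), intent(in) :: text
    character(len=len(text)) :: out
    integer :: i
    do i = 1, len(text)
      out(i:i) = char(ibclr(ichar(text(i:i)), 7))
    end do
  end function clear_bit7
end module parity_strip
```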

2 Jun 2010 4:26 #6485

Parity bits were really for transmission of ASCII data in the olden days, when I was a lad. Terminals in those days only used characters 0 to 127, the latter being the delete character (a back arrow when printed). Since the advent of the PC, all 256 codes (0 to 255) are defined as characters, which leaves no room for using the 8th bit as a parity bit for detecting transmission errors. Modern systems use packet switching with cyclic redundancy checks (by modern I mean for the last 25+ years) and no per-character parity checking. It should not be a problem.

You need to use the char(n) method to define any character in the ASCII set below 32 (20 hex = space); these are termed 'non-printing' or control characters. A few useful ones, from memory, shown as the decimal/hex code, the keyboard press originally used, and the name:

3/03h = ctrl+C = cancel (interrupts processing in DOS & Digital Equipment operating systems)
7/07h = ctrl+G = bell
8/08h = ctrl+H = backspace
9/09h = ctrl+I = tab
10/0Ah = ctrl+J = line feed
12/0Ch = ctrl+L = form feed - new page on printer.
13/0Dh = ctrl+M = Carriage Return or just Return or even Enter
17/11h = ctrl+Q = XON (transmission on, restarts computer sending to terminal/printer, used for flow control)
19/13h = ctrl+S = XOFF (transmission off, stops computer sending to terminal/printer, used for flow control)
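The list follows a pattern: ctrl+letter gives the letter's code with only its low five bits kept (ctrl+I: 73 becomes 9). This is a minimal sketch collecting the listed characters as named constants with the standard ACHAR/IACHAR intrinsics; the module name, constant names and the helpers ctrl_char and is_control are hypothetical.

```fortran
module control_chars
  implicit none
  ! Control characters from the list above, as named constants
  character(len=1), parameter :: CH_CTRL_C = achar(3)   ! ctrl+C, cancel
  character(len=1), parameter :: CH_BEL    = achar(7)   ! ctrl+G, bell
  character(len=1), parameter :: CH_BS     = achar(8)   ! ctrl+H, backspace
  character(len=1), parameter :: CH_TAB    = achar(9)   ! ctrl+I, tab
  character(len=1), parameter :: CH_LF     = achar(10)  ! ctrl+J, line feed
  character(len=1), parameter :: CH_FF     = achar(12)  ! ctrl+L, form feed
  character(len=1), parameter :: CH_CR     = achar(13)  ! ctrl+M, carriage return
  character(len=1), parameter :: CH_XON    = achar(17)  ! ctrl+Q, XON
  character(len=1), parameter :: CH_XOFF   = achar(19)  ! ctrl+S, XOFF
contains
  ! ctrl+letter: keep the low 5 bits of the letter's code,
  ! so 'I' (73) and 'i' (105) both give 9
  character(len=1) function ctrl_char(letter)
    character(len=1), intent(in) :: letter
    ctrl_char = achar(iand(iachar(letter), 31))
  end function ctrl_char

  ! True for ASCII control characters: codes below 32, plus 127 (delete)
  logical function is_control(c)
    character(len=1), intent(in) :: c
    is_control = iachar(c) < 32 .or. iachar(c) == 127
  end function is_control
end module control_chars
```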

'Even parity' meant that the parity bit was set to one when needed to make the total number of one bits in the character even. Similarly there is the less usual 'odd parity', which was specially designed to drive people nuts when logging on to that type of system: with the settings mismatched, the computer thought every character was faulty, and the terminal likewise treated every reply from the computer as faulty. I hate odd parity.
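Even and odd parity can be computed from the seven data bits with the Fortran 2008 POPCNT intrinsic. This is a minimal sketch; the names parity_demo, even_parity_code and odd_parity_code are hypothetical. It also shows that a tab sent with odd parity becomes char(9+128), the case John warned about.

```fortran
module parity_demo
  implicit none
contains
  ! 8-bit code for C with even parity: bit 7 is set only when the 7 data
  ! bits contain an odd number of ones, so the total number of ones is even
  integer function even_parity_code(c)
    character(len=1), intent(in) :: c
    integer :: code
    code = iand(iachar(c), 127)            ! 7-bit ASCII data
    if (mod(popcnt(code), 2) == 1) then
      even_parity_code = ibset(code, 7)
    else
      even_parity_code = code
    end if
  end function even_parity_code

  ! 8-bit code for C with odd parity: flipping bit 7 of the even-parity
  ! code makes the total number of ones odd
  integer function odd_parity_code(c)
    character(len=1), intent(in) :: c
    odd_parity_code = ieor(even_parity_code(c), 128)
  end function odd_parity_code
end module parity_demo
```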

I hope this helps.

Ian
