Silverfrost Forums

IOSTAT = 52

19 Oct 2016 4:15 #18174

I was trying to open what I thought was a simple ASCII file and found that the program crashes. The value returned by IOSTAT was 52 instead of 0, which means invalid character in field. Apparently the file I received was actually UTF-8, and once I opened it I discovered that there was indeed a strange character at the very first position of the first line of data. The question is: is FTN95 unable to read UTF-8 files, only ANSI? And if that is the case, what can I do to work with this type of file?

Agustin

19 Oct 2016 5:02 #18177

What do your open and read statements look like?

19 Oct 2016 5:26 #18178

    open(37,file=filename,status='old',action='read')
    ndata=0
    do
      read(37,*,iostat=stat) x1,y1
      if(stat/= 0) exit
      ndata=ndata+1
    end do
    close(37)

19 Oct 2016 5:40 #18179

If you can post the file so it can be downloaded (i.e. share on Dropbox or GoogleDrive), I'd be happy to take a look at it in detail and report back, either here or directly to you. Just post the link here.

Bill

19 Oct 2016 5:50 #18180

I have uploaded it to Google Drive:

https://drive.google.com/drive/folders/0B3XXtxz8ICJHNDRYZDJoNzBfSms

Can you access it, or do I need to add your email address to share it?

Agustin

19 Oct 2016 6:17 #18181

You need to set the permissions to 'Anyone with a link'. If you need to restrict access, you can use (wahorger@gmail.com) to authorize. Either way is fine.

19 Oct 2016 6:23 #18182

done!.....I hope.....

https://drive.google.com/file/d/0B3XXtxz8ICJHbXloVU1pbm5mdzQ/view?usp=sharing

19 Oct 2016 6:47 #18183

The first 3 bytes of the file are non-ASCII: EF BB BF is the UTF-8 byte order mark (BOM). The remainder of the file is standard ASCII with CR/LF line endings between the data lines.

You could do this a number of ways, but let's assume you don't want to edit the file in any way.

  1. You could read the first line with a format (A3,A) so that the first three characters get sucked up and ignored. Then read the data portion of that line with list-directed I/O, but as an internal read from the character variable. Then read the remainder of the file using list-directed I/O as before. This seems to be the easiest solution. I tested this code against your file and it correctly reported ndata=1999 and istat=-1 (EOF).

    character*3 ignore_me
    character*32 read_me
    open(37,file='utf-8-file.dat',status='old',action='read')
    ndata=0
    read(37,'(a3,a)')ignore_me,read_me
    read(read_me,*)x1,y1
    print *,'x1,y1=',x1,y1
    ndata=1
    do
      read(37,*,iostat=istat) x1,y1
      if(istat/= 0) exit
      ndata=ndata+1
    end do
    print *,istat,ndata
    close(37)
    end

Bill

19 Oct 2016 8:19 #18184

Thanks Bill! I will try your code, because I do not want to edit each file that comes from my lab instrument before running my program. Until now I was importing the data as an ASCII file into, say, Origin and then exporting it as ASCII again. The funny thing is that I didn't know the ASCII file provided by the instrument was actually UTF-8. Origin had no problem at all, but it seems that Fortran cannot deal with this type of ASCII file. Good to know that there is a way to overcome this issue. Thanks again!

Agustin

19 Oct 2016 8:53 #18185

Glad to be of some assistance.

19 Oct 2016 9:14 #18186

You could open the file and overwrite the first 3 characters. The following program appears to do this. (I first tried access=transparent but it did not work.) It needs more work to include if (iostat/=0) ... checks.

    character file_name*40
    character ignore_me*1
    integer*4 iostat, i
!
    call get_command_argument (1, file_name)
!
    file_name = 'utf-8-file.dat'
    open (unit   = 11,            &
          file   = file_name,     &
          status = 'OLD',         &
          form   = 'UNFORMATTED', &
          access = 'DIRECT',      &
          recl   = 1,             &
          iostat = iostat)
!
    ignore_me = ' '
    do i = 1,3
      write (unit=11, rec=i, iostat=iostat) ignore_me
    end do
    close (unit=11)
    end
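A possible guarded variant of the overwrite idea, as a sketch only: it reads the first three bytes back before touching them and blanks them only when they are EF BB BF, so a plain ANSI file is left alone. The program name, the demo file name and the unit number are illustrative assumptions, and recl=1 is assumed to mean one byte, which depends on the compiler (some count record lengths in 4-byte words unless told otherwise).

```fortran
program strip_bom_in_place
  implicit none
  character(len=*), parameter :: fname = 'bom_inplace_demo.dat'  ! hypothetical demo file
  character(len=1) :: b(3), blank
  integer :: ios, i
  logical :: has_bom

  ! Create a small demo file whose first line starts with the UTF-8 BOM
  open (11, file=fname, status='replace', action='write')
  write (11,'(a)') char(239)//char(187)//char(191)//'1.5,2.5'
  close (11)

  ! Reopen byte by byte (assumes recl is measured in bytes)
  open (11, file=fname, status='old', form='unformatted', &
        access='direct', recl=1, iostat=ios)
  if (ios /= 0) stop 'cannot open file'

  ! Read the first three bytes; a failed read means the file is too short
  has_bom = .true.
  do i = 1, 3
    read (11, rec=i, iostat=ios) b(i)
    if (ios /= 0) has_bom = .false.
  end do
  if (has_bom) then
    has_bom = ichar(b(1)) == 239 .and. ichar(b(2)) == 187 &
              .and. ichar(b(3)) == 191
  end if

  ! Overwrite only when the BOM is really there
  if (has_bom) then
    blank = ' '
    do i = 1, 3
      write (11, rec=i, iostat=ios) blank
      if (ios /= 0) stop 'cannot overwrite BOM'
    end do
    print *, 'BOM found and blanked'
  else
    print *, 'no BOM, file left unchanged'
  end if
  close (11)
end program strip_bom_in_place
```

Running it a second time on the same data (without the demo-file step) would take the "no BOM" branch, since the three bytes are then blanks.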

19 Oct 2016 10:53 #18187

Well, it seems that I have not been quite clear, because I did not include what happens next. The complete sequence is this: I open the file and count the data points, then I allocate the array with that count and read the data once again. Bill's solution fails, and John's also. The code I have for these actions is:

type coordenate
  real*4 x
  real*4 y
end type coordenate
type(coordenate),dimension(:),allocatable :: data_point
real*4 x1,y1
character*129 filename
integer stat,i

open(37,file=filename,status='old',action='read')
ndata=0
do
  read(37,*,iostat=stat) x1,y1
  if(stat/= 0) exit
  ndata=ndata+1
end do
close(37)
if(allocated(data_point)) deallocate(data_point)
allocate(data_point(ndata))

open(37,file=filename,status='old',action='read')
do i=1,ndata
  read(37,*) data_point(i)%x,data_point(i)%y
end do
close(37)

19 Oct 2016 11:29 #18188

IT WORKS!! SORRY, IT WAS MY FAULT!!! I made a mistake when adding John's code, and that was the reason the subroutine failed (I did not notice that John changed filename to file_name). Now it works fine!!! THANKS!!!

Time to go to bed......seems that my eyes are not seeing well at night...

But how can I detect, in Fortran, whether an ASCII file is UTF-8 or not? The program now works fine for UTF-8 files, but if I get an ANSI file, I will be erasing the first three characters of the data!

Agustin

19 Oct 2016 11:52 #18189

The following works without changing the file:

 type coordenate
   real*4 x
   real*4 y
 end type coordenate
 type(coordenate),dimension(:),allocatable :: data_point
!
 real*8 x1,y1
 character*129 filename
 integer i
 integer*4 count_lines, ndata
 external count_lines
!
 filename='\tmp\utf-8-file.dat'
!
 ndata = count_lines (filename)
 write (*,*) ndata,' lines identified'
!
 if (allocated(data_point)) deallocate(data_point)
 allocate(data_point(ndata))
!
 do i=1,ndata
   call get_xy (x1,y1,i)
   data_point(i)%x = x1
   data_point(i)%y = y1
 end do
 write (*,*) ndata,' lines recovered'
!
 close(37)
 end

 integer*4 function count_lines (filename)
  character filename*129 
  character line*80
  integer*4 i, iostat

     open (37,file=filename,status='old',action='read', iostat=iostat) 
     write (*,*) 'Opening file :',trim(filename),' iostat=',iostat
!
     do i = 1,1000000
       read (37,fmt='(a)', iostat=iostat) line
       if ( iostat /= 0 ) then
         write (*,*) 'iostat =',iostat,' at line',i
         if ( iostat < 0 ) exit
       end if
     end do
     rewind (37)

     count_lines = i-1

 end function count_lines

 subroutine get_xy (x1,y1,i) 
  real*8    x1,y1 
  integer*4 i
  integer*4 iostat
  character line*80
!
       x1 = -1
       y1 = -1
       read (37,fmt='(a)', iostat=iostat) line
       if ( iostat /= 0 ) then
         write (*,*) 'error reading file : iostat =',iostat,' at line',i
         if ( iostat < 0 ) return
       end if
!
       call clean_line (line, i)
!       
       read (line,fmt='(2f30.0)',iostat=iostat) x1,y1
       if ( iostat /= 0 ) then
         write (*,*) 'error reading from line : iostat =',iostat,' at line',i
         if ( iostat < 0 ) return
       end if
!
 end subroutine get_xy      

 subroutine clean_line (line, i)
!
!  check for numeric line, removing parity characters   
!
  character line*(*),  c
  integer*4 i,         j,k
!
    do j = 1,len_trim(line)
      c = line(j:j)
      k = ichar (c)
      if ( k > 127 ) then
        write (*,*) 'parity set in line',i,j,' ',c,k-128
        line(j:j) = ' '
        cycle
      end if
      if ( index ('0123456789.+-, ',c) > 0 ) cycle
      write (*,*) 'unrecognised character :',c,k
    end do
!
 end subroutine clean_line

I would use real*8 for the x values you are reading. You could improve on clean_line to do more cleaning.

John

20 Oct 2016 1:38 #18190

I appreciate the work that John has done. Seems like a lot of work to do something simple.

BTW, I ran the program I posted on the data set you provided, and it worked, so I'd appreciate knowing what is different in what you ran and what error(s) you got. If you run this on a non-UTF-8 file, yes, it will fail. That wasn't the question posed.

To see what kind of file you have, execute the read of the first line as I have outlined. Then look at the data retrieved: if it is equal to the UTF-8 header, read the file as UTF-8. Otherwise it is ASCII; perform a REWIND on the file and begin again.
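The check described above could look roughly like this. It is a sketch that has not been tested with FTN95: the program name and the demo file name are made up for illustration, and lines are assumed to fit in 80 characters. Instead of a REWIND it reads every line as text, blanks the three BOM bytes if the first line starts with them, and then parses the line with an internal list-directed read, so the same loop serves both kinds of file.

```fortran
program detect_bom
  implicit none
  character(len=*), parameter :: fname = 'bom_demo.dat'  ! hypothetical demo file
  character(len=3)  :: bom
  character(len=80) :: line        ! assumes data lines of at most 80 characters
  real    :: x1, y1
  integer :: ios, ndata

  ! UTF-8 byte order mark: bytes EF BB BF
  bom = char(239)//char(187)//char(191)

  ! Write a two-line demo file whose first line starts with the BOM
  open (37, file=fname, status='replace', action='write')
  write (37,'(a)') bom//'1.5,2.5'
  write (37,'(a)') '3.5,4.5'
  close (37)

  open (37, file=fname, status='old', action='read')
  ndata = 0
  do
    read (37,'(a)', iostat=ios) line
    if (ios /= 0) exit
    ! Only the first line can carry a BOM; blank it out if present
    if (ndata == 0 .and. line(1:3) == bom) line(1:3) = ' '
    ! Parse the cleaned text with an internal list-directed read
    read (line, *, iostat=ios) x1, y1
    if (ios /= 0) exit
    ndata = ndata + 1
    print *, 'x1,y1=', x1, y1
  end do
  close (37)
  print *, 'ndata =', ndata
end program detect_bom
```

Because the file itself is never modified, the same loop can be run twice, once to count the points and once to fill the allocated array.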

One way to do this is to read every character, bypassing the UTF-* header (if any) to get to the data, then reconstruct every line of data regardless of the header contents.

It can be done, but why? If the data are naturally constrained (UTF-8 or not), then use the constraints and go forward. Easily done, easily documented (for the next poor soul dealing with the data), and you get the job done more quickly.

Bill

20 Oct 2016 1:52 #18191

Bill,

"Seems like a lot of work to do something simple."

It may be simple, but determining the size of an array to allocate and then filling the array, is a problem often discussed. So I tried to identify the key tasks and provide simple routines to do that:

count_lines : determines the number of lines in the file
get_xy : retrieves the 2 real numbers from each line
read_line : reads a line of text from a file (I should have provided this)
clean_line : determines if there are unexpected characters in the line and responds

What I was trying to highlight is that each of these functions could be isolated and modified if further required. Additional changes can be easily applied as the data files present more problems.

The other main approach is reallocating a larger array, but if the file can fit in the system buffer, rereading is a fast approach, as the second read is much faster.
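For comparison, the reallocation approach mentioned above might be sketched as follows. This assumes a compiler that supports the Fortran 2003 move_alloc intrinsic, which this thread does not establish for FTN95; the loop over i is only a stand-in for "read until end of file", and the starting capacity of 4 is arbitrary.

```fortran
program grow_array
  implicit none
  real, allocatable :: x(:), tmp(:)
  integer :: n, cap, i

  cap = 4                      ! arbitrary initial capacity
  allocate (x(cap))
  n = 0

  do i = 1, 10                 ! stand-in for reading values until EOF
    if (n == cap) then
      ! Array is full: allocate twice the space, copy, and swap
      allocate (tmp(2*cap))
      tmp(1:n) = x(1:n)
      call move_alloc (tmp, x) ! x takes over the larger storage, tmp is deallocated
      cap = 2*cap
    end if
    n = n + 1
    x(n) = real(i)
  end do

  ! Here n is 10 and the capacity has grown from 4 to 8 to 16
  print *, 'values stored =', n, ' capacity =', size(x)
end program grow_array
```

Doubling the capacity keeps the number of copies small, at the cost of some unused space at the end; only x(1:n) holds valid data.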

The file is actually an ASCII DOS file, with a 3-character header.

I have had similar problems in the past with marine survey files, where there can be 10^8 points in a file. They often present new difficulties, so reporting anything unexpected is useful.

John

20 Oct 2016 2:05 #18192

Thanks for the explanation, John. A lot of good code, and thanks for sharing it.

20 Oct 2016 1:54 #18193

Dear Bill,

The 'problem' with your code was that I forgot to mention that after reading the file, I had to re-open it to read the real data once again. Your code worked fine for reading the file the first time, but since the file wasn't modified (as in John's approach), I couldn't use it in the second stage where, once the size of the array was known, the data was read for the subsequent calculations.

Agustin

20 Oct 2016 2:56 #18194

OK.

It wasn't clear that you were looking for an end-to-end solution. I'm sure the one given to you will satisfy your needs.

21 Oct 2016 6:29 #18198

I found that if the file contains data separated by tabs or spaces instead of commas, the subroutine get_xy fails. Is there any way to overcome this issue? In general I export data separated with commas, but sometimes my data goes through other stages that eventually translate the commas into tabs or spaces.

Agustin
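
One way this might be handled, offered as a sketch rather than a tested fix for get_xy: replace any tab characters with blanks and then use a list-directed internal read instead of the fixed format (2f30.0), since list-directed input accepts both commas and blanks as separators. The routine name parse_xy and the sample lines are hypothetical, and decimal commas are not handled.

```fortran
program parse_any_separator
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  character(len=80) :: samples(3)
  real(dp) :: x1, y1
  integer  :: ios, k

  samples(1) = '1.25,3.5'                 ! comma separated
  samples(2) = '1.25'//char(9)//'3.5'     ! tab separated
  samples(3) = '1.25   3.5'               ! blank separated

  do k = 1, 3
    call parse_xy (samples(k), x1, y1, ios)
    print *, 'iostat =', ios, ' x1,y1 =', x1, y1
  end do

contains

  subroutine parse_xy (line, x1, y1, ios)
    character(len=*), intent(in)  :: line
    real(dp),         intent(out) :: x1, y1
    integer,          intent(out) :: ios
    character(len=len(line)) :: work
    integer :: j

    ! Work on a copy so the caller's line is left untouched
    work = line
    do j = 1, len_trim(work)
      if (work(j:j) == char(9)) work(j:j) = ' '   ! tab -> blank
    end do

    x1 = -1
    y1 = -1
    ! List-directed read: commas and blanks both act as separators
    read (work, *, iostat=ios) x1, y1
  end subroutine parse_xy

end program parse_any_separator
```

A non-zero ios signals a line that could not be parsed as two numbers, which fits the existing habit of reporting anything unexpected.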
