Silverfrost Forums

Cyrillic characters

22 Jan 2019 8:15 #23158

Does anyone have a simple example of a Fortran program that reads and writes files containing Cyrillic characters? In addition, is it possible to include Cyrillic characters in a character string within a Fortran program? As soon as I try the latter, Plato indicates it wants to save the file in Unicode and then will not compile it.

23 Jan 2019 1:01 #23162

Fortran compilers lag behind C compilers in their support for multi-byte and Unicode characters. Without such support (i.e., character types with more than just one KIND), what you can do in Fortran is quite limited. Using a library such as ICU (see http://site.icu-project.org/home), you can call C routines from your Fortran code.

The following code works with FTN95 and illustrates how to count the number of 'words' in Cyrillic text and to separate the input text and print one word per line into an output file.

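! Count words in a UTF-8 file as the number of space characters + 1
program readcyr
   implicit none
   character(500) lin   ! raw UTF-8 bytes; each Cyrillic letter takes 2 bytes
   integer i,ls,nw
   !
   ! Read the first line of the input file as a plain byte string
   open(10,file='cyr.ut8',status='old')
   read(10,'(A)')lin
   close(10)
   nw=1
   ls=len_trim(lin)
   ! A space never occurs inside a multi-byte UTF-8 sequence,
   ! so a byte-by-byte scan for spaces is safe
   do i=1,ls
      if(lin(i:i) == ' ')then
         nw=nw+1
         lin(i:i) = char(10)   ! replace the space with a line feed
      endif
   end do
   ! Write the modified text, one word per line
   open(10,file='cyrw.ut8',status='replace')
   write(10,'(A)')lin
   close(10)
   print *,nw ,' words'
end program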

You may use the following as the contents of cyr.ut8:

Мы собрали все системы, чтобы определить лучших. Поэтому перед вами, наверное, самый полный список конструкторов сайтов в рунете. Если вы знаете конструктор сайтов, которого нет в этому списке - можете добавить его через форму.

You can see from the output file that the program counts '-' as a word, and it probably has other such bugs that need to be fixed, but you get the idea, I hope.
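The over-counting mentioned above (a lone '-' counted as a word, and presumably runs of spaces counted more than once) can be reduced by counting tokens rather than spaces. The following is only a minimal sketch, not a tested FTN95 solution: the module name wordcount, the functions is_word and count_words, and the punctuation list are illustrative assumptions, and the source file is assumed to be saved as UTF-8 without a BOM.

```fortran
! Sketch: count words in a UTF-8 line, skipping punctuation-only tokens.
! All names here are hypothetical; the punctuation set is an assumption.
module wordcount
   implicit none
contains
   ! True if the token has at least one byte that is not ASCII punctuation
   logical function is_word(tok)
      character(*), intent(in) :: tok
      is_word = verify(tok, '-.,;:!?()"''') /= 0
   end function is_word

   ! Count space-separated tokens that qualify as words
   integer function count_words(line)
      character(*), intent(in) :: line
      integer :: i, start, n
      count_words = 0
      n = len_trim(line)
      start = 0                          ! 0 means "not inside a token"
      do i = 1, n
         if (line(i:i) /= ' ') then
            if (start == 0) start = i    ! a new token begins here
            if (i == n) then             ! last token ends with the line
               if (is_word(line(start:i))) count_words = count_words + 1
            end if
         else if (start /= 0) then       ! a space closes the current token
            if (is_word(line(start:i-1))) count_words = count_words + 1
            start = 0
         end if
      end do
   end function count_words
end module wordcount

program demo
   use wordcount
   implicit none
   ! Two spaces after the second word and a lone '-' are deliberate
   print *, count_words('Мы собрали  все - системы'), ' words'
end program demo
```

For the literal shown, the sketch should report 4 words, since the doubled space and the lone '-' are both ignored. Because every byte of a multi-byte UTF-8 sequence has its high bit set, the byte-wise tests for spaces and ASCII punctuation cannot match part of a Cyrillic letter.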

23 Jan 2019 7:19 #23164

Literal Cyrillic character strings can be inserted into an FTN95 program using Plato, provided that the file is saved via the Advanced Save options with the encoding 'UTF8 (without signature)'. A file that already has this encoding will automatically be saved in this way.

ClearWin+ can also use UTF8 characters if a prior call is made to ENABLE_UTF8@. Details can be found under item 334 of the enhancements file 'clrwin.enh'.

25 Jan 2019 2:15 #23185

Thank you.

FTN95 apparently does not support KIND parameters for CHARACTER variables. In the simple programme below, FTN95 complains about the string length in the parameter declaration:

Module m
  Character(Len=*), Dimension(2), Parameter :: cjan = &
     (/'Jan', 'Янв'/)
End Module m

Would this problem be potentially resolvable if a different kind parameter could be used?

25 Jan 2019 8:15 #23187

Simon

The quick fix is to add three spaces to 'Jan', so that both constants have the same length in bytes:

winapp
Module m 
  Character(Len=*), Dimension(2), Parameter :: cjan = & 
     (/'Jan   ', 'Янв'/) 
End Module m

program main
use m
use clrwin
call ENABLE_UTF8@(1)
i = winio@('%ws', cjan(2))
end

25 Jan 2019 4:34 #23190

Paul's work-around succeeds if the Fortran source file is saved as UTF-8 without BOM (byte order mark).

I tried with UTF-8 + BOM and with UTF-16 LE, and the compiler rejected these versions.
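The three-space padding in the work-around follows from byte lengths: in UTF-8 each Cyrillic letter occupies two bytes, so 'Янв' is six bytes long while 'Jan' is three, and LEN reports bytes rather than characters. A minimal sketch of the difference is given below. The module utf8_util and the function utf8_chars are hypothetical names, and the source is assumed to be saved as UTF-8 without a BOM.

```fortran
! Sketch: compare byte length (LEN) with the number of UTF-8 characters.
! utf8_util and utf8_chars are illustrative names, not library routines.
module utf8_util
   implicit none
contains
   ! Number of UTF-8 characters in s, ignoring trailing blanks.
   ! Continuation bytes have the bit pattern 10xxxxxx and are not counted.
   integer function utf8_chars(s)
      character(*), intent(in) :: s
      integer :: i
      utf8_chars = 0
      do i = 1, len_trim(s)
         ! Masking with 192 (binary 11000000) isolates the top two bits
         if (iand(ichar(s(i:i)), 192) /= 128) utf8_chars = utf8_chars + 1
      end do
   end function utf8_chars
end module utf8_util

program lengths
   use utf8_util
   implicit none
   print *, len('Jan'), utf8_chars('Jan')   ! bytes, characters
   print *, len('Янв'), utf8_chars('Янв')   ! bytes, characters
end program lengths
```

Under the stated encoding assumption this should print 3 and 3 for 'Jan', then 6 and 3 for 'Янв', which is why the shorter constant needs three extra bytes of padding.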

Similarly, I expect that the lexical intrinsics LGT, LGE, etc., will not work correctly with Cyrillic characters, so more elaborate coding will be needed to do such comparisons, sorting of words, etc. Therefore, before writing a lot of code, it would be necessary to make a full assessment of what kinds of processing will be performed.
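Comparisons that depend on alphabetical order or letter case are usually easier to express on integer code points than on raw bytes. One possible starting point, offered only as a sketch, is to decode the UTF-8 bytes first. The module utf8_decode and the subroutine decode are hypothetical names, the input is assumed to be well-formed UTF-8, and the source is assumed to be saved as UTF-8 without a BOM.

```fortran
! Sketch: decode well-formed UTF-8 into integer code points.
! utf8_decode and decode are illustrative names; no validation is done.
module utf8_decode
   implicit none
contains
   subroutine decode(s, cp, n)
      character(*), intent(in) :: s
      integer, intent(out) :: cp(:)      ! decoded code points
      integer, intent(out) :: n          ! number of code points stored
      integer :: i, k, b, extra, last
      cp = 0
      n = 0
      i = 1
      last = len_trim(s)
      do while (i <= last)
         if (n >= size(cp)) exit         ! output array is full
         b = iand(ichar(s(i:i)), 255)    ! lead byte as 0..255
         if (b < 128) then
            extra = 0                    ! 0xxxxxxx: ASCII
         else if (b < 224) then
            extra = 1                    ! 110xxxxx: two-byte sequence
            b = iand(b, 31)
         else if (b < 240) then
            extra = 2                    ! 1110xxxx: three-byte sequence
            b = iand(b, 15)
         else
            extra = 3                    ! 11110xxx: four-byte sequence
            b = iand(b, 7)
         end if
         if (i + extra > last) exit      ! truncated sequence: stop
         do k = 1, extra                 ! append 6 bits per continuation byte
            b = b*64 + iand(ichar(s(i+k:i+k)), 63)
         end do
         n = n + 1
         cp(n) = b
         i = i + extra + 1
      end do
   end subroutine decode
end module utf8_decode

program show_codepoints
   use utf8_decode
   implicit none
   integer :: cp(16), n
   call decode('Янв', cp, n)
   print *, cp(1:n)
end program show_codepoints
```

For 'Янв' the decoded values should be 1071, 1085 and 1074 (U+042F, U+043D, U+0432). A comparison or sorting routine could then map these integers to whatever ordering the application requires, for example folding case or placing 'ё' next to 'е'.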
