Topic: Is it possible to read a PDF file from Fortran? in Support

WDG

Posts: 14

14 May 2008 10:20 #3189

I would like to retrieve data from a PDF file. Is it possible?

JohnHorspool

Posts: 260 Gloucestershire UK

14 May 2008 11:26 #3190

In theory yes, but in practice almost certainly no.

If you look at a PDF file in an editor, you see head and tail sections in ASCII text with binary in between. Unless you can obtain information on how this binary section is written, it would be a near-impossible task to read it correctly.

WDG

Posts: 14

15 May 2008 1:56 #3196

😢

LitusSaxonicum

Posts: 2284 Yateley, Hants, UK

15 May 2008 9:06 #3200

Assuming your PDF file started out as text, there are various ways to convert it back to text. For example, you can buy software that does it. Apparently, the full 'professional' version of Adobe Acrobat allows you to save back to (for example) Word or Notepad format files.

Once back as a text file, you can read it in Fortran.

If the original source contained pictures, or was simply a picture, then you are out of luck.

Google for 'convert PDF to text', and you will get hundreds of links.

Since the PDF encoding is reversible, someone out there must know the algorithm (I don't, but the internet is swarming with people who do!). If you know the algorithm, then you can program it in Fortran.

My Google search even found a 'PDF converter for .NET', which presumably you could link into a .NET FTN95 program, which would save you programming it yourself. (www.winnovative-software.com)

Regards

Eddie

Andrew

Posts: 186 Frankfurt, Germany

15 May 2008 9:20 #3202

You could always write a PDF parser 😃

You would need to make use of binary file reads (see READF@ etc.), but as has been previously stated, you need to know the binary file format (which is available), and it is definitely a non-trivial task.
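
As a very small first step towards such a parser, here is a sketch that reads the first bytes of a file and checks for the '%PDF-' marker that a PDF file starts with. It is only an illustration: the file name sample.pdf is made up, and it assumes a compiler that supports Fortran 2003 stream access rather than the FTN95 READF@ routines.

```fortran
program check_pdf_header
  implicit none
  ! Hypothetical file name, used only for illustration
  character(len=*), parameter :: file_name = 'sample.pdf'
  character(len=8) :: header
  integer :: ios

  ! Assumption: the compiler supports Fortran 2003 stream access,
  ! which reads raw bytes with no record structure
  open(unit=10, file=file_name, access='stream', form='unformatted', &
       status='old', action='read', iostat=ios)
  if (ios /= 0) then
    print *, 'Cannot open ', file_name
    stop
  end if

  ! Read the first 8 bytes, e.g. '%PDF-1.4'
  read(10, iostat=ios) header
  close(10)

  if (ios == 0 .and. header(1:5) == '%PDF-') then
    print *, 'Looks like a PDF file, version ', header(6:8)
  else
    print *, 'No PDF header found'
  end if
end program check_pdf_header
```

Everything after that header (objects, streams, cross-reference table) is where the real work of a parser would be.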

WDG

Posts: 14

17 May 2008 12:12 #3210

Success! One small problem: any idea how I can figure out what this character is? Unfortunately I cannot get the character to show in the post. :oops:

To describe it, I would say it is an unfilled square.

IanLambley

Posts: 501 Sunderland

18 May 2008 9:56 #3211

Hi.

The unfilled square character can be various characters and usually means it is an ASCII control character with a value in the range 0 to 31 decimal, 0 to 1F hex, for which no graphic is defined.

If you want to see these, get hold of an editor such as UltraEdit (http://www.ultraedit.com/), which has a hex display mode, or write yourself a hex-dump program using the OPENF@ and READF@ type of routines available with FTN95. These let you read the CR and LF characters, which editors such as Notepad interpret but do not explicitly show you. You can also open any file as direct access with a record length of 1 and pick out the individual characters, just as you can with the FTN95 routines.
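
The direct-access approach could look something like the sketch below, which reads a file one byte per record and reports every control character it finds. Treat it as an outline rather than tested FTN95 code: the file name sample.txt is made up, and it assumes the compiler measures RECL in bytes for unformatted direct-access files, which is not true of every compiler.

```fortran
program find_controls
  implicit none
  ! Hypothetical file name, used only for illustration
  character(len=*), parameter :: file_name = 'sample.txt'
  character(len=1) :: ch
  integer :: ios, irec, code, ncontrol

  ! One record per byte. Assumption: RECL is measured in bytes;
  ! some compilers use other units by default.
  open(unit=10, file=file_name, access='direct', form='unformatted', &
       recl=1, status='old', action='read', iostat=ios)
  if (ios /= 0) then
    print *, 'Cannot open ', file_name
    stop
  end if

  irec = 0
  ncontrol = 0
  do
    irec = irec + 1
    read(10, rec=irec, iostat=ios) ch
    if (ios /= 0) exit            ! reading past the last byte ends the loop
    code = ichar(ch)
    ! Codes 0 to 31 are the ASCII control characters (CR is 0D, LF is 0A)
    if (code < 32) then
      ncontrol = ncontrol + 1
      print '(a,i8,a,z2.2)', 'byte ', irec, ' is control character hex ', code
    end if
  end do
  close(10)

  print *, 'control characters found: ', ncontrol
end program find_controls
```

Reading one byte per record is slow on large files, but it is enough to identify a single mystery character.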

With regard to the format of a PDF file, or lots of other files, look at http://www.wotsit.org.

For a very limited and more manual method, you can use the text copy tool in Adobe Reader, paste into an editor or Excel, and save the result in a suitable text file format, e.g. CSV or tab delimited. There is also a 'Save as Text' command in Adobe Acrobat Reader. Have you tried that?

The code for a dump program hexdump.f95 is:

! hexdump.f95: hex and character dump of the file named on the command line
implicit none
character*16  line_in
character*80  line_out
integer       characters(16)
integer*2     handle, error_code
character*260 file_in
integer       nbytes_at_a_time, nbytes_read, ifile_position, i
nbytes_at_a_time = 16
ifile_position = 0
call command_line(file_in)
if(file_in .ne. ' ')then
  call openr@(file_in, handle, error_code)
  if(error_code .ne. 0)then
    call doserr@(error_code)
  else
    nbytes_read = 1
! read 16 bytes at a time until nothing more is returned
    do while(nbytes_read .gt. 0)
      line_in = ' '
      call readf@(line_in, handle, nbytes_at_a_time, nbytes_read, error_code)
      do i=1,nbytes_read
! get the character code (0 to 255) for each input character
        characters(i) = ichar(line_in(i:i))
! translate unprintable characters to a dot for output presentation
        if(characters(i) .lt. 32 .or. characters(i) .gt. 126)line_in(i:i)='.'
      enddo
! print out the data: file offset, hex codes, then the printable text
      line_out = ' '
      write(line_out,1000)ifile_position,(characters(i),i=1,nbytes_read)
 1000 format(z8.8,'h: ',16(z2.2,' '))
      write(line_out(60:),1010)line_in
 1010 format('; ',a)
      if(nbytes_read .gt. 0)print *,trim(line_out)
      ifile_position = ifile_position + nbytes_at_a_time
    enddo
    call closef@(handle,error_code)
  endif
endif
end

Compile: ftn95 hexdump.f95/nowindows/link

Typical command line usage, for screen output:
hexdump input_file.nam
or, for output to a file:
hexdump input_file.nam >hexdump.out

I hope this helps

Regards

Ian

WDG

Posts: 14

18 May 2008 10:41 #3212

Thank you kind sir. I think I have it now. 😄 😄