I would like to retrieve data from a PDF file. Is it possible?
Is it possible to read a PDF file from Fortran?
In theory yes, but in practice almost certainly no.
If you look at a PDF file in an editor, you see head and tail sections in ASCII text with binary in between. Unless you can obtain information on how this binary section is written, it would be a near-impossible task to read it correctly.
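As a rough illustration of the ASCII head section mentioned above: a PDF normally begins with the text "%PDF-" followed by a version number. The sketch below only checks for that marker. It assumes a compiler that supports standard stream access (Fortran 2003) and NEWUNIT (Fortran 2008), and it does not use the FTN95 routines; the names pdf_check and looks_like_pdf are made up for this example.

```fortran
! Sketch only: uses standard stream access, not FTN95's OPENR@/READF@.
! Module and function names are hypothetical.
module pdf_check
  implicit none
contains
  ! Returns .true. if the file begins with the five bytes "%PDF-".
  logical function looks_like_pdf(filename)
    character(len=*), intent(in) :: filename
    character(len=5) :: magic
    integer :: unit, ios

    looks_like_pdf = .false.
    open(newunit=unit, file=filename, access='stream', form='unformatted', &
         status='old', action='read', iostat=ios)
    if (ios /= 0) return                 ! file missing or unreadable
    read(unit, iostat=ios) magic         ! first five bytes of the file
    close(unit)
    if (ios == 0) looks_like_pdf = (magic == '%PDF-')
  end function looks_like_pdf
end module pdf_check

program check_pdf
  use pdf_check
  implicit none
  character(len=260) :: filename

  call get_command_argument(1, filename)
  if (len_trim(filename) == 0) then
    print *, 'usage: check_pdf file.pdf'
  else if (looks_like_pdf(trim(filename))) then
    print *, 'File starts with %PDF-, so it looks like a PDF.'
  else
    print *, 'No %PDF- header found (or the file could not be opened).'
  end if
end program check_pdf
```

This only confirms the file type. It does nothing to decode the binary section in between, which is the hard part.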
😢
Assuming your PDF file started out as text, there are various ways to convert it back to text. For example, you can buy software that does it. Apparently, the full 'professional' version of Adobe Acrobat allows you to save back to (for example) Word or Notepad format files.
Once back as a text file, you can read it in Fortran.
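For what it is worth, reading the converted text file can look something like the sketch below. It is plain standard Fortran rather than anything FTN95-specific, the 1024-character line limit is an assumption, and the names text_reader and print_text_file are invented for the example.

```fortran
! Sketch only: echoes a text file line by line.
! Names are hypothetical; the 1024-character line buffer is an assumption.
module text_reader
  implicit none
contains
  ! Prints each line with its line number.
  ! nlines returns the number of lines read, or -1 if the file cannot be opened.
  subroutine print_text_file(filename, nlines)
    character(len=*), intent(in) :: filename
    integer, intent(out) :: nlines
    character(len=1024) :: line
    integer :: unit, ios

    nlines = -1
    open(newunit=unit, file=filename, status='old', action='read', iostat=ios)
    if (ios /= 0) return
    nlines = 0
    do
      read(unit, '(a)', iostat=ios) line
      if (ios /= 0) exit                 ! end of file or read error
      nlines = nlines + 1
      print '(i6, ": ", a)', nlines, trim(line)
    end do
    close(unit)
  end subroutine print_text_file
end module text_reader

program read_converted_text
  use text_reader
  implicit none
  character(len=260) :: filename
  integer :: nlines

  call get_command_argument(1, filename)
  if (len_trim(filename) == 0) then
    print *, 'usage: read_converted_text converted.txt'
  else
    call print_text_file(trim(filename), nlines)
    if (nlines < 0) print *, 'Could not open ', trim(filename)
  end if
end program read_converted_text
```

In a real program you would replace the print with whatever parsing your data needs, for example an internal read from the line buffer.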
If the original source contained pictures, or was simply a picture, then you are out of luck.
Google for 'convert PDF to text', and you will get hundreds of links.
Since the PDF encoding is reversible, someone out there must know the algorithm (I don't, but the internet is swarming with people who do!). If you know the algorithm, then you can program it in Fortran.
My Google search even found a 'PDF converter for .NET', which presumably you could link into a .NET FTN95 program, which would save you programming it yourself. (www.winnovative-software.com)
Regards
Eddie
You could always write a PDF parser 😃
You would need to make use of binary file reads (see READF@ etc.) but, as has been previously stated, you need to know the binary file format (which is around), and writing the parser is definitely a non-trivial task.
Success! One small problem: any idea how I can figure out what this character is? Unfortunately I cannot get the character to show in the post :oops:
To describe it, I would say it is an unfilled square.
Hi.
The unfilled square character can be various characters and usually means it is an ASCII control character with a value in the range 0 to 31 decimal, 0 to 1F hex, for which no graphic is defined.
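To make the 0 to 31 range concrete, here is a small sketch that flags control characters in a string and prints their codes. It is standard Fortran 2003, the sample string is made up, and the names control_chars, is_control and count_control are just for illustration.

```fortran
! Sketch only: finds ASCII control characters (codes 0 to 31) in a string.
! Names and the sample string are hypothetical.
module control_chars
  implicit none
contains
  ! True for ASCII control characters, i.e. codes 0 to 31.
  logical function is_control(ch)
    character(len=1), intent(in) :: ch
    is_control = iachar(ch) < 32
  end function is_control

  ! Counts the control characters in a string.
  integer function count_control(text)
    character(len=*), intent(in) :: text
    integer :: i
    count_control = 0
    do i = 1, len(text)
      if (is_control(text(i:i))) count_control = count_control + 1
    end do
  end function count_control
end module control_chars

program show_control_chars
  use control_chars
  implicit none
  character(len=:), allocatable :: sample
  integer :: i

  ! Sample data: a tab inside the text, then CR LF at the end of the line.
  sample = 'AB' // achar(9) // 'C' // achar(13) // achar(10)
  do i = 1, len(sample)
    if (is_control(sample(i:i))) then
      print '(a, i0, a, i0, a, z2.2, a)', 'position ', i, ': code ', &
            iachar(sample(i:i)), ' (hex ', iachar(sample(i:i)), ')'
    end if
  end do
  print '(a, i0)', 'control characters found: ', count_control(sample)
end program show_control_chars
```

For the sample string this reports positions 3, 5 and 6: the tab, the carriage return and the line feed.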
If you want to see these, get hold of an editor such as UltraEdit http://www.ultraedit.com/ which has a HEX display mode, or write yourself a hex-dump program using the OPENF@ and READF@ type of routines available with FTN95. These will allow you to read the CR LF characters, which editors such as Notepad interpret but do not explicitly show you. You can also open any file as direct access with a record length of 1 and pick out the individual characters in the same way the FTN95 routines can.
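The direct access idea can be sketched roughly as follows. This is an illustration in standard Fortran, not FTN95-specific code. The big assumption is that the compiler measures RECL for unformatted files in bytes: the standard leaves the unit up to the compiler, so INQUIRE(IOLENGTH=) is used to ask for the length of one character, and some compilers may need an option before that comes out as a single byte. The names byte_reader and byte_at are invented.

```fortran
! Sketch only: reads single bytes through direct access with a one-byte record.
! Assumes the compiler's record length unit is one byte; names are hypothetical.
module byte_reader
  implicit none
contains
  ! Returns the code of byte number pos (1-based), or -1 on failure.
  integer function byte_at(filename, pos)
    character(len=*), intent(in) :: filename
    integer, intent(in) :: pos
    character(len=1) :: byte
    integer :: unit, ios, rl

    byte_at = -1
    inquire(iolength=rl) byte            ! record length needed for one character
    open(newunit=unit, file=filename, access='direct', form='unformatted', &
         recl=rl, status='old', action='read', iostat=ios)
    if (ios /= 0) return
    read(unit, rec=pos, iostat=ios) byte ! record number = byte position
    close(unit)
    if (ios == 0) byte_at = ichar(byte)
  end function byte_at
end module byte_reader

program first_bytes
  use byte_reader
  implicit none
  character(len=260) :: filename
  integer :: pos, code

  call get_command_argument(1, filename)
  if (len_trim(filename) == 0) then
    print *, 'usage: first_bytes some_file'
  else
    do pos = 1, 8
      code = byte_at(trim(filename), pos)
      if (code < 0) exit                 ! past end of file, or open failed
      print '(a, i0, a, i3, a, z2.2)', 'byte ', pos, ': ', code, '  hex ', code
    end do
  end if
end program first_bytes
```

Reopening the file for every byte keeps the function simple but is slow. For anything beyond a quick look you would open the file once and loop over the record numbers.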
With regard to the format of a PDF file, or lots of other files, look at http://www.wotsit.org.
For a very limited and more manual method, you can use the text copy tool in Adobe Reader, paste into an editor or Excel, and save that in a suitable text file format, e.g. CSV or tab delimited. There is also a 'Save as Text' command in Adobe Acrobat Reader. Have you tried that?
The code for a dump program hexdump.f95 is:
character*16 line_in
character*80 line_out
integer*1 characters(16)
integer*2 handle, error_code
character*260 file_in
nbytes_at_a_time = 16
ifile_position=0
call command_line(file_in)
if(file_in .ne. ' ')then
  call openr@(file_in, handle, error_code)
  if(error_code .ne. 0)then
    call doserr@(error_code)
  else
    nbytes_read = 1
    do while(nbytes_read .gt. 0)
      line_in = ' '
      call readf@(line_in, handle, nbytes_at_a_time, nbytes_read, error_code)
      do i=1,nbytes_read
        ! get the character code for each input character
        characters(i) = ichar(line_in(i:i))
        ! translate unprintable characters to a dot for output presentation
        if(characters(i) .lt. 32)line_in(i:i)='.'
      enddo
      ! print out the data
      line_out = ' '
      write(line_out,1000)ifile_position,(characters(i),i=1,nbytes_read)
1000  format(z8.8,'h: ',16(z2.2,' '))
      write(line_out(60:),1010)line_in
1010  format('; ',a)
      if(nbytes_read .gt. 0)print *,trim(line_out)
      ifile_position =ifile_position + nbytes_at_a_time
    enddo
    call closef@(handle,error_code)
  endif
endif
end
Compile: ftn95 hexdump.f95/nowindows/link
Typical command line usage, for screen output: hexdump input_file.nam
Or, to send the output to a file: hexdump input_file.nam >hexdump.out
I hope this helps
Regards
Ian
Thank you kind sir. I think I have it now. 😄 😄