|
forums.silverfrost.com Welcome to the Silverfrost forums
WDG
Joined: 13 May 2008 Posts: 14
|
Posted: Wed May 14, 2008 11:20 am Post subject: Is it possible to read a PDF file from Fortran? |
|
|
I would like to retrieve data from a PDF file. Is it possible?
|
JohnHorspool
Joined: 26 Sep 2005 Posts: 270 Location: Gloucestershire UK
|
Posted: Wed May 14, 2008 12:26 pm Post subject: |
|
|
In theory yes, but in practice almost certainly no.
If you look at a PDF file in an editor, you see head and tail sections in ASCII text with binary in between. Unless you can obtain information on how this binary section is written, reading it correctly would be a near-impossible task.
|
WDG
Joined: 13 May 2008 Posts: 14
|
Posted: Thu May 15, 2008 2:56 am Post subject: THANK YOU |
|
|
|
|
|
LitusSaxonicum
Joined: 23 Aug 2005 Posts: 2391 Location: Yateley, Hants, UK
|
Posted: Thu May 15, 2008 10:06 am Post subject: |
|
|
Assuming your PDF file started out as text, there are various ways to convert it back to text. For example, you can buy software that does it. Apparently, the full "professional" version of Adobe Acrobat allows you to save back to (for example) Word or plain text (Notepad) files.
Once it is back as a text file, you can read it in Fortran.
If the original source contained pictures, or was simply a picture, then you are out of luck.
Google for "convert PDF to text", and you will get hundreds of links.
Since the PDF encoding is reversible, someone out there must know the algorithm (I don't, but the internet is swarming with people who do!). If you know the algorithm, then you can program it in Fortran.
My Google search even found a "PDF converter for .NET", which presumably you could link into a .NET FTN95 program, which would save you programming it yourself. (www.winnovative-software.com)
Regards
Eddie
|
Andrew
Joined: 09 Sep 2004 Posts: 232 Location: Frankfurt, Germany
|
Posted: Thu May 15, 2008 10:20 am Post subject: |
|
|
You could always write a PDF parser.
You would need to make use of binary file reads (see READF@ etc.) but, as has been previously stated, you need to know the binary file format (which is available), and it is definitely a non-trivial task.
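As a very small first step towards such a parser, the sketch below only checks whether a file begins with the "%PDF-" signature that PDF files carry at the start of their header. It is a minimal sketch in standard Fortran rather than the FTN95 READF@ routines. The program name, unit number, and variable names are illustrative, and it assumes the compiler measures RECL in bytes for unformatted direct-access files, which is not guaranteed on every compiler.

```fortran
program pdf_signature
  implicit none
  character(len=260) :: file_in   ! hypothetical name, entered by the user
  character(len=5)   :: head
  character(len=1)   :: c
  integer :: rl, ios, i

  print *, 'File to check?'
  read (*, '(a)') file_in

  ! Record length of a single character (assumed to be one byte here)
  inquire (iolength=rl) c
  open (unit=10, file=trim(file_in), status='old', access='direct', &
        form='unformatted', recl=rl, iostat=ios)
  if (ios /= 0) stop 'cannot open file'

  ! Collect the first five bytes, one record (byte) at a time
  head = ' '
  do i = 1, 5
    read (10, rec=i, iostat=ios) c
    if (ios /= 0) exit
    head(i:i) = c
  end do
  close (10)

  if (head == '%PDF-') then
    print *, 'Looks like a PDF file'
  else
    print *, 'No %PDF- signature found at the start of the file'
  end if
end program pdf_signature
```

Passing this check says nothing about whether the rest of the file can be decoded; it only confirms the header is where a parser would expect it.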
|
WDG
Joined: 13 May 2008 Posts: 14
|
Posted: Sat May 17, 2008 1:12 am Post subject: Andrew I am trying the pdf parser route and having |
|
|
success. One small problem:
Any idea how I can figure out what this character is?
Unfortunately I cannot get the character into the post.
To describe it, I would say it is an unfilled square.
|
IanLambley
Joined: 17 Dec 2006 Posts: 490 Location: Sunderland
|
Posted: Sun May 18, 2008 10:56 am Post subject: |
|
|
Hi.
The unfilled square character can be various characters and usually means it is an ASCII control character with a value in the range 0 to 31 decimal, 0 to 1F hex, for which no graphic is defined.
If you want to see these, get hold of an editor such as UltraEdit (http://www.ultraedit.com/), which has a hex display mode, or write yourself a hex-dump program using the OPENF@ and READF@ type of routines available with FTN95. These let you read the CR and LF characters, which editors such as Notepad interpret but do not explicitly show you. You can also open any file as direct access with a record length of 1 and pick out the individual characters, in the same way the FTN95 routines can.
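The direct-access approach can be sketched roughly as follows. This is an illustrative example in standard Fortran, not FTN95-specific code: the program name, unit number, and the short labels are made up for the sketch, and it assumes RECL is counted in bytes for unformatted direct-access files, which depends on the compiler. It reports the file offset and hex value of every control code (0 to 31 decimal) it finds.

```fortran
program byte_codes
  implicit none
  character(len=260) :: file_in   ! hypothetical name, entered by the user
  character(len=1)   :: c
  character(len=5)   :: label
  integer :: rl, ios, pos, code

  print *, 'File to inspect?'
  read (*, '(a)') file_in

  ! Record length of a single character (assumed to be one byte here)
  inquire (iolength=rl) c
  open (unit=10, file=trim(file_in), status='old', access='direct', &
        form='unformatted', recl=rl, iostat=ios)
  if (ios /= 0) stop 'cannot open file'

  pos = 0
  do
    ! Record number pos+1 corresponds to the byte at offset pos
    read (10, rec=pos + 1, iostat=ios) c
    if (ios /= 0) exit              ! past the last byte, or a read error
    code = ichar(c)
    if (code >= 0 .and. code < 32) then
      select case (code)
      case (9)
        label = 'TAB'
      case (10)
        label = 'LF'
      case (13)
        label = 'CR'
      case default
        label = 'other'
      end select
      print '(a,i8,a,z2.2,a,a)', ' offset ', pos, '  hex ', code, '  ', trim(label)
    end if
    pos = pos + 1
  end do
  close (10)
  print *, pos, ' bytes read'
end program byte_codes
```

Reading one record per byte is slow on large files, so for anything beyond a quick look a block-based dump is the better tool.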
With regard to the format of a PDF file, or lots of other files, look at http://www.wotsit.org.
For a very limited and more manual method, you can use the text copy tool in Adobe Reader, paste into an editor or Excel, and save that in a suitable text file format, e.g. CSV or tab delimited. There is also a "Save as Text" command in Adobe Acrobat Reader. Have you tried that?
The code for a dump program hexdump.f95 is:
Code: |
character*16 line_in
character*80 line_out
integer*1 characters(16)
integer*2 handle, error_code
character*260 file_in
nbytes_at_a_time = 16
ifile_position=0
! take the input file name from the command line
call command_line(file_in)
if(file_in .ne. ' ')then
  call openr@(file_in, handle, error_code)
  if(error_code .ne. 0)then
    call doserr@(error_code)
  else
    nbytes_read = 1
    do while(nbytes_read .gt. 0)
      line_in = ' '
      call readf@(line_in, handle, nbytes_at_a_time, nbytes_read, error_code)
      do i=1,nbytes_read
        ! get the character code for each input character
        ! (codes above 127 do not fit a signed integer*1; without overflow
        !  checking they typically wrap negative and are also shown as dots)
        characters(i) = ichar(line_in(i:i))
        ! translate unprintable characters to a dot for output presentation
        if(characters(i) .lt. 32)line_in(i:i)='.'
      enddo
      ! print out the data: file offset, hex codes, then the text from column 60
      line_out = ' '
      write(line_out,1000)ifile_position,(characters(i),i=1,nbytes_read)
      1000 format(z8.8,'h: ',16(z2.2,' '))
      write(line_out(60:),1010)line_in
      1010 format('; ',a)
      if(nbytes_read .gt. 0)print *,trim(line_out)
      ifile_position =ifile_position + nbytes_at_a_time
    enddo
    call closef@(handle,error_code)
  endif
endif
end
|
Compile:
ftn95 hexdump.f95/nowindows/link
Typical command line usage, for screen output:
hexdump input_file.nam
or, for output to a file:
hexdump input_file.nam >hexdump.out
I hope this helps.
Regards
Ian
|
WDG
Joined: 13 May 2008 Posts: 14
|
Posted: Sun May 18, 2008 11:41 am Post subject: Thank you IanLambley |
|
|
Thank you kind sir.
I think I have it now.