forums.silverfrost.com Forum Index forums.silverfrost.com
Welcome to the Silverfrost forums
 

Is it possible to read a PDF file from Fortran ?

 
forums.silverfrost.com Forum Index -> Support
WDG
Joined: 13 May 2008
Posts: 14

Posted: Wed May 14, 2008 11:20 am    Post subject: Is it possible to read a PDF file from Fortran ?

I would like to retrieve data from a PDF file. Is it possible?
JohnHorspool
Joined: 26 Sep 2005
Posts: 270
Location: Gloucestershire UK

Posted: Wed May 14, 2008 12:26 pm

In theory yes, but in practice almost certainly no.

If you look at a PDF file in an editor, you see header and trailer sections in ASCII text with binary data in between. Unless you can obtain information on how this binary section is written, it would be a near-impossible task to read it correctly.
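
To make the header/trailer point concrete: a PDF file begins with an ASCII signature such as %PDF-1.4 and normally ends with the %%EOF marker. Below is a minimal sketch in standard Fortran 2003 (stream access rather than any FTN95-specific routine) that checks for both. The module and function names are hypothetical, and looking for %%EOF only in the last 1024 bytes is an assumption modelled on common reader behaviour:

```fortran
! Sketch: check the ASCII header and trailer of a PDF file.
module pdf_signature
  implicit none
  private
  public :: looks_like_pdf, file_looks_like_pdf
contains
  ! True if BUF starts with "%PDF-" and "%%EOF" occurs in its last
  ! 1024 bytes (assumed tail size).
  logical function looks_like_pdf(buf)
    character(len=*), intent(in) :: buf
    integer :: n, tail_start
    n = len(buf)
    looks_like_pdf = .false.
    if (n < 5) return
    if (buf(1:5) /= '%PDF-') return
    tail_start = max(1, n - 1023)
    looks_like_pdf = index(buf(tail_start:n), '%%EOF') > 0
  end function looks_like_pdf

  ! Read the whole file as raw bytes and apply the check above.
  logical function file_looks_like_pdf(filename)
    character(len=*), intent(in) :: filename
    character(len=:), allocatable :: buf
    integer :: u, fsize, ios
    file_looks_like_pdf = .false.
    open(newunit=u, file=filename, access='stream', form='unformatted', &
         status='old', action='read', iostat=ios)
    if (ios /= 0) return          ! missing or unreadable file
    inquire(unit=u, size=fsize)   ! file size in bytes
    if (fsize > 0) then
      allocate(character(len=fsize) :: buf)
      read(u, iostat=ios) buf
      if (ios == 0) file_looks_like_pdf = looks_like_pdf(buf)
    end if
    close(u)
  end function file_looks_like_pdf
end module pdf_signature
```

This only tells you that a file is a PDF; it says nothing about how to decode the binary part in between.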
WDG
Joined: 13 May 2008
Posts: 14

Posted: Thu May 15, 2008 2:56 am    Post subject: THANK YOU

Crying or Very sad
LitusSaxonicum
Joined: 23 Aug 2005
Posts: 2388
Location: Yateley, Hants, UK

Posted: Thu May 15, 2008 10:06 am

Assuming your PDF file started out as text, there are various ways to convert it back to text. For example, you can buy software that does it. Apparently, the full "professional" version of Adobe Acrobat allows you to save back to (for example) Word or Notepad format files.

Once back as a text file, you can read it in Fortran.
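
For example, once the PDF has been saved as plain text, finding the lines that carry your data is ordinary formatted input. A minimal sketch in standard Fortran, assuming the keyword you search for (such as 'Total') is known in advance and that lines are at most 1024 characters long; the module and function names are hypothetical:

```fortran
! Sketch: scan a plain-text file (converted from PDF) for a keyword.
module text_scan
  implicit none
contains
  ! Count the lines of FILENAME that contain KEY (case-sensitive).
  ! Returns -1 if the file cannot be opened. Lines longer than 1024
  ! characters are truncated on input (assumed limit).
  integer function count_lines_containing(filename, key) result(n)
    character(len=*), intent(in) :: filename, key
    character(len=1024) :: line
    integer :: u, ios
    n = 0
    open(newunit=u, file=filename, status='old', action='read', iostat=ios)
    if (ios /= 0) then
      n = -1
      return
    end if
    do
      read(u, '(a)', iostat=ios) line
      if (ios /= 0) exit            ! end of file (or a read error)
      if (index(line, key) > 0) n = n + 1
    end do
    close(u)
  end function count_lines_containing
end module text_scan
```

Once a matching line is found, the numbers in it can be pulled out with an internal READ from the same LINE variable.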

If the original source contained pictures, or was simply a picture, then you are out of luck.

Google for "convert PDF to text", and you will get hundreds of links.

Since the PDF encoding is reversible, someone out there must know the algorithm (I don't, but the internet is swarming with people who do!). If you know the algorithm, then you can program it in Fortran.
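
As an illustration of the simplest part of that algorithm: page text sits in content streams as literal strings in parentheses, for example (Hello) Tj. The sketch below, in standard Fortran with hypothetical names, collects those strings from a content stream that is already uncompressed. In most real files the streams are compressed with the FlateDecode (zlib) filter and would have to be inflated first, which is not shown; font encodings and escapes other than \(, \) and \\ are not handled either.

```fortran
! Sketch only: collect literal strings from an UNCOMPRESSED content stream.
module pdf_text_sketch
  implicit none
contains
  ! Copy every "( ... )" literal string in CONTENT into TEXT, separated
  ! by single blanks. Nested parentheses and the escapes \( \) \\ are
  ! handled; any other escape just drops the backslash (octal codes are
  ! NOT decoded). Output beyond LEN(TEXT) characters is discarded.
  subroutine extract_literal_strings(content, text, nout)
    character(len=*), intent(in)  :: content
    character(len=*), intent(out) :: text
    integer,          intent(out) :: nout   ! characters stored in TEXT
    character, parameter :: backslash = achar(92)
    integer   :: i, depth
    character :: c
    logical   :: escaped
    text = ' '
    nout = 0
    depth = 0
    escaped = .false.
    do i = 1, len(content)
      c = content(i:i)
      if (depth == 0) then
        if (c == '(') then            ! start of a new literal string
          depth = 1
          if (nout > 0) call put(' ')
        end if
      else if (escaped) then
        call put(c)                   ! character after a backslash
        escaped = .false.
      else if (c == backslash) then
        escaped = .true.
      else if (c == '(') then
        depth = depth + 1             ! balanced inner parenthesis
        call put(c)
      else if (c == ')') then
        depth = depth - 1
        if (depth > 0) call put(c)    ! closing an inner pair only
      else
        call put(c)
      end if
    end do
  contains
    ! Append one character to TEXT if there is room.
    subroutine put(ch)
      character, intent(in) :: ch
      if (nout < len(text)) then
        nout = nout + 1
        text(nout:nout) = ch
      end if
    end subroutine put
  end subroutine extract_literal_strings
end module pdf_text_sketch
```

For the content BT /F1 12 Tf 72 712 Td (Hello) Tj ET the routine returns Hello.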

My Google search even found a "PDF converter for .NET", which presumably you could link into a .NET FTN95 program, which would save you programming it yourself. (www.winnovative-software.com)

Regards

Eddie
Andrew
Joined: 09 Sep 2004
Posts: 232
Location: Frankfurt, Germany

Posted: Thu May 15, 2008 10:20 am

You could always write a PDF parser Smile

You would need to make use of binary file reads (see READF@ etc.), but as has been previously stated, you need to know the binary file format (which is available), and it is definitely a non-trivial task.
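
A rough sketch of one early parser step in standard Fortran: with the whole file read into a character buffer (for instance with access='stream', as an alternative to READF@), list where each stream's data lies between the stream and endstream keywords. The names are hypothetical, and searching for the keywords is a shortcut: a proper parser uses each stream's /Length entry, because the binary data itself may happen to contain those words.

```fortran
! Rough sketch: locate "stream ... endstream" data in a PDF held in memory.
module pdf_streams
  implicit none
contains
  ! STARTS(k):ENDS(k) is the byte range of the k-th stream's data.
  ! ENDS(k) may include the end-of-line before "endstream".
  subroutine find_streams(buf, starts, ends, nfound)
    character(len=*), intent(in)  :: buf
    integer,          intent(out) :: starts(:), ends(:)
    integer,          intent(out) :: nfound
    integer :: pos, p, q, s
    nfound = 0
    pos = 1
    do while (pos <= len(buf) .and. nfound < size(starts))
      p = index(buf(pos:), 'stream')
      if (p == 0) exit
      p = pos + p - 1                    ! absolute offset of "stream"
      ! skip the "stream" that is part of "endstream"
      if (p > 3) then
        if (buf(p-3:p-1) == 'end') then
          pos = p + 6
          cycle
        end if
      end if
      s = p + 6                          ! first byte after the keyword
      ! the keyword is followed by CR LF or LF before the data
      if (s <= len(buf)) then
        if (buf(s:s) == achar(13)) s = s + 1
      end if
      if (s <= len(buf)) then
        if (buf(s:s) == achar(10)) s = s + 1
      end if
      q = index(buf(s:), 'endstream')
      if (q == 0) exit
      q = s + q - 1                      ! absolute offset of "endstream"
      nfound = nfound + 1
      starts(nfound) = s
      ends(nfound) = q - 1
      pos = q + 9                        ! continue after "endstream"
    end do
  end subroutine find_streams
end module pdf_streams
```

Each range found this way still has to be decoded according to the stream's /Filter entry before its contents make sense.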
WDG
Joined: 13 May 2008
Posts: 14

Posted: Sat May 17, 2008 1:12 am    Post subject: Andrew I am trying the pdf parser route and having

success. One small problem: any idea how I can figure out what this character is? Unfortunately I cannot get the character into the post. Embarassed

To describe it, I would say it is an unfilled square.
IanLambley
Joined: 17 Dec 2006
Posts: 490
Location: Sunderland

Posted: Sun May 18, 2008 10:56 am

Hi.

The unfilled square can stand for various characters. It usually means an ASCII control character with a value in the range 0 to 31 decimal (0 to 1F hex), for which no graphic is defined.
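
Once you have the numeric code of the mystery character (for example from a hex dump), a small lookup turns it into the standard ASCII mnemonic. A minimal sketch in standard Fortran; the module and function names are hypothetical:

```fortran
! Sketch: name an ASCII character code.
module ascii_names
  implicit none
  ! Standard mnemonics for the control codes 0..31
  character(len=3), parameter :: ctrl_names(0:31) = [character(len=3) :: &
    'NUL', 'SOH', 'STX', 'ETX', 'EOT', 'ENQ', 'ACK', 'BEL', &
    'BS ', 'HT ', 'LF ', 'VT ', 'FF ', 'CR ', 'SO ', 'SI ', &
    'DLE', 'DC1', 'DC2', 'DC3', 'DC4', 'NAK', 'SYN', 'ETB', &
    'CAN', 'EM ', 'SUB', 'ESC', 'FS ', 'GS ', 'RS ', 'US ']
contains
  ! Control-code mnemonic, DEL, the printable character in quotes,
  ! or 'not ASCII' for anything outside 0..127.
  function describe_code(code) result(desc)
    integer, intent(in) :: code
    character(len=12) :: desc
    select case (code)
    case (0:31)
      desc = ctrl_names(code)
    case (32:126)
      desc = "'" // achar(code) // "'"
    case (127)
      desc = 'DEL'
    case default
      desc = 'not ASCII'
    end select
  end function describe_code
end module ascii_names
```

For instance, describe_code(13) gives CR and describe_code(10) gives LF, the two characters that end each line of a Windows text file.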

If you want to see these, get hold of an editor such as UltraEdit (http://www.ultraedit.com/), which has a hex display mode, or write yourself a hex-dump program using the OPENF@ and READF@ family of routines available with FTN95. These let you read the CR and LF characters, which editors such as Notepad interpret but do not explicitly show. You can also open any file for direct access with a record length of 1 and pick out the individual characters in the same way the FTN95 routines can.
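
The direct-access approach can be sketched in standard Fortran as follows. It assumes RECL is measured in bytes, as in gfortran; other compilers may count 4-byte words unless told otherwise (for example Intel Fortran's -assume byterecl option). The module and subroutine names are hypothetical:

```fortran
! Sketch: read individual bytes of any file via direct access, RECL=1.
module byte_reader
  implicit none
contains
  ! Read up to SIZE(CODES) bytes from the start of FILENAME.
  ! ASSUMPTION: RECL is counted in bytes by the compiler.
  subroutine read_first_bytes(filename, codes, nread)
    character(len=*), intent(in)  :: filename
    integer,          intent(out) :: codes(:)
    integer,          intent(out) :: nread
    character(len=1) :: c
    integer :: u, ios, rec
    nread = 0
    open(newunit=u, file=filename, access='direct', form='unformatted', &
         recl=1, status='old', action='read', iostat=ios)
    if (ios /= 0) return
    do rec = 1, size(codes)
      read(u, rec=rec, iostat=ios) c
      if (ios /= 0) exit                   ! past the end of the file
      nread = nread + 1
      codes(nread) = iand(ichar(c), 255)   ! 0..255, e.g. 13 = CR, 10 = LF
    end do
    close(u)
  end subroutine read_first_bytes
end module byte_reader
```

Reading record n gives byte n of the file, so CR and LF show up as the codes 13 and 10 instead of being swallowed as line ends.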

With regard to the format of a PDF file, or lots of other files, look at http://www.wotsit.org.

For a very limited and more manual method, you can use the text copy tool in Adobe Reader, paste into an editor or Excel, and save that in a suitable text file format, e.g. CSV or tab-delimited. There is also a "Save as Text" command in Adobe Acrobat Reader; have you tried that?
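
If the copied data is saved as CSV, each line can be read with list-directed input, which treats commas as value separators. A minimal sketch in standard Fortran, assuming numeric fields only, no quoted or empty fields, and hypothetical names:

```fortran
! Sketch: parse one CSV line of numbers with list-directed internal READ.
module csv_reals
  implicit none
contains
  ! NVALS = number of values read, 0 for a blank line, -1 if the line
  ! does not parse or has more fields than VALS can hold.
  subroutine parse_csv_reals(line, vals, nvals)
    character(len=*), intent(in)  :: line
    real,             intent(out) :: vals(:)
    integer,          intent(out) :: nvals
    integer :: i, ios
    if (len_trim(line) == 0) then
      nvals = 0
      return
    end if
    ! number of fields = number of commas + 1
    nvals = 1
    do i = 1, len_trim(line)
      if (line(i:i) == ',') nvals = nvals + 1
    end do
    if (nvals > size(vals)) then
      nvals = -1
      return
    end if
    read(line, *, iostat=ios) vals(1:nvals)
    if (ios /= 0) nvals = -1
  end subroutine parse_csv_reals
end module csv_reals
```

For the line 1.5,2,-3e2 it returns three values: 1.5, 2.0 and -300.0.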

The code for a dump program hexdump.f95 is:
Code:

! hexdump.f95 : show a file 16 bytes per line as hex codes plus text
implicit none
character*16  line_in
character*80  line_out
character*260 file_in
! default integers hold codes 0..255 (integer*1 overflows above 127)
integer       characters(16)
integer*2     handle, error_code
integer       nbytes_at_a_time, nbytes_read, ifile_position, i
nbytes_at_a_time = 16
ifile_position = 0
! FTN95 routine: read the file name from the command line
call command_line(file_in)
if(file_in .ne. ' ')then
  call openr@(file_in, handle, error_code)
  if(error_code .ne. 0)then
    call doserr@(error_code)
  else
    nbytes_read = 1
    do while(nbytes_read .gt. 0)
      line_in = ' '
      call readf@(line_in, handle, nbytes_at_a_time, nbytes_read, error_code)
      do i=1,nbytes_read
! get the character code (0..255) for each input character
        characters(i) = iand(ichar(line_in(i:i)), 255)
! translate unprintable characters (control codes, DEL, codes above 127)
! to a dot for output presentation
        if(characters(i) .lt. 32 .or. characters(i) .gt. 126)line_in(i:i)='.'
      enddo
! print the file offset, the hex codes and the text
      line_out = ' '
      write(line_out,1000)ifile_position,(characters(i),i=1,nbytes_read)
 1000 format(z8.8,'h: ',16(z2.2,' '))
      write(line_out(60:),1010)line_in
 1010 format('; ',a)
      if(nbytes_read .gt. 0)print *,trim(line_out)
      ifile_position = ifile_position + nbytes_read
    enddo
    call closef@(handle,error_code)
  endif
endif
end

Compile:
ftn95 hexdump.f95 /nowindows /link

Typical command-line usage, for screen output:
hexdump input_file.nam
or, to send the output to a file:
hexdump input_file.nam >hexdump.out
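
For compilers other than FTN95, which lack OPENR@ and READF@, the same dump can be sketched with standard Fortran 2003 stream access. This is an adaptation rather than the program above; it keeps the same line layout (hex bytes from column 12, text from column 60), and the module and routine names are hypothetical:

```fortran
! Sketch: portable hex dump using standard stream access.
module hexdump_portable
  implicit none
contains
  ! One output line: 8-digit hex offset, up to 16 hex bytes, then "; "
  ! and the text at column 60 with unprintable bytes shown as '.'.
  function hex_line(offset, chunk) result(line)
    integer,          intent(in) :: offset
    character(len=*), intent(in) :: chunk      ! 1 to 16 bytes
    character(len=80) :: line
    character(len=16) :: text
    integer :: i, code
    line = ' '
    text = ' '
    write(line(1:11), '(z8.8,a)') offset, 'h: '
    do i = 1, min(len(chunk), 16)
      code = iand(ichar(chunk(i:i)), 255)
      write(line(9+3*i:10+3*i), '(z2.2)') code   ! columns 12-13, 15-16, ...
      if (code >= 32 .and. code <= 126) then
        text(i:i) = chunk(i:i)
      else
        text(i:i) = '.'
      end if
    end do
    line(60:) = '; ' // text
  end function hex_line

  ! Dump a whole file, 16 bytes per line.
  subroutine dump_file(filename)
    character(len=*), intent(in) :: filename
    character(len=16) :: chunk
    integer :: u, ios, fsize, offset, n
    open(newunit=u, file=filename, access='stream', form='unformatted', &
         status='old', action='read', iostat=ios)
    if (ios /= 0) then
      print '(a)', 'cannot open ' // trim(filename)
      return
    end if
    inquire(unit=u, size=fsize)
    offset = 0
    do while (offset < fsize)
      n = min(16, fsize - offset)
      read(u, iostat=ios) chunk(1:n)
      if (ios /= 0) exit
      print '(a)', trim(hex_line(offset, chunk(1:n)))
      offset = offset + n
    end do
    close(u)
  end subroutine dump_file
end module hexdump_portable
```

A main program would simply call dump_file with the file name, for example one obtained with the standard intrinsic get_command_argument(1, name).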

I hope this helps

Regards

Ian
WDG
Joined: 13 May 2008
Posts: 14

Posted: Sun May 18, 2008 11:41 am    Post subject: Thank you IanLambley

Thank you kind sir.
I think I have it now. Very Happy Very Happy
All times are GMT + 1 Hour