Thursday, August 25, 2011

Working with PDF files


In our project, we needed to check the contents of a PDF report that is displayed embedded in an Internet Explorer (IE) window. The process is a little complicated and not entirely straightforward.


First, you will need to download pdftk. Download the files from the link below and extract only the files into the C:\windows\system32 folder: http://www.accesspdf.com/article.php/20041130153545577
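Extracting into C:\windows\system32 works because that folder is on the Windows PATH by default, so any program can find pdftk by name. As a quick sanity check before going further, the following minimal sketch searches the PATH the way a shell would. The helper name `find_executable` is made up for illustration; it is not part of pdftk or of any gem.

```ruby
# Hypothetical helper: look for a program on PATH, the way a shell would.
# On Windows, PATHEXT lists the extensions (.EXE, .BAT, ...) to try.
def find_executable(name, path = ENV['PATH'])
  exts = ENV['PATHEXT'] ? ENV['PATHEXT'].split(';') : ['']
  exts = [''] + exts unless exts.include?('')
  path.to_s.split(File::PATH_SEPARATOR).each do |dir|
    exts.each do |ext|
      candidate = File.join(dir, "#{name}#{ext}")
      return candidate if File.file?(candidate) && File.executable?(candidate)
    end
  end
  nil # not found in any PATH directory
end

if (pdftk = find_executable('pdftk'))
  puts "pdftk found at #{pdftk}"
else
  puts 'pdftk not found; check that it was extracted into C:\windows\system32'
end
```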


Second, you will need to download and install xpdf (http://pdf-toolkit.rubyforge.org/) and extract its files into the C:\windows\system32 folder as well. Then you will need the PDF::Toolkit gem, which can be found here: http://rubyforge.org/projects/pdf-toolkit/
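If you only need the text, xpdf's `pdftotext` command can also be called directly from Ruby; passing `-` as the output file name makes it write the text to standard output. Below is a minimal sketch using Ruby's standard Open3 library, assuming `pdftotext` is on the PATH. The method name `pdf_to_text` is hypothetical and is not part of the PDF::Toolkit gem.

```ruby
require 'open3'

# Hypothetical helper: run xpdf's pdftotext and capture the text it writes
# to standard output ("-" as the output file means stdout).
def pdf_to_text(pdf_path, pdftotext = 'pdftotext')
  out, err, status = Open3.capture3(pdftotext, pdf_path, '-')
  raise "pdftotext failed: #{err.strip}" unless status.success?
  out
end

# Usage (assumes the file exists):
# puts pdf_to_text("c:\\file.pdf")
```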


Basically, this converts the PDF to text, and you can then do whatever you like with it. In the following example, I simply read a file from my C:\ drive and display its contents using 'puts'.
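Since the point in our project was to check what the report contains, one simple approach is to look for the phrases you expect in the extracted text. In this sketch the method name and the sample phrases are made up for illustration. Whitespace is collapsed first because text extraction can break lines and space words differently from the on-screen layout.

```ruby
# Hypothetical helper: return the expected phrases that do NOT appear in the
# extracted text. Runs of whitespace (including line breaks) are collapsed to
# single spaces on both sides so line wrapping in the PDF does not matter.
def missing_phrases(text, expected)
  normalized = text.gsub(/\s+/, ' ')
  expected.reject { |phrase| normalized.include?(phrase.gsub(/\s+/, ' ')) }
end

report = "Monthly Report\nTotal\n  sales: 42\n"
p missing_phrases(report, ['Monthly Report', 'Total sales: 42', 'Refunds'])
# prints ["Refunds"]
```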



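# On Ruby 1.8, rubygems must be loaded before any installed gem
require 'rubygems'
require 'pdf/toolkit'

# Open the PDF (the backslash is escaped inside a double-quoted string)
my_pdf = PDF::Toolkit.open("c:\\file.pdf")

# Convert the PDF to text and read the whole result into a string
text = my_pdf.to_text.read
puts text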
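pdftotext normally ends each page with a form-feed character ("\f") unless it is run with -nopgbrk. Assuming the text returned by `to_text` still contains those characters, you can split the report into pages, for example to check that a value appears on a particular page. The method name `pages_of` is hypothetical.

```ruby
# Hypothetical helper: split pdftotext output into pages. pdftotext ends each
# page with a form feed ("\f"), so a trailing empty piece is dropped.
def pages_of(text)
  pages = text.split("\f", -1)
  pages.pop if pages.last && pages.last.strip.empty?
  pages
end

sample = "Page one text\n\fPage two text\n\f"
pages = pages_of(sample)
puts pages.length     # prints 2
puts pages[1].strip   # prints Page two text
```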



I hope this helps.
