This is a continuation of my earlier post about extracting non-standard text from a PDF.
Basically, I didn't succeed in doing that the way I wanted, but I was lucky that all of the PDFs had the same format.
So I came up with the idea of converting the PDF to an image and then extracting the portion containing the "text" as an image, basically cropping the image to the rectangle where the text is.
To complete this task I used the Ghostscript library:
http://www.ghostscript.com/download/gsdnld.html
A good article on how to implement this in C# is also available on CodeProject:
http://www.codeproject.com/Articles/32274/How-To-Convert-PDF-to-Image-Using-Ghostscript-API
With these two I managed to convert the PDF to an image. But which format?
I tested with the CodeProject application from the article and found that I get the clearest text when using the TIFF format, which is not so strange because it is widely used for printing.
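For illustration, here is a minimal sketch of the PDF to TIFF conversion. Note that it does not use the API wrapper from the CodeProject article; instead it starts the Ghostscript console executable as a separate process, which is a simpler but different approach. The class name PdfToTiff, the method names, the executable path and the 300 dpi resolution are my assumptions, not something taken from the article. The switches themselves (-dNOPAUSE, -dBATCH, -dSAFER, -sDEVICE=tiff24nc, -r, -sOutputFile) are standard Ghostscript options.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: renders a PDF to a 24-bit color TIFF by running
// the Ghostscript console executable (e.g. gswin32c.exe on Windows).
public static class PdfToTiff
{
    // Builds the Ghostscript argument string.
    // tiff24nc = 24-bit RGB TIFF device, -r = resolution in dpi.
    public static string BuildArguments(string pdfPath, string tiffPath, int dpi)
    {
        return "-dNOPAUSE -dBATCH -dSAFER -sDEVICE=tiff24nc " +
               "-r" + dpi + " " +
               "-sOutputFile=\"" + tiffPath + "\" " +
               "\"" + pdfPath + "\"";
    }

    // Runs Ghostscript and waits until the conversion has finished.
    // gsExePath is an assumption: wherever Ghostscript is installed on your machine.
    public static void Convert(string gsExePath, string pdfPath, string tiffPath, int dpi)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = gsExePath,
            Arguments = BuildArguments(pdfPath, tiffPath, dpi),
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            if (process.ExitCode != 0)
            {
                throw new InvalidOperationException(
                    "Ghostscript failed with exit code " + process.ExitCode);
            }
        }
    }
}
```

A call could look like PdfToTiff.Convert(@"C:\gs\bin\gswin32c.exe", "input.pdf", "output.tif", 300); where all three paths are placeholders. A higher resolution gives sharper text but a larger image and a slower conversion.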
So the next step was to read the TIFF into a C# application and crop the image.
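The cropping step could be sketched roughly like this, assuming System.Drawing (GDI+) is used to load the TIFF. The class name TiffCropper and the rectangle values in the usage note are hypothetical; the real rectangle depends on where the text sits in your PDFs and on the resolution used for the conversion. Clamping the rectangle to the image bounds is my own design choice, because Bitmap.Clone throws when the rectangle reaches outside the image.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;

// Hypothetical helper: cuts a rectangle out of an image file and saves it as TIFF.
public static class TiffCropper
{
    // area is given in pixels, with (0, 0) at the top-left corner of the image.
    // sourcePath and targetPath must be different files, because the source
    // stays open while the cropped copy is being saved.
    public static void Crop(string sourcePath, string targetPath, Rectangle area)
    {
        using (var source = new Bitmap(sourcePath))
        {
            // Keep only the part of the rectangle that lies inside the image.
            Rectangle bounded = Rectangle.Intersect(
                area, new Rectangle(0, 0, source.Width, source.Height));

            if (bounded.Width <= 0 || bounded.Height <= 0)
            {
                throw new ArgumentException(
                    "The crop rectangle does not overlap the image.", "area");
            }

            // Clone copies just the selected region into a new bitmap.
            using (Bitmap cropped = source.Clone(bounded, source.PixelFormat))
            {
                cropped.Save(targetPath, ImageFormat.Tiff);
            }
        }
    }
}
```

For example, TiffCropper.Crop("output.tif", "text.tif", new Rectangle(100, 200, 600, 80)); would save a 600 x 80 pixel region. Since all of the PDFs have the same format, the same rectangle can be reused for every file as long as the conversion resolution stays the same.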
NOTE: there is also a good library for converting PDF to image named ImageMagick, which also has a C# wrapper:
http://imagemagick.codeplex.com/
It also has a lot of other capabilities like resizing, cropping, etc.
But I didn't use it: after setting up a test project with this library and trying out how it converts a PDF to an image, I wasn't satisfied with the time it needed for the conversion, so I decided to go with Ghostscript.
see http://apitron.com/Product/pdf-rasterizer
You can convert PDF to image and vice versa by using the following .NET library.
http://www.aspose.com/.net/pdf-component.aspx
Thanks for the information. Alternatively, I found another method via Google that I'd like to share with you guys.
http://www.e-iceblue.com/Tutorials/Spire.PDF/Spire.PDF-Program-Guide/Convert-PDF-Page-to-Image-with-C-code.html
C# converting pdf to tiff image