Friday, February 3, 2012

C#: Converting PDF to Image

This is a continuation of my earlier post about extracting non-standard text from a PDF.

Basically, I did not succeed in doing that the way I wanted, but I was lucky that all of the PDFs had the same format.

So I came up with the idea of converting the PDF to an image and then extracting the portion of "text" as an image, that is, cropping the image to the rectangle where the text is.

To complete this task I used the Ghostscript library:
http://www.ghostscript.com/download/gsdnld.html


There is also a good article on CodeProject about how to implement this in C#:
http://www.codeproject.com/Articles/32274/How-To-Convert-PDF-to-Image-Using-Ghostscript-API

With these two I managed to convert the PDF to an image. But in what format?

I tested with the CodeProject application from the article and found that I get the clearest text when using the TIFF format, which is not so surprising because TIFF is widely used for printing.
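For illustration, here is a minimal sketch of the PDF-to-TIFF step. Note that it is not the approach from the CodeProject article, which calls the Ghostscript API DLL directly; instead it simply launches the Ghostscript console executable as a separate process. The class and method names, the `tiff24nc` device, and the resolution passed in are my assumptions for the example, and the path to the executable (for example `gswin32c.exe`) has to be supplied by the caller.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not taken from the CodeProject article.
static class PdfToTiff
{
    // Builds the Ghostscript command line. Kept separate so it can be inspected.
    // -dNOPAUSE / -dBATCH : run without prompts and exit when finished
    // -sDEVICE=tiff24nc   : 24-bit RGB TIFF output (assumed choice of device)
    // -r<dpi>             : rendering resolution in dots per inch
    public static string BuildArguments(string pdfPath, string tiffPath, int dpi)
    {
        return string.Format(
            "-dNOPAUSE -dBATCH -sDEVICE=tiff24nc -r{0} -sOutputFile=\"{1}\" \"{2}\"",
            dpi, tiffPath, pdfPath);
    }

    // Runs Ghostscript and waits for it to finish.
    // gsExe is the path to the console executable (an assumption: it must be installed).
    public static void Convert(string gsExe, string pdfPath, string tiffPath, int dpi)
    {
        var startInfo = new ProcessStartInfo(gsExe, BuildArguments(pdfPath, tiffPath, dpi))
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            if (process.ExitCode != 0)
            {
                throw new InvalidOperationException(
                    "Ghostscript exited with code " + process.ExitCode);
            }
        }
    }
}
```

A call could look like `PdfToTiff.Convert(@"C:\path\to\gswin32c.exe", "input.pdf", "output.tif", 300);`, where the path and file names are placeholders. A higher resolution gives sharper text but a larger file and a longer conversion time.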

So the next step was to read the TIFF into the C# application and crop the image.
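The cropping step can be sketched with the classes in `System.Drawing`. This is only an example under assumptions: the class and method names are made up, the rectangle is whatever pixel region the text occupies in your PDFs, and saving the result as PNG is my choice, not something the workflow requires. For a multi-page TIFF, `new Bitmap(path)` works on the first page only.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;

// Hypothetical helper for cutting a fixed rectangle out of the rendered page.
static class TiffCropper
{
    // Restricts the requested region to the image bounds so that
    // Bitmap.Clone is never asked for pixels outside the image.
    public static Rectangle ClampToImage(Rectangle region, Size imageSize)
    {
        return Rectangle.Intersect(region, new Rectangle(Point.Empty, imageSize));
    }

    // Loads the TIFF, copies the region, and saves the copy as PNG (assumed format).
    public static void Crop(string tiffPath, Rectangle region, string outputPath)
    {
        using (var source = new Bitmap(tiffPath))
        {
            Rectangle safeRegion = ClampToImage(region, source.Size);
            if (safeRegion.Width == 0 || safeRegion.Height == 0)
            {
                throw new ArgumentException("The region lies outside the image.", "region");
            }

            // Clone copies only the pixels inside safeRegion.
            using (Bitmap cropped = source.Clone(safeRegion, source.PixelFormat))
            {
                cropped.Save(outputPath, ImageFormat.Png);
            }
        }
    }
}
```

Because all of the PDFs share one layout, the same rectangle can be reused for every file, for example `TiffCropper.Crop("output.tif", new Rectangle(100, 200, 400, 50), "text.png");` with coordinates found once by inspecting a rendered page. The coordinates are in pixels, so they depend on the resolution used when rendering.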

NOTE: There is also a good library for converting PDF to image named ImageMagick, which also has a C# wrapper:
http://imagemagick.codeplex.com/


It also has a lot of other capabilities, such as resizing and cropping.
But I did not use it: after setting up a test project with this library and trying out how it converts a PDF to an image, I was not satisfied with how long the conversion takes, so I decided to go with Ghostscript.

4 comments:

  1. See http://apitron.com/Product/pdf-rasterizer

  2. You can convert PDF to image and vice versa by using the following .NET library.

    http://www.aspose.com/.net/pdf-component.aspx

  3. Thanks for the information. Alternatively, I found another method through Google that I would like to share with you.

    http://www.e-iceblue.com/Tutorials/Spire.PDF/Spire.PDF-Program-Guide/Convert-PDF-Page-to-Image-with-C-code.html