This is a continuation of my previous post about extracting non-standard text from PDF files.
Basically, I didn't succeed in doing that the way I wanted, but I was lucky that all of the PDFs were in the same format.
So I came up with the idea of converting the PDF to an image and then extracting the portion containing the "text" as an image... basically, cropping the image to the rectangle where the text is.
To complete this task I used the Ghostscript library.
There is also a good article on CodeProject about how to implement this in C#.
With these two I managed to convert a PDF to an image. But in which format?
Well, I tested with the application from the CodeProject article and found that I get the clearest text when using the TIFF format, which is not that surprising because TIFF is widely used for printing.
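A minimal sketch of the conversion step, for illustration only and not the code from the CodeProject article: instead of wrapping the Ghostscript DLL, it runs the Ghostscript command-line executable through Process. The executable name (gswin64c.exe on 64-bit Windows, gs elsewhere), the 300 dpi resolution, the tiffg4 device and the names PdfToTiff, BuildArguments and Convert are my assumptions. It needs .NET 5 or later because of ProcessStartInfo.ArgumentList.

```csharp
using System;
using System.Diagnostics;

// Sketch (assumption): render every page of a PDF to TIFF by calling the Ghostscript executable.
public static class PdfToTiff
{
    // Builds the argument list. "tiffg4" is Ghostscript's 1-bit black-and-white TIFF device;
    // "tiffgray" or "tiff24nc" would give grayscale or 24-bit color TIFFs instead.
    public static string[] BuildArguments(string pdfPath, string outputPattern, int dpi, string device = "tiffg4")
    {
        return new[]
        {
            "-q",                             // quiet: suppress startup and informational messages
            "-dNOPAUSE",                      // do not pause between pages
            "-dBATCH",                        // exit after the input file is processed
            "-dSAFER",                        // restrict file system access
            "-sDEVICE=" + device,
            "-r" + dpi,                       // render resolution in dots per inch
            "-sOutputFile=" + outputPattern,  // %d in the pattern is replaced by the page number
            pdfPath
        };
    }

    // gsExecutable is an assumption: "gswin64c.exe" on 64-bit Windows, "gs" on Linux/macOS.
    public static void Convert(string gsExecutable, string pdfPath, string outputPattern, int dpi)
    {
        var startInfo = new ProcessStartInfo(gsExecutable)
        {
            UseShellExecute = false,
            RedirectStandardError = true,
            CreateNoWindow = true
        };
        foreach (string arg in BuildArguments(pdfPath, outputPattern, dpi))
            startInfo.ArgumentList.Add(arg); // ArgumentList handles quoting of paths with spaces

        using Process gs = Process.Start(startInfo)
            ?? throw new InvalidOperationException("Could not start Ghostscript.");
        string errors = gs.StandardError.ReadToEnd(); // read before waiting so a full pipe cannot block
        gs.WaitForExit();
        if (gs.ExitCode != 0)
            throw new InvalidOperationException("Ghostscript failed: " + errors);
    }
}
```

For example, `PdfToTiff.Convert("gswin64c.exe", "input.pdf", "page%d.tif", 300);` should write page1.tif, page2.tif and so on, one file per page. Calling the executable keeps the sketch free of P/Invoke declarations; the DLL route from the article avoids starting a separate process.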
So the next step was to read the TIFF into a C# application and crop the image.
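A minimal sketch of the crop step using System.Drawing, again an illustration rather than my exact code. System.Drawing is Windows-only on .NET 6 and later (on .NET Core it comes from the System.Drawing.Common package). Since all the PDFs share one layout, the rectangle can be a fixed set of pixel coordinates measured once on a converted page; TiffCropper, ClampToBounds and Crop are hypothetical names.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;

// Sketch (assumption): cut a fixed rectangle out of a page image produced by Ghostscript.
public static class TiffCropper
{
    // Limits the requested region to the image so Bitmap.Clone does not throw
    // when the rectangle extends past the page edge.
    public static Rectangle ClampToBounds(Rectangle region, int width, int height)
    {
        return Rectangle.Intersect(region, new Rectangle(0, 0, width, height));
    }

    public static void Crop(string tiffPath, Rectangle region, string outputPath)
    {
        // new Bitmap(path) loads the first frame, which is enough for one-page-per-file output.
        using var page = new Bitmap(tiffPath);
        Rectangle area = ClampToBounds(region, page.Width, page.Height);
        if (area.Width <= 0 || area.Height <= 0)
            throw new ArgumentException("The region lies outside the page image.", nameof(region));

        // Clone copies only the pixels inside 'area', keeping the original pixel format.
        using Bitmap cropped = page.Clone(area, page.PixelFormat);
        cropped.Save(outputPath, ImageFormat.Tiff);
    }
}
```

Usage would look like `TiffCropper.Crop("page1.tif", new Rectangle(120, 340, 900, 60), "text.tif");`, where the coordinates are placeholders. Pixel coordinates depend on the resolution used for the conversion, so the rectangle has to be measured at the same dpi that Ghostscript renders with.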
NOTE: there is also a good library for converting PDF to images named ImageMagick, which also has a C# wrapper.
It also has a lot of other capabilities like resizing, cropping, etc.
But I didn't use it: after setting up a test project with this library and trying out how it converts PDF to images, I wasn't satisfied with how long the conversion took, so I decided to go with Ghostscript.