Parsing
ChemAli, 2012-01-13 13:59:16

PDF Parsing with Block Position Extraction

Is it possible to parse a PDF file (text and images) so as to extract the individual blocks of text and determine the coordinates of each block on the page?

The ultimate goal is to search for text in the file and highlight the matches.

The implementations I have found only go as far as extracting the text as one continuous stream.


2 answers
egorinsk, 2012-01-13

Yes, it is entirely possible. The coordinates are stored in the PDF file itself, and extracting them is not a problem. See the PDF specification for details.
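To illustrate where those coordinates live, here is a rough Python sketch that reads a simplified, uncompressed page content stream and reports the position at which each string is drawn. It is a toy under stated assumptions, not a real parser: the names text_positions, find_text and SAMPLE are made up for this example, and it only understands the BT, Tm, Td and Tj operators. Real files usually have compressed streams, font encodings, a page transformation matrix and TJ arrays, so in practice you would rely on a PDF library rather than code like this.

```python
import re

# Tokens of a simplified content stream: literal strings, names,
# numbers and operators. Nested parentheses are NOT handled (assumption).
TOKEN = re.compile(
    r"\((?:\\.|[^\\()])*\)"         # (literal string)
    r"|/[^\s/()\[\]<>]+"            # /Name
    r"|[-+]?(?:\d+\.?\d*|\.\d+)"    # number
    r"|[A-Za-z]+\*?|['\"]"          # operator
)

def text_positions(content):
    """Return (x, y, text) for every Tj operator in the stream.

    x, y are the start of the current text line in text space. The
    advance caused by glyph widths and the page matrix are ignored.
    """
    results = []
    operands = []
    line = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]  # text line matrix a b c d e f
    for match in TOKEN.finditer(content):
        tok = match.group()
        if tok[0] == "(":
            operands.append(tok[1:-1])      # string without parentheses
        elif tok[0] == "/":
            operands.append(tok)            # name operand, e.g. /F1
        elif tok[0] in "+-.0123456789":
            operands.append(float(tok))
        else:
            if tok == "BT":                 # begin text object
                line = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
            elif tok == "Tm" and len(operands) >= 6:
                line = operands[-6:]        # set the matrix directly
            elif tok == "Td" and len(operands) >= 2:
                tx, ty = operands[-2:]      # offset from the line start
                a, b, c, d, e, f = line
                line = [a, b, c, d, tx * a + ty * c + e, tx * b + ty * d + f]
            elif tok == "Tj" and operands:
                results.append((line[4], line[5], operands[-1]))
            operands = []                   # every operator consumes operands
    return results

def find_text(positions, needle):
    """Return the (x, y) of every drawn string that contains needle."""
    return [(x, y) for x, y, text in positions if needle in text]

# Hypothetical content stream of a one-page document.
SAMPLE = """
BT
/F1 12 Tf
72 720 Td
(Hello world) Tj
0 -14 Td
(second line) Tj
ET
"""

if __name__ == "__main__":
    for x, y, text in text_positions(SAMPLE):
        print(x, y, text)
    print(find_text(text_positions(SAMPLE), "second"))
```

For the search-and-highlight task, the coordinates returned by something like find_text would be the anchor for the highlight rectangle. The width and height of that rectangle still have to come from the font metrics, which this sketch does not read.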

Sergey Protko (@Fesor), 2012-01-13

My fevered brain came up with the idea of rendering the PDF to images, finding the block coordinates, parsing the text, picking out what is needed in the desired block, and then taking that block's coordinates... O_o. Actually, I did once have a project where we had to look for empty areas in a PDF document and fill them with advertising junk. For search, there are many options. The problem needs to be defined more clearly: what is given as input, and what is expected as output.
