PDF
Safronov_Alexei, 2020-12-21 18:29:11

How do I extract specific data from a PDF document?

Hi Habr!
I ran into a problem. I wrote a function that pulls the data I need out of a PDF file and saves it to a database.
I did it by converting PDF -> Word | RTF and then Word | RTF -> TXT. In the resulting text I searched for the words that stand next to the data and took the data from there. For example, if the PDF contained the line "Facility Coca-Cola", I searched for "Facility" and took the word next to it. But at some point I realized that these neighboring words jump to the line above or below, so they cannot be tracked reliably. What are the possible solutions?
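The label-and-neighbor lookup described above can be sketched roughly like this. It is a minimal illustration, not the author's actual function. `value_after_label` is a hypothetical name, and the sketch assumes the PDF has already been converted to plain text. The second call reproduces the failure from the question: once the value lands on a different line, the lookup finds nothing.

```python
import re
from typing import Optional


def value_after_label(text: str, label: str) -> Optional[str]:
    """Return the word that follows `label` on the same text line, or None."""
    # \b makes `label` match as a whole word; [ \t]+ does not cross a line
    # break, so a value that the conversion pushed onto another line is missed.
    match = re.search(rf"\b{re.escape(label)}[ \t]+(\S+)", text)
    return match.group(1) if match else None


print(value_after_label("Invoice 42\nFacility Coca-Cola\n", "Facility"))  # Coca-Cola
print(value_after_label("Facility\nCoca-Cola\n", "Facility"))  # None
```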


1 answer(s)
Alexey Cheremisin, 2020-12-21
@leahch

I hate to disappoint you, but there is simply no general solution! It all depends on how a particular PDF was produced and laid out. Sometimes parts of what looks like a single paragraph are stored in completely different places and blocks inside the file itself.
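In general the point stands: the order of text in the file is not the reading order. When the extraction tool can report word positions (for example, pdfplumber's `Page.extract_words()` returns each word with `x0` and `top` coordinates), one partial workaround is to rebuild visual lines from those coordinates instead of trusting the converted text. The sketch below assumes the word boxes are already available as plain tuples. `group_lines`, `value_right_of`, and the 3-point tolerance are hypothetical choices that would need tuning for a given layout, and the approach still fails when the layout changes from one file to the next.

```python
from typing import List, Optional, Tuple

# Hypothetical word box: (text, x0, top) in PDF points; `top` grows downward.
Word = Tuple[str, float, float]


def group_lines(words: List[Word], tol: float = 3.0) -> List[List[Word]]:
    """Group words into visual lines: a word joins the current line when its
    `top` is within `tol` points of that line's first word."""
    lines: List[List[Word]] = []
    for word in sorted(words, key=lambda w: (w[2], w[1])):
        if lines and abs(word[2] - lines[-1][0][2]) <= tol:
            lines[-1].append(word)
        else:
            lines.append([word])
    # Order the words of each line from left to right.
    return [sorted(line, key=lambda w: w[1]) for line in lines]


def value_right_of(words: List[Word], label: str, tol: float = 3.0) -> Optional[str]:
    """Return the word immediately to the right of `label` on the same visual line."""
    for line in group_lines(words, tol):
        texts = [w[0] for w in line]
        if label in texts:
            i = texts.index(label)
            if i + 1 < len(texts):
                return texts[i + 1]
    return None


# Words in the order they happen to be stored in the file, not reading order.
words = [
    ("Coca-Cola", 120.0, 101.5),
    ("Date", 50.0, 130.0),
    ("Facility", 50.0, 100.0),
    ("2020-12-21", 120.0, 130.2),
]
print(value_right_of(words, "Facility"))  # Coca-Cola
print(value_right_of(words, "Date"))  # 2020-12-21
```

Using the first word of each line as the reference keeps the grouping simple. A line that drifts gradually downward, such as text in a slightly rotated scan, would need a smarter rule.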
