Text recognition
musclecode, 2018-01-08 20:02:01

How can I extract text from a PDF file painlessly?

I need to extract the words and their meanings from a dictionary as separate fields. The dictionary has more than 10,000 words, so doing it by hand would be difficult and tedious. Are there any ways to do this faster or more efficiently?

1 answer
Antonio Solo, 2018-01-08
@musclecode

It all depends on the internal structure of the PDF.
I once converted a PDF into images and then ran them through a text recognizer (OCR), but I suspect this will not work well for a dictionary.
If the internal structure of the PDF is regular, you can write your own decoder; after all, PDF syntax is text-based. Here is an example: https://habrahabr.ru/post/69568/
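A minimal sketch of the "write your own decoder" idea in Python, using only the standard library. It assumes a lot: the file is unencrypted, content streams are either FlateDecode-compressed or uncompressed, text is drawn with literal `(...)` strings in a simple single-byte font encoding (no hex/CID strings or ToUnicode maps), each dictionary entry sits on its own text line, and headword and meaning are separated by a hypothetical `" - "`. All function names are made up for illustration. Dictionaries with embedded fonts and custom encodings will come out as garbage here, which is exactly the "it depends on the internal structure" caveat.

```python
import re
import zlib

# Assumption: unencrypted streams, either FlateDecode or uncompressed.
STREAM_RE = re.compile(rb"\bstream\r?\n(.*?)endstream", re.DOTALL)
# A literal string "(...)" or a text-positioning operator (Td, TD, Tm, T*).
# Assumption: strings contain no unescaped nested parentheses.
TOKEN_RE = re.compile(rb"\((?:\\.|[^\\()])*\)|\bT[dDm*]", re.DOTALL)
ESCAPE_RE = re.compile(rb"\\([()\\])")


def content_streams(pdf_bytes):
    """Yield each stream body, inflating it when it is zlib/Flate data."""
    for match in STREAM_RE.finditer(pdf_bytes):
        data = match.group(1)
        try:
            # zlib.decompress stops at the end of the deflate data,
            # so the end-of-line before "endstream" is ignored.
            yield zlib.decompress(data)
        except zlib.error:
            yield data  # not Flate-compressed: use the raw bytes


def text_lines(stream):
    """Join strings shown by Tj/TJ into lines, breaking at positioning ops."""
    lines, current = [], []
    for token in TOKEN_RE.findall(stream):
        if token.startswith(b"("):
            raw = ESCAPE_RE.sub(rb"\1", token[1:-1])
            # Assumption: simple single-byte font encoding.
            current.append(raw.decode("latin-1"))
        elif current:
            lines.append("".join(current))
            current = []
    if current:
        lines.append("".join(current))
    return lines


def split_entries(lines, sep=" - "):
    """Split "headword<sep>meaning" lines; sep is a hypothetical layout."""
    entries = {}
    for line in lines:
        word, found, meaning = line.partition(sep)
        if found:
            entries[word.strip()] = meaning.strip()
    return entries


def extract_dictionary(pdf_bytes, sep=" - "):
    """Decode every content stream and collect headword/meaning pairs."""
    lines = []
    for stream in content_streams(pdf_bytes):
        lines.extend(text_lines(stream))
    return split_entries(lines, sep)
```

Usage, with a hypothetical file name: `extract_dictionary(open("dictionary.pdf", "rb").read())` returns a `dict` mapping each headword to its meaning. Lines without the separator (page headers, garbage from font or image streams) are simply dropped, so check the real layout of your PDF and adjust `sep` or `split_entries` accordingly.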
