Parsing
krakaka, 2020-09-28 22:43:44

I want to parse a large number of books from before the 17th century in search of information about disappeared people. How should I approach this problem?

In the case of PDFs, I can parse with regular expressions, but the "books" will of course more often be scans, and often unformatted manuscripts, in different languages and, moreover, in outdated forms of those languages. Computer vision will likely be needed. Which tool would you choose for such a task?


2 answer(s)
datka, 2020-09-29
@krakaka

Take a look at Tesseract: https://github.com/tesseract-ocr/tesseract
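
For a rough idea of what batch OCR with Tesseract could look like, here is a minimal sketch that calls the `tesseract` command-line tool from Python. It assumes the `tesseract` binary is on the PATH, that the scans are already split into one PNG image per page, and that the `lat` (Latin) language data is installed; the function names and the choice of language are illustrative, not part of the original answer. Tesseract is trained mainly on printed text, so results on handwritten manuscripts are likely to be poor without extra work.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, langs=("lat",), psm=3):
    """Build the argument list for one Tesseract CLI call (hypothetical helper).

    Using "stdout" as the output base makes Tesseract print the recognized
    text instead of writing it to a file.
    """
    return [
        "tesseract",
        str(image_path),
        "stdout",
        "-l", "+".join(langs),   # several languages are joined with "+"
        "--psm", str(psm),       # page segmentation mode; 3 = fully automatic
    ]


def ocr_image(image_path, langs=("lat",), psm=3):
    """Run Tesseract on one page image and return the recognized text."""
    result = subprocess.run(
        build_tesseract_command(image_path, langs, psm),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Tesseract fails
    )
    return result.stdout


def ocr_directory(scan_dir, langs=("lat",)):
    """Yield (file name, text) for every PNG page in a directory."""
    for page in sorted(Path(scan_dir).glob("*.png")):
        yield page.name, ocr_image(page, langs)
```

The extracted text can then be searched with the same regular expressions used for the PDFs, for example by looping over `ocr_directory("scans")` (a placeholder directory name).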

Developer, 2020-09-28
@samodum

This task cannot be solved automatically; it can only be done manually.
