How do I extract data from an image of a document (passport, driver's license, etc.)?

Python

IceJOKER, 2019-03-01 12:42:13

Hello. The question is in the title: I have images of documents and need to extract data from them (name, date of birth, and so on).
I see two options:
1. Plain image-to-text conversion (OCR), then regular expressions to pull out the fields I need. Unfortunately, https://github.com/tesseract-ocr/tesseract does not cope well with Russian characters. Is there a library that handles Russian text well?
2. A more complex approach: feature extraction (cropping out the part of the image where the document is located), then training a neural network to find the relevant areas and read the text from them. I have not done this before, so my knowledge is only superficial.
I would welcome any suggestions: libraries, articles, projects.
P.S. I am only considering free options.
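
For reference, option 1 could be sketched roughly like this. It is only an illustration under several assumptions: the `pytesseract` and `Pillow` packages are installed, Tesseract has its `rus` language data, and the regular expressions (`NAME_RE`, `DATE_RE`) and function names are hypothetical placeholders that would need tuning for a real document layout.

```python
import re
from datetime import date

# Hypothetical patterns: three capitalised Cyrillic words on one line for
# the full name, and a DD.MM.YYYY date. Real layouts will need adjusting.
WORD = r"[А-ЯЁ][А-ЯЁа-яё-]+"
NAME_RE = re.compile(rf"({WORD})[ \t]+({WORD})[ \t]+({WORD})")
DATE_RE = re.compile(r"(\d{2})\.(\d{2})\.(\d{4})")


def ocr_image(path):
    """Run OCR on an image file and return the recognised text."""
    # Third-party imports are kept local so that the parsing code below
    # can be used and tested without Tesseract installed.
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(path), lang="rus")


def extract_fields(text):
    """Pull a full name and a birth date out of OCR text, if present."""
    fields = {"full_name": None, "birth_date": None}

    name = NAME_RE.search(text)
    if name:
        fields["full_name"] = " ".join(name.groups())

    found = DATE_RE.search(text)
    if found:
        day, month, year = map(int, found.groups())
        try:
            fields["birth_date"] = date(year, month, day)
        except ValueError:
            # OCR noise can produce impossible dates such as 31.02.1974.
            pass

    return fields
```

Usage would be something like `extract_fields(ocr_image("passport.jpg"))`, where the file name is just an example. The first date found is taken as the birth date, which is a simplification.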

2 answers

longclaps, 2019-03-01
@longclaps

PassportReader

Vladislav Lyskov, 2019-03-01
@Vlatqa

https://pypi.org/project/PassportEye/
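
A minimal sketch of how PassportEye might be used, assuming the package is installed along with Tesseract. It reads the machine-readable zone (MRZ), which contains only Latin letters, digits and `<`, so the Russian-text problem does not apply to that part of the document. The helper names `read_mrz_fields` and `pick_fields` are hypothetical, and the dictionary keys are an assumption about what `to_dict()` returns; check them against the actual output.

```python
def pick_fields(mrz_dict):
    """Keep only the fields of interest from an MRZ dictionary."""
    # Key names are assumed; missing keys simply come back as None.
    wanted = ("surname", "names", "date_of_birth", "number")
    return {key: mrz_dict.get(key) for key in wanted}


def read_mrz_fields(image_path):
    """Read the MRZ from a document image, or return None if not found."""
    # Third-party import kept local: pip install PassportEye
    from passporteye import read_mrz

    mrz = read_mrz(image_path)
    if mrz is None:
        # No MRZ detected in the image.
        return None
    return pick_fields(mrz.to_dict())
```

Calling `read_mrz_fields("passport.jpg")` (an example file name) would then give the name, birth date and document number as strings taken from the MRZ.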