Python
IceJOKER, 2019-03-01 12:42:13

How do I extract data from an image of a document (passport, driver's license, etc.)?

Hello. The question is in the title: I have images of documents and need to extract data from them (name, date of birth, and so on).
I have found two options:
1. Plain image-to-text conversion, then regular expressions to pull out the necessary data. Unfortunately, https://github.com/tesseract-ocr/tesseract does not cope well with Russian characters. Which library handles Russian text well?
2. A more complex approach: locate the part of the image that contains the document, then train a neural network to find the relevant areas and extract the text from them. I have not worked with this before, so my knowledge is only superficial.
I would be glad to get any suggestions: libraries, articles, projects.
P.S. I am only considering free options.
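
For reference, option 1 could look roughly like the sketch below. It assumes the OCR step is done with pytesseract (a third-party wrapper around Tesseract) with the Russian language data installed. The field patterns, the function names, and the rule "the first date found is the date of birth" are illustrative assumptions, not something that fits every document layout.

```python
import re

# Hypothetical patterns; real documents need patterns tuned to their layout.
DATE_RE = re.compile(r"\b(\d{2})[.\-/](\d{2})[.\-/](\d{4})\b")
NAME_RE = re.compile(
    r"(?:Фамилия|Surname)\s*[:\-]?\s*([A-ZА-ЯЁ][A-Za-zА-Яа-яЁё\-]+)"
)


def ocr_image(path, lang="rus"):
    """Return the text Tesseract recognizes in the image at `path`.

    Requires the third-party packages pytesseract and Pillow, plus a
    Tesseract install that includes the traineddata for `lang`.
    """
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(path), lang=lang)


def extract_fields(text):
    """Pull a surname and a date out of OCR text with regular expressions."""
    fields = {}
    name_match = NAME_RE.search(text)
    if name_match:
        fields["surname"] = name_match.group(1)
    date_match = DATE_RE.search(text)
    if date_match:
        # Assumption: the first DD.MM.YYYY date in the text is the birth date.
        day, month, year = date_match.groups()
        fields["date_of_birth"] = f"{year}-{month}-{day}"
    return fields
```

Usage would be something like `extract_fields(ocr_image("scan.jpg"))`, where `scan.jpg` is a placeholder file name. Keeping the OCR call separate from the parsing makes it easy to swap in a different OCR engine if Tesseract's Russian output is not good enough.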

2 answers
longclaps, 2019-03-01
@longclaps

PassportReader

Vladislav Lyskov, 2019-03-01
@Vlatqa

https://pypi.org/project/PassportEye/
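
A minimal sketch of how the linked package might be used, assuming PassportEye is installed along with Tesseract. It reads the machine-readable zone (MRZ) of the document, so it only helps for documents that have one. `read_mrz` and `to_dict` come from PassportEye. The key names in `WANTED` are my assumption about what the returned dictionary contains, and `read_mrz_fields` and `pick_fields` are hypothetical helper names.

```python
# Assumed key names in the dictionary PassportEye returns; adjust as needed.
WANTED = ("surname", "names", "date_of_birth", "number")


def pick_fields(mrz_dict):
    """Keep only the fields of interest that are actually present."""
    return {key: mrz_dict[key] for key in WANTED if key in mrz_dict}


def read_mrz_fields(path):
    """Return selected MRZ fields from an image, or None if no MRZ is found."""
    # Third-party import kept local so pick_fields works without the package.
    from passporteye import read_mrz

    mrz = read_mrz(path)
    if mrz is None:
        return None
    return pick_fields(mrz.to_dict())
```

For example, `read_mrz_fields("passport.jpg")` (placeholder file name) would return a small dictionary, or `None` when no MRZ is detected in the image.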
