How do I extract data from an image of a document (passport, driver's license, etc.)?
Hello, the question is in the title: I have images of documents and need to extract data from them (name, date of birth, and so on).
I see two options:
1. Plain image-to-text conversion (OCR), then regular expressions to pull out the necessary fields. Unfortunately, https://github.com/tesseract-ocr/tesseract does not cope well with Russian characters. Which library does a good job with Russian text?
2. A more complex approach: feature extraction (cut out the part of the image where the document is located), then train a neural network to find the right areas and pull the text out of them. I have not worked with this before, so my knowledge is only superficial.
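To make option 1 concrete, here is a minimal sketch in Python. It assumes the `pytesseract` wrapper and Pillow are installed and that Tesseract has the Russian `rus` language data available. The field patterns, the function names `ocr_image` and `extract_fields`, and the sample text are all hypothetical and only illustrate the OCR-then-regex idea. Real passports and licenses will need their own patterns and probably some image preprocessing first.

```python
import re

# Hypothetical patterns: a DD.MM.YYYY date and three consecutive
# upper-case Cyrillic words (surname, first name, patronymic).
DATE_RE = re.compile(r"\b\d{2}\.\d{2}\.\d{4}\b")
NAME_RE = re.compile(r"[А-ЯЁ]{2,}(?:\s+[А-ЯЁ]{2,}){2}")


def ocr_image(path: str, lang: str = "rus") -> str:
    """Run Tesseract on an image file and return the recognized text.

    Imports are local so the parsing code below can be used (and tested)
    without Tesseract installed.
    """
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(path), lang=lang)


def extract_fields(text: str) -> dict:
    """Pull a full name and a date out of OCR text; None if not found."""
    name = NAME_RE.search(text)
    date = DATE_RE.search(text)
    return {
        # Collapse line breaks and repeated spaces inside the matched name.
        "full_name": " ".join(name.group(0).split()) if name else None,
        "birth_date": date.group(0) if date else None,
    }


if __name__ == "__main__":
    # Made-up sample standing in for OCR output.
    sample = "ИВАНОВ ИВАН ИВАНОВИЧ\nДата рождения 01.02.1990"
    print(extract_fields(sample))
```

With a real image, the call would look like `extract_fields(ocr_image("passport.jpg"))`, where the file name is only a placeholder. Keeping the OCR step separate from the parsing step means the regular expressions can be tuned on saved text without rerunning recognition each time.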
I would be glad of any suggestions: libraries, articles, projects.
P.S. I am only considering free options.