Python
Ivan Melnikov, 2018-06-28 13:59:03

How do I add characters from another language to the recognition alphabet in pytesseract?

I installed the Tesseract-OCR engine and then the pytesseract wrapper package for Python 3.6.
I recognize Russian text like this:

text = pytesseract.image_to_string(Image.open(filename), lang='rus')

Russian text is recognized without problems. However, besides Russian letters, the text contains two letters from the English alphabet: N and E. How can I tell the script that the text may contain these two English letters in addition to the Russian ones? Or is it possible to define a custom character set?
One more question: how do I specify the font for the engine?


1 answer(s)
Dimonchik, 2018-06-29
@dimonchik2013

For the simple case:
lang="rus+eng"
Just don't be surprised by the results.
For the complex case, train your own model.
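
A minimal sketch of the simple case, combined with a character whitelist so that the English model only contributes N and E. The helper names (build_lang, build_whitelist_config, recognize) are made up for illustration. The sketch assumes the rus and eng language data are installed and that your Tesseract version honours tessedit_char_whitelist; the LSTM engine in Tesseract 4.0 reportedly ignores it. Digits and punctuation are not in this whitelist, so add them to extra_chars if the text needs them.

```python
# All 66 Russian letters: А..я is one contiguous Unicode range,
# while Ё and ё sit outside it and are appended separately.
RUSSIAN_LETTERS = "".join(chr(c) for c in range(0x0410, 0x0450)) + "Ёё"


def build_lang(*codes: str) -> str:
    """Join Tesseract language codes, e.g. ('rus', 'eng') -> 'rus+eng'."""
    return "+".join(codes)


def build_whitelist_config(extra_chars: str) -> str:
    """Build a config string that limits output to Russian letters plus extra_chars."""
    return "-c tessedit_char_whitelist=" + RUSSIAN_LETTERS + extra_chars


def recognize(filename: str, extra_chars: str = "NE") -> str:
    """Hypothetical wrapper: OCR an image as Russian text plus a few Latin letters."""
    # Imported here so the helpers above work without the OCR stack installed.
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(
        Image.open(filename),
        lang=build_lang("rus", "eng"),
        config=build_whitelist_config(extra_chars),
    )
```

Usage would be text = recognize(filename). If the whitelist has no effect on your setup, dropping the config argument leaves the plain lang="rus+eng" call from the answer.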
