How do I add characters from another language to the recognition alphabet in pytesseract?


Python

Ivan Melnikov, 2018-06-28 13:59:03

I installed the Tesseract-OCR engine, then installed the pytesseract wrapper package for Python 3.6.
I recognize Russian text like this:

text = pytesseract.image_to_string(Image.open(filename), lang='rus')

Russian text is recognized without problems. However, besides Russian letters, the text contains two letters from the English alphabet: N and E. How can I tell the script that the text may also contain these two specific English letters? Or is it possible to define my own character set?
One more question: how do I specify the font for the engine?


1 answer(s)


Dimonchik, 2018-06-29
@dimonchik2013

For the simple case, use
lang="rus+eng"
just don't be shocked by the results))
For the complex case, train your own model.
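
A rough sketch of the simple case, with one extra step that is my assumption and not part of the answer above: on top of lang="rus+eng", the output is restricted to Russian letters plus N and E through Tesseract's tessedit_char_whitelist config variable. Whether the whitelist is honored depends on the Tesseract version and engine mode (as far as I know, the LSTM engine in 4.0 ignored it and 4.1 restored it), so treat this as something to try, not a guaranteed fix. The helper names build_whitelist and recognize are made up for illustration.

```python
import string

# Russian alphabet in both cases: the contiguous block А..я (U+0410..U+044F)
# plus Ё/ё, which sit outside that block.
RUSSIAN_LETTERS = "".join(chr(c) for c in range(ord("А"), ord("я") + 1)) + "Ёё"


def build_whitelist(extra_chars="NE", digits=True):
    """Hypothetical helper: characters Tesseract is allowed to output."""
    chars = RUSSIAN_LETTERS + extra_chars
    if digits:
        chars += string.digits
    return chars


def recognize(filename, extra_chars="NE"):
    """Hypothetical wrapper around pytesseract.image_to_string."""
    # Imported here so build_whitelist works without the OCR stack installed.
    from PIL import Image
    import pytesseract

    # Assumption: the installed Tesseract honors tessedit_char_whitelist.
    config = "-c tessedit_char_whitelist=" + build_whitelist(extra_chars)
    return pytesseract.image_to_string(
        Image.open(filename), lang="rus+eng", config=config
    )
```

Usage would be something like text = recognize("scan.png"), where the file name is a placeholder. Note that this whitelist contains no punctuation, so add whatever punctuation marks you need to extra_chars. If the results are still poor, that is the point where training your own model comes in.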