Python
mostalk, 2020-07-25 13:35:09

How to read text written over ruled lines with OpenCV?

Hello, I have a document with ruled lines and text written above them. I think I need to find each line, build a rectangle above it based on the height of the text, and send that rectangle to Tesseract, but I don't know how to do this correctly.
Or maybe there is a simpler approach?


1 answer(s)
Nuchimik, 2020-07-25
@mostalk

By document, do you mean Word, PDF, some other format, or a regular image? Or do the documents contain images with text?
If your task is to recognize text in specific places (i.e. you do not need the raw text that Tesseract recognizes, and the location of the text matters), you can use the following algorithm:

  1. Extract the image with the lines and text from your document (this step is optional, since your question does not say what format the input is in).
  2. Apply median filtering. This filter suits your task well and is easy to understand: it copes well with small noise and does not blur edges, which is very important here. It is covered in most reviews of simple image filters, along with the mathematics behind it. In OpenCV it is available as cv2.medianBlur.
  3. Next, find the lines. I assume they are horizontal, but it is not a problem if they are not. You can use the Hough transform, but run an edge detector first. The most common one is the Canny edge detector (there is an article about it on Habr). In OpenCV these are cv2.Canny and cv2.HoughLines or cv2.HoughLinesP.
  4. From the detected lines you can tell whether the image needs to be rotated, i.e. whether the lines are not strictly horizontal. This step is needed to get more accurate results from Tesseract.
  5. After that, sort the lines in the order you need and extract the text by coordinates.

P.S. This algorithm applies to the task as you described it in the question.
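For step 5, here is a minimal sketch of turning detected line segments into rectangles for OCR. It assumes the segments are (x1, y1, x2, y2) tuples such as a Hough transform would give on an already straightened image, and that the text sits directly above each line. The names merge_rows and text_boxes, the tolerance tol, and the text_height parameter are all hypothetical; text_height in particular has to be estimated from your own documents.

```python
def merge_rows(segments, tol=6):
    """Collapse near-duplicate segments into one (x_left, x_right, y) per line.

    Segments whose mean y differs from the previous one by at most `tol`
    pixels are treated as the same ruled line. Rows come back top to bottom.
    """
    groups = []
    for seg in sorted(segments, key=lambda s: (s[1] + s[3]) / 2):
        y = (seg[1] + seg[3]) / 2
        if groups and y - groups[-1][-1][0] <= tol:
            groups[-1].append((y, seg))
        else:
            groups.append([(y, seg)])
    rows = []
    for group in groups:
        xs = [x for _, (x1, _, x2, _) in group for x in (x1, x2)]
        top = min(y for y, _ in group)  # upper edge of the ruled line
        rows.append((min(xs), max(xs), int(top)))
    return rows


def text_boxes(rows, text_height):
    """Return (left, top, right, bottom) boxes for the text above each line.

    Each box is at most `text_height` pixels tall and never reaches above
    the previous ruled line.
    """
    boxes = []
    prev_y = 0
    for x_left, x_right, y in rows:
        top = max(prev_y, y - text_height)
        if y > top and x_right > x_left:
            boxes.append((x_left, top, x_right, y))
        prev_y = y
    return boxes
```

Each box can then be cut out of the image as image[top:bottom, left:right] and sent to Tesseract. If you use the pytesseract wrapper (an assumption, the question does not say how Tesseract is called), pytesseract.image_to_string(crop, config="--psm 7") treats the crop as a single line of text, which usually fits this kind of rectangle.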
