Image processing
Andrew, 2012-10-02 09:10:15

How to denoise an image with text to use it in Tesseract OCR?

Here is the crux of the matter. I have a binary image of a car license plate, which I pass to the Tesseract API for OCR. When there is no noise, everything works fine and the text is recognized. In practice, however, the image always contains spots. I have already applied Gaussian blur and morphological operations, which filter out small details, but a few spots remain and cause characters to be misrecognized. Is there a way to configure Tesseract to handle this? I am using all default settings. In theory, I need to find the connected regions on the plate and discard those whose area is obviously too small, but I don't know how to do that quickly with OpenCV. Thank you.

Update

In case anyone is interested, the problem can be solved like this.
Apply the Canny operator to the license plate image and look for outer contours only. Then filter the resulting contours by size, cut them out, and recognize each character separately.
There is almost no drop in performance.

2 answer(s)
mfox, 2012-11-13
@mfox

I'm trying to do something similar: I need to recognize the numbers on receipts. Recognition is unreliable because small details at the edges interfere. But since I'm far from an expert in image processing, I don't understand the phrase "filter the resulting contours by size, cut them out". Is this done with opencv? Can you tell me where I can see how it's done?

Andrew, 2012-11-13
@xaoc80

"Is this done with opencv? Can you tell me where I can see how it's done?"
Yes, this can be done using OpenCV.
I wrote up an article on IBM developerWorks with code examples. If you are interested, send me a private message and I'll share the link.
