Java
vyn, 2017-04-14 13:42:41

Print form recognition. Optimal algorithm?

Good afternoon! The task is to recognize a scanned printed form and convert it to an HTML form. I'm currently using tess4j, the Java wrapper for the Tesseract library. However, I ran into the problem of segmenting the image into subregions (div regions) in order to get the maximum recognition quality. Are there other solutions, very preferably free ones?


1 answer(s)
Max Akhmadinurov @movemind, 2017-07-26

In our experience, the OCR quality of Tesseract is terrible :) It really is.
Try at least the Google Cloud Vision API: up to 1000 pages are free, and after that it is only $1.50 per 1000.
But the best of all, of course, is ABBYY; it has the best OCR.
You need to search for the topic "form processing": that is exactly about finding the regions in a document, not just recognizing the text.
Try searching like this:
- ocr form processing open source
- form processing java
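
To give an idea of what "finding regions" can mean in the simplest case, here is a minimal sketch in plain Java, not a complete form processing solution. It splits a scan into horizontal bands using a projection profile: it counts dark pixels in each row and groups consecutive rows that contain ink. The class name `RowBandSegmenter`, the method `findRowBands`, and both threshold parameters are made up for this illustration, and it assumes the scan is already deskewed and has dark text on a light background.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of tess4j or any OCR library.
public class RowBandSegmenter {

    /**
     * Finds horizontal bands of "ink" in an image.
     * Returns {startY, endYExclusive} pairs, top to bottom.
     * darkThreshold: luminance (0..255) below which a pixel counts as ink (assumed value, tune per scan).
     * minInkPixels: minimum dark pixels in a row for the row to count as non-empty.
     */
    public static List<int[]> findRowBands(BufferedImage img, int darkThreshold, int minInkPixels) {
        List<int[]> bands = new ArrayList<>();
        int start = -1; // start row of the band currently being built, or -1 if none

        for (int y = 0; y < img.getHeight(); y++) {
            int ink = 0;
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of perceived luminance.
                int luminance = (r * 299 + g * 587 + b * 114) / 1000;
                if (luminance < darkThreshold) {
                    ink++;
                }
            }

            boolean hasInk = ink >= minInkPixels;
            if (hasInk && start < 0) {
                start = y;                        // a new band begins
            } else if (!hasInk && start >= 0) {
                bands.add(new int[] {start, y});  // the band ended on the previous row
                start = -1;
            }
        }

        // Close a band that runs to the bottom edge of the image.
        if (start >= 0) {
            bands.add(new int[] {start, img.getHeight()});
        }
        return bands;
    }
}
```

Each band can then be cropped with `BufferedImage.getSubimage` and passed to whichever OCR engine you use. Running the same idea along columns inside each band gives rough cell boundaries. Real forms with ruled lines, skew, or noise need more than this, which is where dedicated form processing tools come in.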
