PHP
Dazz1e, 2015-04-07 16:38:49

Recognizing a scanned copy of a document on a website: where to begin?

Hello,
The task is to create an automated filter for documents entering the online archive.
The document forms are known in advance.
How can I inspect an uploaded image (a scanned copy of a document) so that, if nothing is entered in the required fields (not even a scribble), the document is rejected, and if something is there, the document goes into the archive and a text file (log) is created listing which fields, both required and optional, contain information?
The text does not need to be read; it is enough to confirm that something is there.
Blank:
61e544079c644c2091f16667481be528.jpg
Sample:
d35b2837997d4c6592e6bf2cc2d0382e.jpg
Please explain where to start to achieve the goal and in what direction to move!
Thank you in advance!


4 answer(s)
Sergey, 2015-04-07
@begemot_sun

By integrating with ABBYY services.

xmoonlight, 2015-04-07
@xmoonlight

1. Align the filled-in form with the blank one (cleanup, contrast, size, rotation).
2. Subtract the blank form from the filled-in one.
3. Intersect the "island" mask of the areas where entries should appear with the result of step 2 to identify which fields are filled and which are not.
4. Profit!
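
Steps 2 and 3 above could be sketched roughly as follows. This is only an illustration under several assumptions: both scans are already decoded into grayscale pixel grids (lists of rows, 0 = black, 255 = white) and already aligned as in step 1, and the field rectangles, the function names (`ink_ratio`, `check_fields`, `accept_document`) and the thresholds are all hypothetical values chosen for the example, not something given in the thread.

```python
def ink_ratio(blank, filled, box, diff_threshold=60):
    """Fraction of pixels in `box` that are noticeably darker in `filled` than in `blank`.

    `box` is (top, left, bottom, right); bottom and right are exclusive.
    """
    top, left, bottom, right = box
    total = (bottom - top) * (right - left)
    changed = 0
    for y in range(top, bottom):
        for x in range(left, right):
            # Subtraction step: handwriting makes the filled scan darker than the blank.
            if blank[y][x] - filled[y][x] > diff_threshold:
                changed += 1
    return changed / total if total else 0.0


def check_fields(blank, filled, fields, min_ink_ratio=0.02):
    """Intersection step: look at the difference only inside each known field area."""
    return {
        name: ink_ratio(blank, filled, box) >= min_ink_ratio
        for name, box in fields.items()
    }


def accept_document(status, required):
    """The document passes only if every required field contains something."""
    return all(status[name] for name in required)


# Tiny synthetic example: a 10x10 white "blank" and a copy with a stroke in one field.
blank = [[255] * 10 for _ in range(10)]
filled = [row[:] for row in blank]
for x in range(1, 9):
    filled[2][x] = 0  # a dark stroke inside the "name" field

# Hypothetical field layout for the known form.
fields = {"name": (0, 0, 5, 10), "comment": (5, 0, 10, 10)}

status = check_fields(blank, filled, fields)
log_lines = [f"{name}: {'filled' if ok else 'empty'}" for name, ok in status.items()]
print("\n".join(log_lines))
print("accepted" if accept_document(status, ["name"]) else "rejected")
```

On real scans the thresholds would need tuning against noise and small alignment errors, and the pixel grids would come from whatever image library the site already uses.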

olamedia., 2015-04-07
@w999d

OCR

Vladimir B., 2015-04-07
@ange007

The easiest way, I think, is:

  1. Find the top of the document using its first line
  2. Align the document by that first line
  3. Check for "text of a different color" (blue, red, green) at specific coordinates
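
Step 3 could look something like the sketch below. It assumes the scan is already aligned and decoded into rows of `(r, g, b)` tuples, and that entries are written in a colored pen while the printed form is black on white; entries in black ink would not be detected this way. The names `is_colored` and `has_colored_ink`, the box coordinates, and the thresholds are hypothetical.

```python
def is_colored(pixel, min_spread=60):
    """True if the channels differ enough that the pixel is not gray, black, or white."""
    r, g, b = pixel
    return max(r, g, b) - min(r, g, b) >= min_spread


def has_colored_ink(image, box, min_pixels=5, min_spread=60):
    """Check whether `box` (top, left, bottom, right; exclusive ends) holds colored pixels."""
    top, left, bottom, right = box
    count = 0
    for y in range(top, bottom):
        for x in range(left, right):
            if is_colored(image[y][x], min_spread):
                count += 1
                if count >= min_pixels:
                    return True  # enough colored pixels, stop early
    return False


# Synthetic 6x6 scan: white page, a black printed line on top, a blue stroke lower down.
WHITE, BLACK, BLUE = (255, 255, 255), (0, 0, 0), (20, 40, 200)
scan = [[WHITE] * 6 for _ in range(6)]
scan[0] = [BLACK] * 6
scan[3] = [BLUE] * 6

printed_area = (0, 0, 2, 6)  # hypothetical coordinates of the printed header
entry_area = (2, 0, 6, 6)    # hypothetical coordinates of a field to check

print(has_colored_ink(scan, printed_area))
print(has_colored_ink(scan, entry_area))
```

Requiring a minimum number of colored pixels, rather than a single one, is meant to keep isolated scanner noise from counting as an entry.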
