Recognizing a scanned copy of a document on a website: where to begin?

Dazz1e, 2015-04-07 16:38:49

PHP

Hello,
The task is to build an automated filter for documents entering an online archive.
The document forms are known in advance.
When an image (a scanned copy of a document) is uploaded, how can I check it so that a document whose required fields are empty (not even a scribble) is rejected, while a document with something in those fields goes into the archive, along with a text file (log) listing which fields, required and optional, contain information?
There is no need to read the text itself; it is enough to make sure that something is there.
Blank:

Sample:

Please explain where to start and in which direction to move to achieve this!
Thank you in advance!

4 answers

Sergey, 2015-04-07
@begemot_sun

By integrating with ABBYY services.

xmoonlight, 2015-04-07
@xmoonlight

1. Align the filled form with the blank one (cleanup, contrast, size, rotation).
2. Subtract the blank form from the filled one.
3. Intersect a mask of "island" regions (the areas where entries are expected) with the result of step 2 to identify which fields are filled and which are not.
4. Profit!
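
Steps 2 and 3 can be sketched roughly as follows. This is only an illustration under several assumptions: the scans are already aligned (step 1), images are plain 2D lists of grayscale values rather than real image files, and the field names, boxes, and thresholds (`threshold`, `min_ratio`) are hypothetical values that would need tuning on real scans. The `format_log` helper is likewise a made-up name for the kind of text log the question describes.

```python
# Sketch only. Images are plain 2D lists of grayscale values
# (0 = black, 255 = white) and are assumed to be aligned already (step 1).

def subtract_blank(filled, blank, threshold=60):
    """Step 2: mark pixels that are clearly darker in the filled scan than in the blank."""
    return [
        [1 if blank_px - filled_px > threshold else 0
         for filled_px, blank_px in zip(filled_row, blank_row)]
        for filled_row, blank_row in zip(filled, blank)
    ]

def fill_ratio(diff, box):
    """Step 3: share of marked pixels inside box = (top, left, bottom, right), end-exclusive."""
    top, left, bottom, right = box
    area = (bottom - top) * (right - left)
    marked = sum(sum(row[left:right]) for row in diff[top:bottom])
    return marked / area

def check_form(filled, blank, fields, min_ratio=0.02):
    """fields maps a field name to (box, required). Returns (accepted, status per field)."""
    diff = subtract_blank(filled, blank)
    status = {name: fill_ratio(diff, box) >= min_ratio
              for name, (box, _required) in fields.items()}
    accepted = all(status[name]
                   for name, (_box, required) in fields.items() if required)
    return accepted, status

def format_log(status):
    """One line per field, for a text log of filled and empty fields."""
    return "\n".join(f"{name}: {'filled' if ok else 'empty'}"
                     for name, ok in status.items())

# Tiny synthetic example: a 10x10 white "form" with two hypothetical fields.
blank = [[255] * 10 for _ in range(10)]
filled = [row[:] for row in blank]
for y, x in [(1, 1), (1, 2), (2, 2)]:
    filled[y][x] = 0  # a few dark "pen" pixels in the first field

fields = {
    "name": ((0, 0, 5, 5), True),     # required
    "note": ((5, 5, 10, 10), False),  # optional
}
accepted, status = check_form(filled, blank, fields)
print(accepted)
print(format_log(status))
```

On real scans the subtraction is only as good as the alignment, so the threshold and the minimum fill ratio would have to absorb scanner noise and small registration errors.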

olamedia., 2015-04-07
@w999d

OCR

Vladimir B., 2015-04-07
@ange007

The easiest way, I think, is:

Find the top of the document by its first line.
Align the document to that line.
Check for "writing in a different color" (blue, red, green) at the known field coordinates.
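
The color check in the last step might look something like the sketch below. It assumes the page is already aligned, that pixels are `(r, g, b)` tuples in a plain 2D list, and that the form itself is printed in black or gray, so any clearly chromatic pixel inside a field box counts as pen ink. The names `is_colored_ink` and `field_has_colored_ink` and the thresholds `min_spread` and `min_pixels` are hypothetical and would need tuning.

```python
# Sketch only: pixels are (r, g, b) tuples in a 2D list; the page is assumed aligned.

def is_colored_ink(pixel, min_spread=50):
    """True if the pixel is clearly chromatic, i.e. not black, gray, or white."""
    r, g, b = pixel
    return max(r, g, b) - min(r, g, b) >= min_spread

def field_has_colored_ink(image, box, min_pixels=20):
    """Count chromatic pixels inside box = (top, left, bottom, right), end-exclusive."""
    top, left, bottom, right = box
    count = sum(
        1
        for row in image[top:bottom]
        for pixel in row[left:right]
        if is_colored_ink(pixel)
    )
    return count >= min_pixels

# Tiny synthetic example: a white page, black printed text, and a few blue pen pixels.
WHITE, BLACK, BLUE = (255, 255, 255), (10, 10, 10), (20, 40, 180)
page = [[WHITE] * 10 for _ in range(10)]
page[6][6] = page[6][7] = BLACK             # printed label in the second field
page[1][1] = page[1][2] = page[2][2] = BLUE  # handwriting in the first field

print(field_has_colored_ink(page, (0, 0, 5, 5), min_pixels=3))
print(field_has_colored_ink(page, (5, 5, 10, 10), min_pixels=3))
```

This approach would miss entries written in black ink or scans made in grayscale, so it only fits if the forms are known to be filled in with colored pens and scanned in color.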