How does text recognition work?
I have documents (invoices) with a header, which does not need to be recognized, followed by the data itself: product, quantity and unit of measure.
"Item1 10 boxes
Item2 15 pieces
Item3 200 balloons".
I want to recognize them, given that I have a list of these products (i.e. I can compare what was recognized against the database).
How does this work in general? I have not dealt with it before. Do I need some tool to select the lines that contain a product, quantity and unit of measure (they are always on one line), then recognize each line with Tesseract, and then, using my product database, pull out the product name, take what follows it as the quantity, and take what follows the quantity as the unit of measure?
And what tool can be used to locate the lines in an image? They can be in different places in the document.
This is my first time working with text recognition. I tried Tesseract, and it does recognize something. I can generate training data for recognition models myself by rendering images in PHP in different fonts, with errors, etc. (in case some kind of neural network is needed for this). Where should I start?
Where to start?
Read about document layout analysis. For Tesseract's support for it, see: tesseract-ocr.github.io/docs/das_tutorial2016/5Lay...
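To give a feel for what the simplest form of layout analysis does, here is a minimal sketch that groups word bounding boxes into text lines by their vertical position. It assumes the OCR engine has already returned a box for every word; the `WordBox` type, the `group_into_lines` function and the `tolerance` parameter are hypothetical names made up for this illustration, not part of Tesseract or any library.

```python
from dataclasses import dataclass


@dataclass
class WordBox:
    """One recognized word and its bounding box (hypothetical structure)."""
    text: str
    x: int  # left edge, pixels
    y: int  # top edge, pixels
    w: int  # width, pixels
    h: int  # height, pixels


def group_into_lines(words, tolerance=0.5):
    """Group words whose vertical centres are close together into lines.

    `tolerance` is a fraction of the word height; it is an assumed
    heuristic and will need tuning for real scans.
    """
    lines = []  # each entry is a list of WordBox objects on one line
    # Walk the words from top to bottom by vertical centre.
    for word in sorted(words, key=lambda b: b.y + b.h / 2):
        centre = word.y + word.h / 2
        if lines:
            last = lines[-1]
            last_centre = sum(b.y + b.h / 2 for b in last) / len(last)
            if abs(centre - last_centre) <= tolerance * word.h:
                last.append(word)  # same line as the previous words
                continue
        lines.append([word])  # start a new line
    # Inside each line, order the words left to right and join them.
    return [
        " ".join(b.text for b in sorted(line, key=lambda b: b.x))
        for line in lines
    ]
```

This only works for reasonably straight scans. Skewed or multi-column pages need real layout analysis, which is what the tutorial above covers.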
Python library: https://gitlab.gnome.org/World/OpenPaperwork/pyocr
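Once the lines are available as text, the matching step described in the question (product name, then quantity, then unit) could look roughly like the sketch below. It uses only the Python standard library; the `PRODUCTS` list, the `parse_line` function, the regular expression and the `cutoff` value are assumptions made for illustration, and a real product database would replace the hard-coded list.

```python
import difflib
import re

# Hypothetical product list; in practice this comes from your database.
PRODUCTS = ["Item1", "Item2", "Item3"]

# Assumed line shape: "<name> <number> <unit>", all on one line.
LINE_RE = re.compile(
    r"^(?P<name>.+?)\s+(?P<qty>\d+(?:[.,]\d+)?)\s+(?P<unit>\S.*)$"
)


def parse_line(line, products=PRODUCTS, cutoff=0.6):
    """Return (product, quantity, unit) or None if the line is not an item.

    The recognized name is fuzzy-matched against the known products so
    that small OCR errors (e.g. "ltem2" for "Item2") are tolerated.
    """
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # header text or anything not shaped like an item
    candidates = difflib.get_close_matches(
        match["name"], products, n=1, cutoff=cutoff
    )
    if not candidates:
        return None  # name is not close to any known product
    quantity = float(match["qty"].replace(",", "."))
    return candidates[0], quantity, match["unit"].strip()


if __name__ == "__main__":
    for text in ["Item1 10 boxes", "ltem2 15 pieces", "Invoice No. 12345"]:
        print(text, "->", parse_line(text))
```

Because header lines either fail the pattern or do not resemble any product name, this filter also takes care of skipping the header without locating it explicitly.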