How to recognize text in a PDF and export the data to CSV?

Python

Valentin Schmitt, 2019-07-03 19:20:44

Hello everyone. Please point me in the right direction: which tools and frameworks can I use to automate routine tasks, preferably in Python at an amateur level, or possibly with other ready-made tools, and where can I read more?
Problem:
There is a huge number of digitized documents. Certain fields in them need to be recognized automatically (I tried Tesseract OCR, without success) and entered into a form on a website, without access to the site's database.
Question:
I would appreciate it if anyone with similar experience could explain how to recognize a document by tags, export the data to CSV / Excel etc., and then enter it into an HTML form.

2 answers

Dimonchik, 2019-07-03
@dimonchik2013

export the data to CSV / Excel etc., and then to an HTML form.

There are plenty of tools for that part. As for the recognizer: if pytesseract did not help, look at Foxit Software, they have a command-line tool, or at FineReader, which seems to have one as well.
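For the export part, here is a minimal sketch of how the pieces could fit together in Python. It assumes the recognized text contains simple "Label: value" lines. The field names, the regular expressions, and the form URL are all hypothetical and would have to be adapted to the real documents and site. The `ocr_pdf` helper relies on the third-party `pdf2image` and `pytesseract` packages. Any other recognizer that returns plain text could be swapped in there.

```python
import csv
import io
import re
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical field patterns; adjust them to the labels in your documents.
FIELD_PATTERNS = {
    "invoice_no": re.compile(r"Invoice\s*No\.?\s*[:#]?\s*(\S+)", re.IGNORECASE),
    "date": re.compile(r"Date\s*:?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "total": re.compile(r"Total\s*:?\s*([\d.,]+)", re.IGNORECASE),
}


def ocr_pdf(pdf_path, lang="eng"):
    """Render each PDF page to an image and run Tesseract on it.

    Needs pdf2image and pytesseract (plus the poppler and tesseract
    binaries), so they are imported lazily and only when this is called.
    """
    from pdf2image import convert_from_path
    import pytesseract

    pages = convert_from_path(pdf_path, dpi=300)
    return "\n".join(pytesseract.image_to_string(page, lang=lang) for page in pages)


def extract_fields(text, patterns=FIELD_PATTERNS):
    """Return {field: first match, or '' if the field was not found}."""
    row = {}
    for name, pattern in patterns.items():
        match = pattern.search(text)
        row[name] = match.group(1) if match else ""
    return row


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def build_form_request(url, row):
    """Build (but do not send) a POST request carrying one row as form fields."""
    data = urlencode(row).encode("utf-8")
    return Request(url, data=data, method="POST")
```

In use, each document would go through `extract_fields(ocr_pdf(path))` and the resulting rows through `rows_to_csv`. The request built by `build_form_request` could then be sent with `urllib.request.urlopen`. A real form may also expect cookies or a hidden token, in which case a browser automation tool is probably the easier route.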

Danil, 2019-07-03
@DanilBaibak

It turns out there is a trick for doing this manually in Google Drive. Or you can automate it using Google Cloud.