Is it possible to recognize the text in a PDF and embed it in that same PDF, and to do it for free?

ks0, 2019-05-30 18:31:21

Text recognition

I have a number of JPG files. The first task is to batch-convert them to PDF; I don't think that will be a problem.
Then, without any special effort, I want to recognize the text in the PDFs and embed it in the files.
After that, the files will be uploaded to the free edition of the LogicalDoc electronic archive, which parses text documents and can search them but, alas, cannot recognize text in images.
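For the batch conversion step, here is a minimal sketch, assuming Python with the Pillow library installed (the question names no particular tool; `jpgs_to_pdfs` and the 300 DPI default are illustrative choices). The resulting PDFs contain only images and no text yet, which is exactly the part that still needs OCR.

```python
from pathlib import Path

from PIL import Image


def jpgs_to_pdfs(src_dir: Path, dst_dir: Path, dpi: float = 300.0) -> list[Path]:
    """Write one single-page PDF per *.jpg / *.jpeg file found in src_dir."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for jpg in sorted(src_dir.iterdir()):
        if jpg.suffix.lower() not in {".jpg", ".jpeg"}:
            continue  # skip anything that is not a JPEG by extension
        out = dst_dir / (jpg.stem + ".pdf")
        with Image.open(jpg) as img:
            # The PDF writer accepts RGB; "resolution" sets the page size
            # (pixels / resolution = inches), so it should match the scan DPI.
            img.convert("RGB").save(out, "PDF", resolution=dpi)
        written.append(out)
    return written
```

Usage: `jpgs_to_pdfs(Path("scans"), Path("pdf"))`, where both folder names are placeholders. Pillow re-encodes the JPEG data when it writes the PDF, so a little quality may be lost. img2pdf is a free tool that embeds the original JPEG bytes unchanged instead.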

rPman, 2019-05-30

Why do you need to process and recognize the text in the PDF rather than earlier, at the JPEG stage?
Tesseract is a free, open-source set of text recognition utilities. Usually the images have to be preprocessed first, with filters or some other logic, so that tesseract can recognize them. For example, if the image is a photo of paper documents rather than a scan, you have to remove uneven lighting and geometric distortion.
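The lighting part of that preprocessing can be sketched as follows, assuming numpy (plus Pillow for file I/O) is available. The image is divided by a heavily blurred copy of itself to flatten uneven lighting and then binarized with Otsu's threshold. The function names and the default `radius=25`, which should be noticeably larger than the text strokes at your resolution, are hypothetical choices. Geometric distortion (deskewing, dewarping) is not handled here.

```python
import numpy as np


def box_blur(gray: np.ndarray, radius: int) -> np.ndarray:
    """Mean of each (2r+1)x(2r+1) window, computed with an integral image.

    Borders are handled by replicating the edge pixels.
    """
    k = 2 * radius + 1
    padded = np.pad(gray.astype(np.float64), radius, mode="edge")
    # integral[i, j] = sum of padded[:i, :j] (leading zero row/column).
    integral = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    integral[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = gray.shape
    window_sum = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
                  - integral[k:k + h, :w] + integral[:h, :w])
    return window_sum / (k * k)


def otsu_threshold(gray: np.ndarray) -> int:
    """Otsu's method for uint8 images: the t maximising between-class variance.

    Pixels <= t form the dark class, pixels > t the light class.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                # weight of the dark class for each t
    mu = np.cumsum(prob * np.arange(256))  # cumulative mean of the dark class
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    # Empty classes give 0/0 = nan; treat them as "no separation".
    sigma_b = np.nan_to_num(sigma_b, nan=0.0, posinf=0.0, neginf=0.0)
    return int(np.argmax(sigma_b))


def clean_for_ocr(gray: np.ndarray, radius: int = 25) -> np.ndarray:
    """Flatten uneven lighting, then binarise (uint8 in, 0/255 uint8 out)."""
    background = box_blur(gray, radius)
    # Dividing by the local background turns shadows and bright spots
    # into roughly uniform "paper"; text stays darker than its surroundings.
    flat = gray.astype(np.float64) / np.maximum(background, 1.0) * 255.0
    flat = np.clip(flat, 0, 255).astype(np.uint8)
    t = otsu_threshold(flat)
    return np.where(flat > t, 255, 0).astype(np.uint8)


def preprocess_file(src: str, dst: str, radius: int = 25) -> None:
    """Load an image, clean it and save it (ideally as PNG) for tesseract."""
    from PIL import Image  # local import: the numpy helpers above work without Pillow

    with Image.open(src) as img:
        gray = np.asarray(img.convert("L"))
    Image.fromarray(clean_for_ocr(gray, radius)).save(dst)
```

Usage: `preprocess_file("photo.jpg", "photo_clean.png")`, with placeholder file names. The cleaned image is then passed to tesseract. Saving as PNG avoids adding new JPEG artifacts to the binarized page.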
P.S. "Without much effort" won't work.
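In line with the answer's first question, the separate JPG-to-PDF step can be skipped. Tesseract's `pdf` output config writes a searchable PDF (the page image plus an invisible text layer) directly from an image, so `tesseract page.jpg page -l eng pdf` produces `page.pdf`. Below is a minimal Python sketch that does this for a whole folder. It assumes tesseract is installed and on `PATH` with the required language data. The helper names `tesseract_cmd` and `ocr_folder` and the `eng` default are hypothetical choices.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_cmd(image: Path, out_base: Path, lang: str = "eng") -> list[str]:
    """Build `tesseract IMAGE OUTBASE -l LANG pdf`, which writes OUTBASE.pdf."""
    return ["tesseract", str(image), str(out_base), "-l", lang, "pdf"]


def ocr_folder(src_dir: Path, dst_dir: Path, lang: str = "eng") -> list[Path]:
    """Run tesseract on every JPEG in src_dir, producing one searchable PDF each."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not on PATH")
    dst_dir.mkdir(parents=True, exist_ok=True)
    produced: list[Path] = []
    for jpg in sorted(src_dir.iterdir()):
        if jpg.suffix.lower() not in {".jpg", ".jpeg"}:
            continue
        out_base = dst_dir / jpg.stem  # tesseract appends ".pdf" itself
        subprocess.run(tesseract_cmd(jpg, out_base, lang), check=True)
        produced.append(dst_dir / (jpg.stem + ".pdf"))
    return produced
```

For Russian-language documents you would pass something like `lang="rus+eng"`, assuming the `rus` language data is installed. As the P.S. warns, expect poor results on raw photos unless the images are cleaned up first.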