Is it possible to programmatically determine the font of text in a PDF?

Programming

Viktor Dubrov, 2019-12-26 12:29:16

I want to try translating a book with Google Translate, which has recently started translating very well. As we all know, regular text in books is set in a different font from everything else (identifiers, code, terms...).
So I thought: why not use the font to tell them apart when translating books? For the programming language, I am choosing between C and Python.
If anyone has thoughts on this, it would be a big help in moving this forward.

3 answer(s)

carakan, 2020-12-30
@victor1985

PDF is quite a complicated format. It has several layers; in particular, text and graphics. Text can live in either layer, and in especially difficult cases in both at once. The text layer does store font information, and the font is not chosen at random. Text inside an image can only be recovered with OCR. All the solutions I know of with acceptable recognition quality are proprietary.
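
To illustrate reading font information from the text layer, here is a minimal Python sketch. It assumes the third-party PyMuPDF library (imported as `fitz`), which nobody in this thread named. The `MONO_HINTS` list and all helper names are hypothetical. Font names differ from book to book, so the hints would likely need adjusting.

```python
# Substrings that often appear in monospaced font names (an assumption;
# check the fonts actually used in your book and adjust).
MONO_HINTS = ("mono", "courier", "consolas", "code")


def is_code_font(font_name):
    """Guess whether a font name belongs to a monospaced (code) font."""
    name = font_name.lower()
    return any(hint in name for hint in MONO_HINTS)


def iter_spans(page_dict):
    """Yield (font, size, text) for every text span in a page dictionary.

    The expected layout is the one PyMuPDF returns from
    page.get_text("dict"): blocks -> lines -> spans.
    """
    for block in page_dict.get("blocks", []):
        # Image blocks carry no "lines" key, so they are skipped here.
        for line in block.get("lines", []):
            for span in line.get("spans", []):
                yield span["font"], span["size"], span["text"]


def spans_from_pdf(path):
    """Read every page of a PDF and yield (font, size, text) tuples."""
    import fitz  # PyMuPDF; imported lazily so the helpers above work without it

    doc = fitz.open(path)
    try:
        for page in doc:
            yield from iter_spans(page.get_text("dict"))
    finally:
        doc.close()
```

With something like this, spans for which `is_code_font` returns true could be left untranslated, while the rest is sent to the translator. This only works for text stored in the text layer; scanned pages would still need OCR.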

SOTVM, 2019-12-26
@sotvm

The text font has nothing to do with this.

12rbah, 2019-12-26
@12rbah

In Python, you can extract the text using libraries for working with PDF, save it as a .txt file, and upload it to the translator.
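
A rough sketch of that approach, assuming the third-party `pypdf` library (one possible choice; no specific library is named here). The function names and the file names in the usage note are placeholders.

```python
def extract_pages(pdf_path):
    """Return the text of each page as a list of strings (needs pypdf)."""
    from pypdf import PdfReader  # third-party; imported lazily

    reader = PdfReader(pdf_path)
    # extract_text() may give an empty result for image-only pages.
    return [page.extract_text() or "" for page in reader.pages]


def save_as_txt(pages, txt_path):
    """Write the page texts to a UTF-8 file, one blank line between pages."""
    with open(txt_path, "w", encoding="utf-8") as out:
        out.write("\n\n".join(text.strip() for text in pages))
```

Calling `save_as_txt(extract_pages("book.pdf"), "book.txt")` would produce a plain text file ready to upload. Note that plain extraction like this discards the font information, so code and identifiers would be translated along with the regular text.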