Adobe
PoCTo, 2013-01-11 20:01:19

Changing the text encoding of a PDF

I have a PDF file produced with LaTeX + dvips + ps2pdf; the source is lost or withheld by its authors :)
If you select text in Adobe Reader and copy it somewhere else, you get gibberish, for example "Ñòõàòòechåñêèé".
Because of the broken encoding, things like searching within the file do not work.
For some of these files, searching works if I use Foxit instead of Adobe, but I want the file to be readable in any viewer. Apparently the encoding of something inside (or outside?) the PDF has to be changed. I searched the Internet and found nothing useful, although the problem does not seem to be rare. Please recommend some software or a sequence of steps to fix this. Any operating system will do.


7 answers
PoCTo, 2013-01-11
@PoCTo

I found a solution that works for my files:

gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER  -sOutputFile=output.pdf input.pdf

After that, output.pdf somehow ends up with the correct encoding.
On Windows, the executable is named something like gsw32c or gswin32c, or possibly just gs.
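
Since the executable name differs between systems, the Ghostscript call above can be wrapped in a small script that looks for the binary first. This is only a sketch: the function names are made up for illustration, the list of candidate names (including gswin64c for 64-bit Windows builds) is my assumption, and the Ghostscript options are exactly the ones from the command above.

```python
import shutil
import subprocess

# Candidate executable names (an assumption): "gs" on Unix-like systems,
# gswin64c / gswin32c for the console builds on Windows.
CANDIDATES = ("gs", "gswin64c", "gswin32c")


def find_ghostscript(candidates=CANDIDATES):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def build_command(gs, src, dst):
    """Build the same pdfwrite command line as the one shown above."""
    return [
        gs,
        "-sDEVICE=pdfwrite",
        "-dNOPAUSE",
        "-dBATCH",
        "-dSAFER",
        f"-sOutputFile={dst}",
        src,
    ]


def rewrite_pdf(src, dst):
    """Re-write src into dst through Ghostscript's pdfwrite device."""
    gs = find_ghostscript()
    if gs is None:
        raise FileNotFoundError("Ghostscript executable not found on PATH")
    subprocess.run(build_command(gs, src, dst), check=True)
```

Calling rewrite_pdf("input.pdf", "output.pdf") then does the same thing as the command line. Whether the copied text comes out right still depends on the file.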

Mehabr, 2017-08-21
@Mehabr

Here is what helps me. I press Ctrl+P and print the file to a new PDF with the Adobe PDF printer; Russian text can then be copied from the new file, even though Cyrillic could not be copied from the original.

Fyodor, 2016-09-21
@loenkoff

I restored a file (a TU, i.e. technical specifications, from a GOST standard, with tables and formulas) that had the same problem, without the source and with the formatting preserved, using https://finereaderonline.com/. The same can probably be done with the regular desktop FineReader. The result can be exported to any text format.
(I know this is necroposting, but I could not find a proper working answer online about restoring such files without the source, so maybe this option will be useful to someone else.)
In the online version, recognition of more than 11 pages is paid, but I think that next time I run into a similar problem it will be worth paying for a good cause (ABBYY sells page packages, so you do not have to buy the program). This time 10 pages were enough for me: I marked only the ones I needed.

Mark Chigrin, 2021-02-11
@Black_and_green

I found a slightly more convenient way for ordinary Windows users:
Export the entire file to PostScript (Encapsulated PostScript creates a separate file for each page, so you need the regular kind), and then rebuild the PDF from it.
I did this using Acrobat DC and Acrobat Distiller.
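
For those without Acrobat, the same round trip (PDF to PostScript and back) can be tried with the pdf2ps and ps2pdf scripts that ship with Ghostscript. To be clear, this swaps in different tools from the ones used above, and I have not verified that it fixes the encoding in every case. The function names and the temporary file name are made up for illustration, and the sketch assumes a Unix-like system with both scripts on PATH.

```python
import subprocess


def roundtrip_commands(src_pdf, tmp_ps, dst_pdf):
    """Return the two commands: PDF -> PostScript, then PostScript -> PDF."""
    return [
        ["pdf2ps", src_pdf, tmp_ps],
        ["ps2pdf", tmp_ps, dst_pdf],
    ]


def roundtrip(src_pdf, dst_pdf, tmp_ps="roundtrip.ps"):
    """Run both conversion steps, stopping at the first failure."""
    for cmd in roundtrip_commands(src_pdf, tmp_ps, dst_pdf):
        subprocess.run(cmd, check=True)
```

Calling roundtrip("input.pdf", "output.pdf") leaves the intermediate roundtrip.ps next to the result, which can be deleted afterwards.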

Nickel3000, 2013-01-11
@Nickel3000

Your text is encoded in CP1252 (the word is "Stochastic"). You need to extract all the text from the PDF, change the encoding, and save it as a PDF again. The formatting will most likely be lost, though I'm not an expert. Perhaps the encoding can be converted somehow in a PDF editor.
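
For text that has already been copied out, the "change the encoding" step can be sketched in Python. This assumes the bytes were really CP1251 (Windows Cyrillic) and were only displayed as CP1252; that pairing is my guess based on the sample word, and fix_mojibake is a made-up helper name.

```python
def fix_mojibake(text, wrong="cp1252", right="cp1251"):
    """Undo text that was decoded as `wrong` but was really `right`."""
    return text.encode(wrong).decode(right)


# Simulate the damage: Cyrillic bytes (CP1251) misread as CP1252.
original = "Стохастический"
garbled = original.encode("cp1251").decode("cp1252")

print(garbled)                # accented Latin letters instead of Cyrillic
print(fix_mojibake(garbled))  # Стохастический
```

This only repairs the extracted text; it does not touch the PDF itself, so the layout is lost, as noted above.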

Nazar Mokrinsky, 2013-01-11
@nazarpc

Individual phrases can be converted with the decoder, but when I tried to run a whole book through it that way, it did not work.

photovideomaster, 2014-05-14
@photovideomaster

File\Save as\choose the format\click "Settings" on the right\if the chosen format allows changing the encoding, select the one you need (usually UTF-8)\click Save\wait\rejoice
