Parsing
malvin, 2018-08-10 15:54:48

How to get readable Russian text when parsing PDF using tabula-py?

Windows 7, 32-bit.
All Russian letters in the saved file are replaced by ?????

import tabula

tabula.convert_into(
    r"C:\Code\Active\kartoteka\misc\ExampleExtract.pdf",
    r"C:\Code\Active\kartoteka\misc\output.csv",
    output_format="csv",
    pages="all",
    java_options="-Dfile.encoding=utl-8",
)

The developer recommends this solution:
I got ? character with result on Windows. How can I avoid it?
If the encoding of PDF is UTF-8, you should set chcp 65001 on your terminal before launching a Python process.
chcp 65001

I ran the command in cmd, with the same result.
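
One thing that may be worth checking: the call above spells the encoding as "utl-8", which is probably not recognised by Java as a charset name, so the option may have no effect. Below is a minimal sketch of the same call with the encoding spelled "UTF8". The helper names build_convert_kwargs and convert_pdf_to_csv are hypothetical and only serve the illustration; tabula.convert_into and its java_options argument are the ones already used in the question. Whether this alone removes the ? characters on this machine is an assumption, not something confirmed here.

```python
def build_convert_kwargs(encoding="UTF8"):
    """Build keyword arguments for tabula.convert_into with an explicit JVM encoding.

    Hypothetical helper for illustration; the only change from the posted
    call is the spelling of the encoding name.
    """
    return {
        "output_format": "csv",
        "pages": "all",
        "java_options": "-Dfile.encoding={}".format(encoding),
    }


def convert_pdf_to_csv(pdf_path, csv_path, encoding="UTF8"):
    """Convert every page of pdf_path to a CSV file at csv_path."""
    # Imported inside the function so the helper above can be inspected
    # without tabula-py or Java being installed.
    import tabula

    tabula.convert_into(pdf_path, csv_path, **build_convert_kwargs(encoding))


# Example call (paths taken from the question):
# convert_pdf_to_csv(
#     r"C:\Code\Active\kartoteka\misc\ExampleExtract.pdf",
#     r"C:\Code\Active\kartoteka\misc\output.csv",
# )
```

If the output still shows ?, it may help to open output.csv in an editor that lets you choose UTF-8 explicitly, to rule out the viewer decoding the file with a different code page.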
