How to get readable Russian text when parsing PDF using tabula-py?

Parsing

malvin, 2018-08-10 15:54:48

Windows 7, 32-bit.
All Russian letters in the saved file are replaced with question marks (?????).

import tabula

tabula.convert_into(
    r"C:\Code\Active\kartoteka\misc\ExampleExtract.pdf",
    r"C:\Code\Active\kartoteka\misc\output.csv",
    output_format="csv",
    pages="all",
    java_options="-Dfile.encoding=utl-8",
)
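
One detail in the call above is worth checking: the value passed to -Dfile.encoding is spelled utl-8, which does not look like a valid charset name; UTF-8 was presumably intended. As far as I recall, the tabula-py FAQ writes this option as -Dfile.encoding=UTF8, but treat that spelling as an assumption to verify against the current documentation. The sketch below uses Python's own codec registry as a quick sanity check on the name. This is only a proxy, since Java keeps its own list of charset names, and is_known_encoding is a hypothetical helper, not part of tabula-py.

```python
import codecs


def is_known_encoding(name: str) -> bool:
    """Return True if Python's codec registry recognizes `name`.

    Hypothetical helper for a quick typo check. Java resolves charset
    names on its own, so this is a sanity check, not a guarantee.
    """
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


print(is_known_encoding("utl-8"))  # False: the spelling used in the question
print(is_known_encoding("utf-8"))  # True
print(is_known_encoding("UTF8"))   # True: Python accepts this alias as well
```

If the name was mistyped in the real script too, the rest of the setup may already be correct, so this is worth ruling out before changing anything else.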

The developer recommends this solution:

I got ? character with result on Windows. How can I avoid it?
If the encoding of PDF is UTF-8, you should set chcp 65001 on your terminal before launching a Python process.
chcp 65001

I ran the command in cmd, with the same result.
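
For context, here is a minimal sketch of how question marks like these usually arise: somewhere along the way, text is encoded with a character set that has no Cyrillic letters, and each unsupported letter is replaced by ?. The code page cp1252 is only an assumed example of such a character set, and the helpers lossy_roundtrip, contains_cyrillic and convert_with_utf8 are hypothetical names introduced for illustration. The tabula call is wrapped in a function that is not executed here; it reuses the arguments from the question with the encoding option spelled UTF8, which is my recollection of the tabula-py FAQ rather than a verified fix.

```python
def lossy_roundtrip(text: str, encoding: str) -> str:
    """Encode and decode `text`, replacing unsupported characters with '?'."""
    return text.encode(encoding, errors="replace").decode(encoding)


def contains_cyrillic(text: str) -> bool:
    """Return True if `text` has at least one character in the Cyrillic block."""
    return any("\u0400" <= ch <= "\u04FF" for ch in text)


def convert_with_utf8(pdf_path: str, csv_path: str) -> None:
    """Sketch of the conversion with an explicit UTF-8 JVM option (not run here)."""
    import tabula  # third-party: tabula-py, which also needs a Java runtime

    tabula.convert_into(
        pdf_path,
        csv_path,
        output_format="csv",
        pages="all",
        java_options="-Dfile.encoding=UTF8",  # assumed spelling, see lead-in
    )


# cp1252 has no Cyrillic letters, so every letter degrades to '?'.
print(lossy_roundtrip("Привет", "cp1252"))  # ??????
# UTF-8 can represent the text, so it survives unchanged.
print(lossy_roundtrip("Привет", "utf-8"))   # Привет
```

After a conversion, reading output.csv with open(path, encoding="utf-8") and passing the text to contains_cyrillic shows whether the file itself holds Russian letters, or whether only the program used to view it displays them incorrectly.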
