How do I get readable Russian text when parsing a PDF with tabula-py?
Windows 7, 32-bit.
All Russian letters in the saved file are replaced by question marks (?????).
import tabula

tabula.convert_into(
    r"C:\Code\Active\kartoteka\misc\ExampleExtract.pdf",
    r"C:\Code\Active\kartoteka\misc\output.csv",
    output_format="csv",
    pages="all",
    # Default charset for the Java process; must be a valid charset name such as UTF-8
    java_options="-Dfile.encoding=UTF-8",
)
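A quick way to tell which failure you are looking at is to inspect output.csv itself. The sketch below is a hypothetical stdlib-only helper (check_csv_encoding is not part of tabula-py) that assumes the file is supposed to be UTF-8. It reports "ok" if Cyrillic letters survived, "lost" if they were already replaced by runs of "?" when the file was written, and "not utf-8" if the bytes are in some other encoding such as cp1251.

```python
import re
from pathlib import Path

# Basic Russian alphabet, upper and lower case, plus Ё/ё
CYRILLIC = re.compile(r"[А-Яа-яЁё]")
# Two or more consecutive "?" characters, typical of replaced letters
QUESTION_RUN = re.compile(r"\?{2,}")


def check_csv_encoding(path: str) -> str:
    """Hypothetical helper: classify a CSV written by tabula.convert_into."""
    raw = Path(path).read_bytes()
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Cyrillic may be present, but stored in a legacy code page such as cp1251
        return "not utf-8"
    if CYRILLIC.search(text):
        return "ok"
    if QUESTION_RUN.search(text):
        # The letters were replaced by "?" at write time and cannot be recovered
        return "lost"
    return "no cyrillic"
```

A "lost" result means the CSV no longer contains the letters at all, so the conversion has to be rerun with a working encoding setup; a "not utf-8" result means the text is probably intact and only needs to be opened with the right encoding.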
I get "?" characters in the result on Windows. How can I avoid it?
If the PDF's encoding is UTF-8, switch the terminal to the UTF-8 code page (65001) with chcp before launching the Python process:
chcp 65001
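The question marks come from an encoding step, not from the PDF: when text is encoded into a code page that has no slot for a character, the encoder substitutes "?". The sketch below reproduces this in Python with errors="replace", assuming a Western code page such as cp1252 is in effect; the sample word is arbitrary. Java's default encoders substitute "?" in the same way, which is why the letters are already gone by the time the CSV is written.

```python
# Sample Cyrillic text, standing in for what tabula extracts from the PDF
text = "Картотека"

# cp1252 (a Western code page) has no Cyrillic letters, so each one
# is replaced by "?" and the original text cannot be recovered
lossy = text.encode("cp1252", errors="replace")
print(lossy)  # b'?????????'

# UTF-8 can represent every character, so a round trip restores the text
restored = text.encode("utf-8").decode("utf-8")
print(restored == text)  # True
```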