Stanislav Karpov, 2014-07-27 19:14:16
Python

How do I write Russian text to a UTF-8 file?

str_ru = 'строка'

with open('str_ru_text_1.txt', 'w') as f:
    f.write(str_ru)

with open('str_ru_text_2.txt', 'w') as f:
    f.write(str_ru.encode('utf-8').decode('utf-8'))

with open('str_ru_bytes.txt', 'wb') as f:
    f.write(str_ru.encode('utf-8'))

str_en = 'string'

with open('str_en_text_1.txt', 'w') as f:
    f.write(str_en)

with open('str_en_text_2.txt', 'w') as f:
    f.write(str_en.encode('utf-8').decode('utf-8'))

with open('str_en_bytes.txt', 'wb') as f:
    f.write(str_en.encode('utf-8'))

1. Why are the files 'str_ru_text_1.txt' and 'str_ru_text_2.txt' in Windows-1251 encoding, while 'str_en_text_1.txt' and 'str_en_text_2.txt' are in UTF-8?
2. Is there a way to write UTF-8 without str.encode('utf-8')?
3. Where are the rules for writing strings to a file described? What should I read on this topic?
Python 3.4, Windows 8.1
Demo in Sublime Text 2 (youtube)
The reverse operation:
with open('str_ru_text_1.txt', 'rb') as f:
    print(f.read().decode('utf-8'))

with open('str_en_text_1.txt', 'rb') as f:
    print(f.read().decode('utf-8'))


1 answer(s)
lololololo, 2015-03-18
@stkrp

Comrades, this is a real mess. They meant well, but it came out even more backwards than before.
https://docs.python.org/3/library/functions.html#open
1. If mode 'b' is not specified, then by default the file is considered text. You can only write bytes to a binary file, only Unicode to a text file.
(The old Windows limitation, where text mode read a file only up to EOF ('\x1a'), belongs to Python 2. In Python 3, open() in text mode reads to the real end of the file, so it is not an obstacle here.)
2. If the encoding is not specified, locale.getpreferredencoding(False) is used by default, i.e. the result depends on the OS settings (on Windows, on the current locale). Seriously? They got rid of some pitfalls and picked up new ones.
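Both points are easy to check by hand. Below is a minimal sketch, not taken from the question: the scratch file `demo.txt` and the variable names are made up for illustration. It prints the encoding that open() would fall back to on the current machine, then shows that each mode rejects the wrong type with a TypeError.

```python
import locale
import os
import tempfile

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# Point 2: the encoding open() falls back to when none is given.
# The value depends on the OS and locale settings of the machine.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)

# Point 1: a text-mode file accepts only str...
text_mode_error = None
with open(path, 'w', encoding='utf-8') as f:
    try:
        f.write('строка'.encode('utf-8'))  # bytes into a text file
    except TypeError as exc:
        text_mode_error = exc

# ...and a binary-mode file accepts only bytes.
binary_mode_error = None
with open(path, 'wb') as f:
    try:
        f.write('строка')  # str into a binary file
    except TypeError as exc:
        binary_mode_error = exc

print(type(text_mode_error).__name__)
print(type(binary_mode_error).__name__)
```

On a Windows machine with a Russian locale the first print would typically show cp1251, which matches what the question observed.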
In general, always specify the file encoding explicitly.

with open('str_ru_text_1.txt', 'w', encoding='utf-8') as f:
    f.write(str_ru)
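To see that an explicit encoding covers questions 1 and 2, here is a rough sketch that writes the string in text mode and then inspects the raw bytes. The temporary directory and the names `raw`, `round_trip_ok` and `ascii_same` are assumptions added for the example. The file name and strings are the ones from the question.

```python
import os
import tempfile

str_ru = 'строка'
str_en = 'string'

# Hypothetical scratch directory so the example does not touch real files.
tmp_dir = tempfile.mkdtemp()
ru_path = os.path.join(tmp_dir, 'str_ru_text_1.txt')

# Text mode with an explicit encoding: no str.encode() call is needed.
with open(ru_path, 'w', encoding='utf-8') as f:
    f.write(str_ru)

# Read the raw bytes back to confirm what actually landed on disk.
with open(ru_path, 'rb') as f:
    raw = f.read()

round_trip_ok = raw == str_ru.encode('utf-8')
print(round_trip_ok)

# ASCII-only text encodes to the same bytes in UTF-8 and Windows-1251,
# so a cp1251 file with English text is indistinguishable from UTF-8.
ascii_same = str_en.encode('utf-8') == str_en.encode('cp1251')
print(ascii_same)
```

The second check is the likely reason the English files in the question already looked like UTF-8: the default encoding was the same for all four text files, but only Cyrillic characters have different byte values in the two encodings.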
