Why does encoding break in Python?

Python

Sergey, 2021-05-30 21:04:40

I want to truncate a text file to 7 characters.
As long as the file contains only Latin characters, everything works:

# hello.txt before: vsem privet
x = open("hello.txt", "r+", encoding="utf-8")
x.truncate(7)
# hello.txt after: vsem pr

However, if the file contains Cyrillic, the remaining characters turn into garbage:

# hello.txt before: всем привет
x = open("hello.txt", "r+", encoding="utf-8")
x.truncate(7)
# hello.txt after: все, followed by one broken character

What should I do?

P.S. Both hello.txt and 1.py are encoded in UTF-8.

3 answer(s)

Sergey Gornostaev, 2021-05-30
@Shull

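Cyrillic characters in UTF-8 are encoded as two bytes each, and truncate() counts bytes, not characters. Seven bytes is three whole letters plus the first half of the fourth.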
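
To make the byte arithmetic concrete, here is a small sketch that works on the string from the question in memory, without touching any file. The variable names are only illustrative.

```python
text = "всем привет"
data = text.encode("utf-8")

print(len(text))  # 11 characters
print(len(data))  # 21 bytes: 10 Cyrillic letters * 2 + 1 space

# Cutting at 7 bytes keeps "все" (6 bytes) plus the first byte of "м"
head = data[:7]
print(head)  # b'\xd0\xb2\xd1\x81\xd0\xb5\xd0'

# A strict decode of head would raise UnicodeDecodeError;
# errors="replace" shows the dangling byte as U+FFFD instead
decoded = head.decode("utf-8", errors="replace")
print(decoded)
```

The last line prints "все" followed by the replacement character, which matches the garbage seen in the question.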

Yupiter7575, 2021-05-30
@yupiter7575

Well, for starters, files in Python should be opened with the with ... as construct.
And when there are encoding problems, everyone turns to the almighty io.open.
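
A minimal sketch of both points, assuming Python 3 and a throwaway file in a temporary directory (the path is created only for this illustration):

```python
import io
import os
import tempfile

# In Python 3, io.open is the same function as the built-in open
print(io.open is open)  # True

# Hypothetical scratch file, so nothing real gets overwritten
path = os.path.join(tempfile.mkdtemp(), "hello.txt")

# The with block closes the file automatically, even if an exception is raised
with open(path, "w", encoding="utf-8") as f:
    f.write("всем привет")

print(f.closed)  # True
```

So in Python 3 the explicit io.open spelling changes nothing; what matters is passing encoding= and letting with close the file.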

Romses Panagiotis, 2021-05-30
@romesses

Python3

import io

with io.open("hello.txt", "r", encoding="utf-8") as f:
    s = f.read()   # the string holds the decoded Cyrillic characters
    print(s)
    print(s[:7])   # take a slice of the first 7 characters
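
The slice only shortens the string in memory. If the file itself has to be cut to 7 characters, one possible approach is to work out how many bytes those characters occupy and truncate to that size. This is a rough sketch, assuming the file is UTF-8 without a BOM and small enough to read whole; truncate_chars is a hypothetical helper name, not a standard function.

```python
import os
import tempfile


def truncate_chars(path, n_chars, encoding="utf-8"):
    """Truncate the file at path to its first n_chars characters."""
    # newline="" disables newline translation, so the byte count stays exact
    with open(path, "r", encoding=encoding, newline="") as f:
        text = f.read()

    # Size in bytes of the first n_chars characters
    n_bytes = len(text[:n_chars].encode(encoding))

    # In binary mode truncate() takes a size in bytes
    with open(path, "r+b") as f:
        f.truncate(n_bytes)
    return n_bytes


if __name__ == "__main__":
    # Demo on a scratch file rather than the real hello.txt
    demo = os.path.join(tempfile.mkdtemp(), "hello.txt")
    with open(demo, "w", encoding="utf-8") as f:
        f.write("всем привет")
    truncate_chars(demo, 7)
    with open(demo, "r", encoding="utf-8") as f:
        print(f.read())  # всем пр
```

Reading the whole file is the simple option; for large files it would be better to decode only as much as needed.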