How to encode a string with characters from different encodings?
>>> a='привет, '.encode('utf-8')
>>> b='мир!'.encode('cp1251')
>>> c=a+b
>>> c
b'\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82, \xec\xe8\xf0!'
I solved the problem with a workaround: in my case the only conflict was with guillemets (« and », the "ёлочки" quotes), which cp1251 encodes as the single bytes \xab and \xbb. I check whether each \xab or \xbb byte is the second byte of the UTF-8 letters Ы (\xd0\xab) or л (\xd0\xbb). If it is not, I replace it with a space.
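def clean(rawtext):  # the function name is illustrative; the original snippet had none
    # Replace stray cp1251 guillemet bytes (0xAB, 0xBB) with a space,
    # keeping them when they follow 0xD0, i.e. inside UTF-8 'Ы' or 'л'.
    text = bytearray()
    for i, byte in enumerate(rawtext):
        # Guard i > 0 so the first byte is not compared with rawtext[-1].
        follows_d0 = i > 0 and rawtext[i - 1] == 0xD0
        if byte in (0xAB, 0xBB) and not follows_d0:
            text.append(0x20)  # space
        else:
            text.append(byte)
    return text.decode('utf-8', 'ignore')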
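If the stray bytes should be kept rather than blanked out, the same idea can be generalized with a custom decoding error handler. The sketch below is only one possible approach, not what I actually used: the names `cp1251_fallback` and `decode_mixed` are made up for illustration, and it assumes that every byte which is not valid UTF-8 is really cp1251. That is a heuristic: a cp1251 byte pair that happens to form valid UTF-8 will still be read as UTF-8.

```python
import codecs

def cp1251_fallback(error):
    # The UTF-8 decoder calls this for each run of bytes it cannot decode.
    if not isinstance(error, UnicodeDecodeError):
        raise error
    bad = error.object[error.start:error.end]
    # Assumption: invalid bytes are cp1251; undefined ones become U+FFFD.
    return bad.decode('cp1251', 'replace'), error.end

# Register the handler under a hypothetical name so decode() can use it.
codecs.register_error('cp1251_fallback', cp1251_fallback)

def decode_mixed(raw):
    # Decode as UTF-8, falling back to cp1251 for invalid bytes.
    return raw.decode('utf-8', errors='cp1251_fallback')

mixed = 'привет, '.encode('utf-8') + 'мир!'.encode('cp1251')
print(decode_mixed(mixed))
```

With the byte string from the question this prints the whole greeting, and single guillemet bytes come out as « and » instead of spaces.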
May I ask why you need this?
In any case, this string will not be displayed correctly anywhere, because most software uses a single encoding table for the entire content.
If there is a good reason for it, keep the data in binary form and do not glue the pieces together as if they were one string.
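A minimal sketch of that advice, assuming you know which encoding each fragment came from: keep every piece as raw bytes next to its encoding, and decode piece by piece only when text is needed. The names `segments` and `decode_segments` are hypothetical.

```python
# Each fragment stays as raw bytes, tagged with the encoding it was written in.
segments = [
    ('привет, '.encode('utf-8'), 'utf-8'),
    ('мир!'.encode('cp1251'), 'cp1251'),
]

def decode_segments(parts):
    # Decode every fragment with its own encoding, then join the text.
    return ''.join(data.decode(encoding) for data, encoding in parts)

text = decode_segments(segments)
print(text)  # привет, мир!

# Once everything is a str, it can be re-encoded consistently.
unified = text.encode('utf-8')
```

After this step the result is an ordinary str, so it can be written out in a single encoding without mixing byte sequences.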