Python
Sergey, 2017-12-07 19:25:44

How to properly use unicode in python 2.7?

Something strange is happening, most likely because there is something I don't understand, so I'm asking for help.
This continues the story of an ancient journalctl version which, in JSON output, produces the following instead of Cyrillic:

"MESSAGE" : "2017-11-28 20:16:06.015  INFO 19853 --- [enerContainer-1] r.p.e.s.i.m.h.s.l.LocalMszPackageHandler : [MSG=d182a0ea-d45f-11e7-a390-7f0ad09e8f90] \uffffffd0\uffffffa1\uffffffd0\uffffffbe\uffffffd1\uffffff85\uffffffd1\uffffff80\uffffffd0\uffffffb0\uffffffd0\uffffffbd\uffffffd0\uffffffb5\uffffffd0\uffffffbd\uffffffd0\uffffffb8\uffffffd0\uffffffb5 \uffffffd0\uffffffbf\uffffffd0\uffffffb0\uffffffd0\uffffffba\uffffffd0\uffffffb5\uffffffd1\uffffff82\uffffffd0\uffffffb0"

A little deduction shows that re.sub('uffffff','x',line) turns this into ordinary UTF-8 byte escapes.
Then, with a few hacks in the interactive interpreter, we get the following:
>>> line2 = "\xd0\x9e\xd1\x82\xd0\xbf\xd1\x80\xd0\xb0\xd0\xb2\xd0\xba\xd0\xb0 \xd0\xbf\xd0\xbe\xd0\xb4\xd1\x82\xd0\xb2\xd0\xb5\xd1\x80\xd0\xb6\xd0\xb4\xd0\xb5\xd0\xbd\xd0\xb8\xd1\x8f \xd0\xbf\xd1\x83\xd0\xb1\xd0\xbb\xd0\xb8\xd0\xba\xd0\xb0\xd1\x86\xd0\xb8\xd0\xb8"
>>> unicode(line2)
u'\u041e\u0442\u043f\u0440\u0430\u0432\u043a\u0430 \u043f\u043e\u0434\u0442\u0432\u0435\u0440\u0436\u0434\u0435\u043d\u0438\u044f \u043f\u0443\u0431\u043b\u0438\u043a\u0430\u0446\u0438\u0438'
>>> lined = unicode(line2)
>>> print lined
Отправка подтверждения публикации

That looks great, so we write some test code:
...
line = re.sub('uffffff','x',line)
patterns = re.search('(((\\\\x[a-f,0-9]{2})+\s*)+)',line)
if patterns is not None:
    line2 = patterns.group(0)
    line3 = unicode(line2)
    print "line:" + line
    print "line3:" + line3

But the output only vaguely resembles what we got above:
line:\xd0\x9e\xd1\x82\xd0\xbf\xd1\x80\xd0\xb0\xd0\xb2\xd0\xba\xd0\xb0 \xd0\xbf\xd0\xbe\xd0\xb4\xd1\x82\xd0\xb2\xd0\xb5\xd1\x80\xd0\xb6\xd0\xb4\xd0\xb5\xd0\xbd\xd0\xb8\xd1\x8f \xd0\xbf\xd1\x83\xd0\xb1\xd0\xbb\xd0\xb8\xd0\xba\xd0\xb0\xd1\x86\xd0\xb8\xd0\xb8 \xd0\xb2 \xd0\x9f\xd0\x9d\xd0\xa1\xd0\x98

line3:\xd0\x9e\xd1\x82\xd0\xbf\xd1\x80\xd0\xb0\xd0\xb2\xd0\xba\xd0\xb0 \xd0\xbf\xd0\xbe\xd0\xb4\xd1\x82\xd0\xb2\xd0\xb5\xd1\x80\xd0\xb6\xd0\xb4\xd0\xb5\xd0\xbd\xd0\xb8\xd1\x8f \xd0\xbf\xd1\x83\xd0\xb1\xd0\xbb\xd0\xb8\xd0\xba\xd0\xb0\xd1\x86\xd0\xb8\xd0\xb8 \xd0\xb2 \xd0\x9f\xd0\x9d\xd0\xa1\xd0\x98
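
A likely reason the output is unchanged: in the interactive session, Python itself compiled the \xd0 escapes in the quoted literal into real bytes, whereas the script reads the same characters from a file, where each \xd0 is four ordinary ASCII characters (backslash, x, d, 0) that unicode() has no reason to interpret. Below is a minimal sketch of the difference, with one way to convert the escaped text. unescape_utf8 is a hypothetical helper name, the input is assumed to contain only ASCII characters, and the code is intended to run on both Python 2.7 and Python 3.

```python
def unescape_utf8(escaped):
    # Hypothetical helper: turn ASCII text containing \xNN escapes into
    # the Unicode string that those UTF-8 bytes encode.
    # Step 1: interpret the escapes; each \xNN becomes one code point 0-255.
    code_points = escaped.encode('ascii').decode('unicode_escape')
    # Step 2: map the code points back to single bytes (latin-1 is 1:1 here).
    raw = code_points.encode('latin-1')
    # Step 3: decode the recovered bytes as UTF-8.
    return raw.decode('utf-8')


typed_literal = b"\xd0\x9e"   # escapes compiled by Python: 2 real bytes
from_journal = r"\xd0\x9e"    # as read from a file: 8 ASCII characters

assert len(typed_literal) == 2
assert len(from_journal) == 8
# After unescaping, both spell the same single Cyrillic letter.
assert unescape_utf8(from_journal) == typed_literal.decode('utf-8')
```

Note that the unicode_escape codec also interprets other backslash sequences such as \n, so this is only suitable when that side effect is acceptable.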

I tried the variants I found, such as print(str(b)), print(unicode(b)) and print(repr(b)), and the result is still the same.
Please point out what I have misunderstood and what I should go and read.
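
The intermediate re.sub step can also be skipped by decoding the \uffffffXX runs directly. This sketch assumes the line is read as raw text (not parsed with a JSON library first) and that every broken byte appears as \uffffff followed by two hex digits, as in the sample above. repair_journal_text is a hypothetical name, and the code is intended to run on both Python 2.7 and Python 3.

```python
import re

# One or more consecutive "\uffffffXX" tokens, where XX is a hex byte value.
_BROKEN_RUN = re.compile(r'(?:\\uffffff[0-9a-fA-F]{2})+')
_BROKEN_BYTE = re.compile(r'\\uffffff([0-9a-fA-F]{2})')


def repair_journal_text(text):
    """Replace each run of broken byte tokens with the UTF-8 text it encodes."""

    def _decode_run(match):
        # Collect the two hex digits of every token in this run.
        hex_pairs = _BROKEN_BYTE.findall(match.group(0))
        # Rebuild the original bytes and decode them as UTF-8.
        raw = bytearray(int(pair, 16) for pair in hex_pairs)
        return raw.decode('utf-8', 'replace')

    return _BROKEN_RUN.sub(_decode_run, text)


# The last word of the sample message, in its broken form.
sample = (r'\uffffffd0\uffffffbf\uffffffd0\uffffffb0\uffffffd0\uffffffba'
          r'\uffffffd0\uffffffb5\uffffffd1\uffffff82\uffffffd0\uffffffb0')
repaired = repair_journal_text(sample)
print(repaired)
```

Invalid byte runs are replaced with U+FFFD instead of raising, which seemed the safer choice for log processing; pass 'strict' instead of 'replace' if a failure should be visible.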

2 answer(s)
Fixid, 2017-12-07
@Kamikaze

Better to just switch to Python 3; it's less painful.

asd111, 2017-12-07
@asd111

Move to Python 3, because 2.7 won't be supported after 2020.
NumPy and Django are both dropping 2.7 support in their new releases too.
