How to decode unicode escaped strings?


Python

ghostku, 2017-05-03 02:30:50

When parsing with BeautifulSoup, I sometimes get strings that contain escape sequences like \uxxxx.
For example:

>>> element.text
'Плотность пленки \u2013 10 мкн'

I tried to decode it and got an error:
element.text.decode('unicode_escape')

AttributeError: 'str' object has no attribute 'decode'

How do I decode such a string properly? Can BeautifulSoup be configured so that it produces readable strings in the first place?
Thanks


2 answers

Foo Bar, 2017-05-03
@atomheart

First, what you see in the console is the string rendered for the console's encoding. That is not how Python holds it internally (most likely as Unicode text). Judging by the output, the string contains a character (an en dash, U+2013) that the console encoding cannot represent, so the console shows its escape code instead.
Second, search for that character inside your program to check that Python handles the string correctly (for example, test whether '\u2013' is in element.text).
Third, the error says that str has no decode method. In Python 3 only bytes can be decoded, so if the string really does contain a literal backslash escape, encode it to bytes first:
element.text.encode('latin-1', 'backslashreplace').decode('unicode_escape')
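
The checks above can be sketched in a few lines. This is only an illustration under assumptions: the sample strings stand in for element.text, cp866 is assumed as the console encoding (the question does not say which one is in use), and the helper names describe and decode_literal_escapes are hypothetical.

```python
def describe(text: str) -> str:
    """Report whether text holds a real U+2013 or a literal backslash escape."""
    if "\u2013" in text:
        return "real character"
    if "\\u2013" in text:
        return "literal escape"
    return "neither"


def decode_literal_escapes(text: str) -> str:
    """Turn literal \\uXXXX sequences into the characters they name."""
    # 'backslashreplace' rewrites everything outside Latin-1 (Cyrillic, for
    # example) as \uXXXX escapes, so 'unicode_escape' can decode the whole
    # byte string without mangling the non-ASCII text.
    return text.encode("latin-1", "backslashreplace").decode("unicode_escape")


# Stand-ins for element.text (assumed sample data).
real = "Плотность пленки \u2013 10 мкн"      # holds the en dash itself
literal = "Плотность пленки \\u2013 10 мкн"  # holds backslash, 'u', '2', '0', '1', '3'

# What a console limited to cp866 (an assumed encoding) would display
# for the string that holds the real character.
shown = real.encode("cp866", "backslashreplace").decode("cp866")

print(describe(real))
print(describe(literal))
print(shown == literal)
print(decode_literal_escapes(literal) == real)
```

If describe reports a real character, there is nothing to decode and only the console display is affected. Note that decode_literal_escapes also interprets other backslash sequences such as a literal \n, so it is only suitable when every backslash in the text is meant as an escape.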

Alex F, 2017-05-03
@delvin-fil

element.encode().decode()
Film density - 10 microns
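
As a side note on the one-liner above, here is a small sketch of what an encode/decode round trip with the default UTF-8 codec does. It assumes the value is a plain str (the sample strings are made up for illustration), and the helper name utf8_round_trip is hypothetical.

```python
def utf8_round_trip(text: str) -> str:
    """Encode to UTF-8 bytes and decode back, as str.encode().decode() does."""
    return text.encode().decode()


# Assumed sample data, not taken from a real page.
real = "Плотность пленки \u2013 10 мкн"      # holds the en dash itself
literal = "Плотность пленки \\u2013 10 мкн"  # holds a literal backslash escape

# The round trip returns an equal string in both cases.
print(utf8_round_trip(real) == real)
print(utf8_round_trip(literal) == literal)
```

Under that assumption, a readable result after the round trip suggests the string already contained the real character, and a literal backslash escape would come back unchanged.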