Problem with encoding in requests

Python

nexus0, 2020-01-11 04:41:36

Problem with encoding in requests_html?

I can't get the page title parsed in the correct encoding.

>>> from requests_html import HTMLSession
>>> session = HTMLSession()
>>> r = session.get('https://pm.by/live.html')
>>> print(r.encoding)
WINDOWS-1251
>>> r.html.xpath('//title/text()')
['������ Live � ������ �� ����� ���� (�� ���� �����): �� ��������']

The site is encoded in cp1251, but when I run an XPath query the result comes back garbled.
The mojibake can't even be converted back to bytes with the encode method.

>>> r.html.xpath('//title/text()')[0].encode('cp1251')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/encodings/cp1251.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-5: character maps to <undefined>

What could be the problem?

1 answer

Drill, 2020-01-11
@nexus0

Try r.content.decode('cp1251')
or r.html.encoding = 'cp1251'
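
For what it's worth, here is a small offline sketch of why the encode call in the question fails and why decoding the raw bytes helps. The � characters in the XPath output are U+FFFD (the replacement character), which cp1251 has no mapping for, so the original bytes are already lost by that point. The sample title below is made up and stands in for r.content, and using UTF-8 as the wrong codec is only an assumption for illustration, not a claim about what requests_html does internally.

```python
# Stand-in for r.content: sample markup encoded as cp1251 bytes
# (hypothetical text, not the real page from the question).
raw = '<title>Привет, мир</title>'.encode('cp1251')

# Decoding with the wrong codec and errors='replace' swaps every
# undecodable byte for U+FFFD, the character printed as "�".
garbled = raw.decode('utf-8', errors='replace')
has_replacement_chars = '\ufffd' in garbled

# U+FFFD has no cp1251 mapping, so encoding the garbled text back fails
# with the same UnicodeEncodeError as in the question.
try:
    garbled.encode('cp1251')
    roundtrip_failed = False
except UnicodeEncodeError:
    roundtrip_failed = True

# Decoding the untouched bytes with the right codec recovers the text.
fixed = raw.decode('cp1251')
print(fixed)
```

With the objects from the question, raw corresponds to r.content, so r.content.decode('cp1251') gives a correctly decoded string to work with.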