Using requests and encoding the resulting page - how to fix problems with Russian characters?
Dear friends.
This is about the simplest possible program, run with Python 3.3.5 on Win7 x64.
If I parse lenta.ru, everything works as it should: the anchor text is shown in proper Russian letters.
But if I run the same code against da.ru, all the Russian anchor text comes out garbled.
How can I fix this so that Russian characters come out correctly in every case?
import requests
from lxml import html
r = requests.get('http://lenta.ru')
#r = requests.get('http://da.ru')
docHtml = r.text
parsed_body = html.fromstring(docHtml)
for y in parsed_body.xpath("//a"):
    url = y.get("href")
    anchor = y.text
    print(url, anchor)
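One plausible cause, which nothing in this thread confirms: da.ru may send its page in a legacy Cyrillic encoding without naming a charset in the `Content-Type` header, so `r.text` is decoded with the wrong codec. The sketch below only illustrates that mechanism with the standard library. The sample word and the choice of `cp1251` are assumptions made for illustration, not facts about da.ru.

```python
# Hypothetical sample: the word "привет" as a windows-1251 server might send it.
body = "привет".encode("cp1251")

# What the text looks like when the bytes are decoded as ISO-8859-1
# (the wrong codec for this body).
wrong = body.decode("iso-8859-1")

# What the text looks like when the matching codec is used.
right = body.decode("cp1251")

print(repr(wrong))  # accented Latin characters instead of Cyrillic
print(right)
```

The bytes are identical in both cases; only the codec used to turn them into `str` differs, which would explain why one site prints fine and the other does not.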
I had the same problem. What helped was forcing the encoding on the response returned by `requests.get`:
```
r = requests.get(link, timeout=60, verify=False, headers=headers)
r.encoding = 'utf-8'
print(r.text)  # the text is now readable
```
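Why this helps, as far as I understand it: when the `Content-Type` header is `text/*` with no charset, requests falls back to ISO-8859-1 for `r.text`. Hard-coding `'utf-8'` only works for pages that really are UTF-8. A slightly more general sketch is below. `pick_encoding` is a hypothetical helper, not part of requests, and treating ISO-8859-1 as "probably just the fallback" is a heuristic that will be wrong for pages genuinely served in that encoding.

```python
def pick_encoding(header_encoding, detected_encoding):
    """Choose the codec used to decode a response body.

    header_encoding   -- what requests derived from the headers (r.encoding)
    detected_encoding -- what was guessed from the bytes (r.apparent_encoding)
    """
    # Assumption: ISO-8859-1 or None means the server named no charset.
    if header_encoding is None or header_encoding.lower() == "iso-8859-1":
        return detected_encoding or "utf-8"
    # The server named a charset explicitly, so trust it.
    return header_encoding


# Usage with requests (not executed here, it needs network access):
#   r = requests.get('http://da.ru')
#   r.encoding = pick_encoding(r.encoding, r.apparent_encoding)
#   docHtml = r.text
```

Another option is to skip `r.text` and hand the raw bytes in `r.content` to the HTML parser, so it can look at the charset declared inside the page itself.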
Maybe it is a Unicode problem?
encoded = str.encode(original, 'utf-8')
print(encoded)
The link labels come out looking like this:
\xc3\x91\xc2\x80\xc3\x90\xc2\xbe\xc3\x91\xc2\x82\xc3\x90\xc2\xbe\xc3\x91\xc2\x82\xc3\x90\xc2\xb8\xc3\x90\xc2\xbf\xc3\x90\xc2\xb8\xc3\x91\xc2\x80\xc3\x90\xc2\xbe\xc3\x90\xc2\xb2\xc3\x90\xc2\xb0\xc3\x90\xc2\xbd\xc3\x90\xc2\xb8\xc3\x90\xc2\xb5 \xc3\x91\xc2\x81\xc3\x90\xc2\xb0\xc3\x90\xc2\xb9\xc3\x91\xc2\x82\xc3\x90\xc2\xb0'
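Those bytes have the typical shape of text that was encoded twice: UTF-8 bytes were decoded as Latin-1 and the result was encoded as UTF-8 again. A minimal sketch of undoing that, using only the last word of the dump above (with the stray line-wrap space removed); the variable names are mine:

```python
# Last word of the dump above, as a bytes literal.
garbled = (b"\xc3\x91\xc2\x81\xc3\x90\xc2\xb0\xc3\x90\xc2\xb9"
           b"\xc3\x91\xc2\x82\xc3\x90\xc2\xb0")

# Step 1: decode the outer UTF-8 layer; this gives the garbled string.
mojibake = garbled.decode("utf-8")

# Step 2: Latin-1 maps each character back to the single byte it came from.
original_bytes = mojibake.encode("latin-1")

# Step 3: decode the original UTF-8 bytes.
repaired = original_bytes.decode("utf-8")

print(repaired)  # сайта
```

This only repairs strings that are already damaged. Setting the right encoding on the response before reading `r.text` avoids the damage in the first place.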