Problem with encoding when parsing a Russian site?
I run into an encoding problem when parsing the page https://beton24.ru/sochi/beton/ with the following code:
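from urllib.request import urlopen
from bs4 import BeautifulSoup

# Download the page and parse it; naming the parser avoids the bs4 warning
html = urlopen('https://beton24.ru/sochi/beton/')
bs = BeautifulSoup(html.read(), 'html.parser')
# Take the second <span> with the class "catalog-index__link-text"
result = bs.find_all("span", "catalog-index__link-text")[1]
# Convert the tag (markup included) to a string
parse = str(result)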
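Inspect the HTML of the price element, for example in Chrome DevTools.
The BeautifulSoup 4 documentation (section "Entities") states that an incoming HTML or XML entity is always converted into the corresponding Unicode character. The extracted text can therefore contain characters such as the no-break space (U+00A0) and the thin space (U+2009) instead of ordinary spaces, and these have to be replaced or removed explicitly.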
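The entity conversion can be reproduced offline with a minimal sketch. It uses the standard library's html.unescape as a stand-in for the parser's behaviour, not BeautifulSoup itself, and the markup fragment is an assumption: the real page source is not shown here, so the entities are guessed from the characters the answer replaces.

```python
from html import unescape

# Assumed fragment: the entities are inferred from the characters that the
# answer replaces (U+00A0 and U+2009); the real page markup may differ.
fragment = "от&nbsp;3&thinsp;836&nbsp;&#8381;"

# Named and numeric entities become the corresponding Unicode characters.
decoded = unescape(fragment)

# ascii() makes the invisible characters visible as escape sequences.
print(ascii(decoded))

# The decoded text contains special spaces, not ordinary U+0020 spaces.
print("\xa0" in decoded, "\u2009" in decoded, " " in decoded)
```

Printing the text with ascii() or repr() is a quick way to tell an encoding problem apart from special whitespace characters that merely look odd in the output.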
>>> from urllib.request import urlopen
>>> from bs4 import BeautifulSoup
>>> html = urlopen('https://beton24.ru/sochi/beton/')
>>> bs = BeautifulSoup(html.read(), 'lxml')
>>> result = bs.find_all("span", "catalog-index__link-text")[1]
>>> # Replace no-break spaces with ordinary spaces and drop thin spaces
>>> result.text.replace('\xa0', ' ').replace('\u2009', '')
'от 3836 ₽'
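If the same cleanup is needed in several places, the two replace() calls above could be wrapped in small helpers. This is only a sketch: the names SPACE_MAP, clean_price_text and price_to_int are hypothetical, and the sample string is reconstructed from the output shown above rather than fetched from the site.

```python
import re

# Hypothetical mapping that mirrors the replace() calls in the answer:
# no-break space -> ordinary space, thin space -> removed.
SPACE_MAP = str.maketrans({"\xa0": " ", "\u2009": None})


def clean_price_text(text: str) -> str:
    """Normalize the special spaces in a scraped price label."""
    return text.translate(SPACE_MAP)


def price_to_int(text: str) -> int:
    """Extract the digits of a price label as an integer."""
    digits = re.sub(r"\D", "", text)
    if not digits:
        raise ValueError(f"no digits found in {text!r}")
    return int(digits)


# Assumed sample, reconstructed from the answer's output (not live data).
raw = "от\xa03\u2009836\xa0₽"
print(clean_price_text(raw))
print(price_to_int(raw))
```

str.translate applies the whole mapping in one pass, so further whitespace characters can be added to SPACE_MAP without chaining more replace() calls.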