Cyrillic is not recognized in Python. How to fix?
Hello. As a test, I decided to parse the site http://podolino-smolensky-khram.rf/raspisanie.html, but after parsing, this is what the PyCharm console shows instead of the schedule text (a small piece of the output):
6 маÑÑа (ÑÑббоÑа)
9:00
…
7 маÑÑа (воÑкÑÐμÑÐμнÑÐμ)
9:00
ÐожÐμÑÑвÐμннÐ°Ñ ÐиÑÑÑгиÑ…

Process finished with exit code 0
That is, the text comes out mangled for some reason, even though I never asked for any re-encoding. I would be very grateful if you could explain what I am doing wrong and how to fix it.
Here is the code itself:
----------------------------------------------------------------------
from base_of_bot import UserAgent, Accept
import requests as req
from bs4 import BeautifulSoup as bs

url_site = 'http://podolino-smolensky-khram.rf/raspisanie.html'
headers = {
    "Accept": Accept,
    "User-Agent": UserAgent,
}

req_header = req.get(url_site, headers=headers)
src = req_header.text
soup = bs(src, 'lxml')
actual_schedule = soup.find("div", {'class': '…'})  # the class name is cut off in the original post
print(actual_schedule)
----------------------------------------------------------------------
After the line:
req_header = req.get(url_site, headers=headers)
add:
req_header.encoding = req_header.apparent_encoding
The names of the variables broke my brain.
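For anyone curious about the root cause: the server sends UTF-8 bytes but does not declare a charset in the Content-Type header, so requests falls back to ISO-8859-1 and decodes the Cyrillic bytes as Latin-1, which produces exactly this kind of mojibake. Setting `encoding = apparent_encoding` makes requests re-detect the charset from the response body instead. Here is a network-free sketch of the effect (the sample string is my own illustration, not taken from the site):

```python
# Simulate the bug: the server sends UTF-8 bytes, but with no charset in
# the Content-Type header requests decodes text/* as ISO-8859-1 by default.
raw = "6 марта (суббота)".encode("utf-8")  # bytes as the server sends them

mojibake = raw.decode("iso-8859-1")  # the wrong fallback decoding
correct = raw.decode("utf-8")        # what apparent_encoding would detect

print(mojibake)  # garbled "Ð¼Ð°Ñ..."-style text, like in the question
print(correct)   # 6 марта (суббота)
```

Once `req_header.encoding = req_header.apparent_encoding` is set before reading `req_header.text`, requests uses the detected charset rather than the fallback, and the schedule prints correctly.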