Python
Tera4Byte, 2021-02-27 20:45:58

Cyrillic is not recognized in Python. How do I fix it?

Hello. As a test, I decided to parse the site http://podolino-smolensky-khram.rf/raspisanie.html, but instead of the schedule text the PyCharm console shows this (a small piece of the output):

6 маÑÑа(ÑÑб Ð ± ðently ° °)
9:00
ð¾¶әµñ²² µ µ ° ° ° ° 19 ñ ñ³³ñññ ð ð ðññððð °ð½ðices

ð ð
ð

ðavyorth ”ñññ¿¿¿¿¿¿¿¿¿¿¿¿¿¿¿ town Ð ð¡¡ñ ñ ñ ð½ ññµ.ð ð ñ ñµìµðµð½ ðµ ð¾ñәµ¹ ð¼ñ quite ¾¾², ð¸ðµuzz ð² ð²²èðavyhod.

7 ° ÑÑÐ ° мР(воÑкÑÐμÑÐμнÑÐμ)
9:00
ÐожÐμÑÑвÐμннРÐиÑÑÑгиÑÐоР° Ñ »Ðμн ÐμÐ ± Ñ Ð ° кР° ÑиÑÑом иконÐμ ÐожиÐμй ÐÐ ° ÑÐμÑи « ÐÑÐμÑÐ ° ÑÐ Process

finished with exit code 0

That is, the text comes out mangled for some reason, even though I never asked for any encoding conversion. I would be very grateful if you could explain what I'm doing wrong and how to fix it.
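A note on what is likely happening here: when a server's Content-Type header carries no charset, requests falls back to ISO-8859-1 (Latin-1), so UTF-8 bytes get decoded with the wrong codec. The sample strings in the question ("6 маÑÑа" and so on) are exactly the shape that mistake produces. A minimal offline sketch of the mechanism (the schedule text used below is an assumption for illustration):

```python
# Minimal offline demonstration of the likely cause: valid UTF-8 bytes
# decoded with the wrong codec (Latin-1) produce this kind of mojibake.
original = "6 марта (суббота)"      # hypothetical schedule text as the server sends it
raw = original.encode("utf-8")     # the bytes on the wire

mojibake = raw.decode("latin-1")   # what you get with the ISO-8859-1 fallback
print(mojibake)                    # garbled: Cyrillic UTF-8 bytes turn into "Ð..." pairs

fixed = raw.decode("utf-8")        # decoding with the real encoding
print(fixed)                       # "6 марта (суббота)"
```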

Here is the code itself:
------------------------------------------------------------------------
from base_of_bot import UserAgent, Accept
import requests as req
from bs4 import BeautifulSoup as bs

url_site = 'http://podolino-smolensky-khram.rf/raspisanie.html'
headers = {
    "Accept": Accept,
    "User-Agent": UserAgent
}

req_header = req.get(url_site, headers=headers)
src = req_header.text
soup = bs(src, 'lxml')
actual_schedule = soup.find("div", {'class': '…'})  # class name cut off in the original post
print(actual_schedule)
------------------------------------------------------------------------


1 answer
Sergey Karbivnichy, 2021-02-27
@Tera4Byte

After the line:
req_header = req.get(url_site, headers=headers)
add:
req_header.encoding = req_header.apparent_encoding
The names of the variables broke my brain.
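To spell out why that one line helps: `apparent_encoding` runs charset detection over the response body, so assigning it to `encoding` before reading `.text` makes requests decode with the detected encoding instead of its ISO-8859-1 fallback. A sketch of the pattern, built on a locally constructed Response so it runs without network access (the Cyrillic sample text is an assumption; with the real site you would just keep the answer's two lines after `req.get(...)`):

```python
import requests

# Build a Response locally so the example needs no network; with the real
# site, req_header = req.get(url_site, headers=headers) plays this role.
resp = requests.models.Response()
resp._content = "Расписание богослужений на март".encode("utf-8")
resp.headers["Content-Type"] = "text/html"  # no charset -> ISO-8859-1 fallback

# The fix from the answer: decode with the encoding detected from the body.
resp.encoding = resp.apparent_encoding
print(resp.text)  # readable Cyrillic, assuming detection identifies UTF-8
```

An alternative that skips the issue entirely: pass the raw bytes to BeautifulSoup, `bs(req_header.content, 'lxml')`, and let its own encoding detection (which also honors the page's `<meta charset>`) do the decoding.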
