Cyrillic is not recognized in Python. How to fix?
Hello. As a test, I decided to parse the site http://podolino-smolensky-khram.rf/raspisanie.html, but after parsing, this is what the PyCharm console shows instead of the schedule text (a small piece of the output):
6 маÑÑа (ÑÑббоÑа)
9:00
…
7 маÑÑа (воÑкÑÐμÑÐμнÑÐμ)
9:00
ÐожÐμÑÑвÐμннÐ°Ñ ÐиÑÑÑгиÑ…

Process finished with exit code 0
That is, the text comes out mangled for some reason, even though I never asked for any re-encoding. I would be very grateful if you could explain what I am doing wrong and how to fix it.
Here is the code itself:
----------------------------------------------------------------------
from base_of_bot import UserAgent, Accept
import requests as req
from bs4 import BeautifulSoup as bs

url_site = 'http://podolino-smolensky-khram.rf/raspisanie.html'
headers = {
    "Accept": Accept,
    "User-Agent": UserAgent,
}

req_header = req.get(url_site, headers=headers)
src = req_header.text
soup = bs(src, 'lxml')
actual_schedule = soup.find("div", {'class': '…'})  # the class name is cut off in the original post
print(actual_schedule)
----------------------------------------------------------------------
After the line:
req_header = req.get(url_site, headers=headers)
add:
req_header.encoding = req_header.apparent_encoding
The names of the variables broke my brain.
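For anyone curious about the root cause: the server sends UTF-8 bytes but does not declare a charset in the Content-Type header, so requests falls back to ISO-8859-1 and decodes the Cyrillic bytes as Latin-1, which produces exactly this kind of mojibake. Setting `encoding = apparent_encoding` makes requests re-detect the charset from the response body instead. Here is a network-free sketch of the effect (the sample string is my own illustration, not taken from the site):

```python
# Simulate the bug: the server sends UTF-8 bytes, but with no charset in
# the Content-Type header requests decodes text/* as ISO-8859-1 by default.
raw = "6 марта (суббота)".encode("utf-8")  # bytes as the server sends them

mojibake = raw.decode("iso-8859-1")  # the wrong fallback decoding
correct = raw.decode("utf-8")        # what apparent_encoding would detect

print(mojibake)  # garbled "Ð¼Ð°Ñ..."-style text, like in the question
print(correct)   # 6 марта (суббота)
```

Once `req_header.encoding = req_header.apparent_encoding` is set before reading `req_header.text`, requests uses the detected charset rather than the fallback, and the schedule prints correctly.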