Why is the page not being parsed?

Python

zlodiak, 2019-04-05 11:45:53

I'm trying to get the HTML of a page with this script:

#!/usr/bin/env python3

import requests
from bs4 import BeautifulSoup

def get_root_page_html(url: str) -> str:
    html = requests.get(url)
    return html.text

if __name__ == '__main__':
    root_page_html = get_root_page_html('https://hh.ru')
    print(root_page_html)

I run the script from the console and get the following output:

<html>
<head><title>404 Not Found</title></head>
<body bgcolor="white">
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>

Please tell me what I'm doing wrong.
First of all, I would like to understand whether I am making a mistake or whether the site has some special protection against simple parsers like this one.

1 answer

alternativshik, 2019-04-05
@zlodiak

Adding a User-Agent header to the request will help.
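
By default, requests identifies itself with a User-Agent of the form python-requests/<version>, and some sites refuse such clients. A minimal sketch of the suggestion follows, reusing the function name from the question. The BROWSER_HEADERS constant, the exact User-Agent string, and the timeout value are assumptions chosen for illustration, and the site may apply other checks as well.

```python
from typing import Optional

import requests

# Assumed browser-like User-Agent; the exact value is illustrative,
# not something the site is known to require.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
}


def get_root_page_html(url: str,
                       headers: Optional[dict] = None,
                       timeout: float = 10.0) -> str:
    """Fetch a page and return its HTML, sending a browser-like User-Agent."""
    if headers is None:
        headers = BROWSER_HEADERS
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raise on 4xx/5xx instead of silently returning an error page.
    response.raise_for_status()
    return response.text
```

Calling get_root_page_html('https://hh.ru') and printing the result mirrors the original script. With raise_for_status() in place, a 404 surfaces as an exception rather than as the nginx error page in the output.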