How do I parse a site that loads its data after the page opens?

Alexander Kovalenko, 2021-02-27 23:10:50

Python

Hello everyone. I need to parse https://znanija.com/ so that, for any question, I get back the number of results and the answer options. The problem is that when you open the site it first shows a loading state, so it seems a delay is needed for the site to load before the information can be read.

import requests
from bs4 import BeautifulSoup
from time import sleep

URL = 'https://znanija.com/app/ask?entry=hero&q=%D0%BE%D0%B1%D1%8C%D0%B5%D0%BC+%D0%BA%D1%83%D0%B1%D0%B0'
header = {'user-agent':
              'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36'}
sleep(2)
page = requests.get(URL, headers=header)
soup = BeautifulSoup(page.content, 'html.parser')
req = soup.find('span', {'class':'sg-text'})
print(req)

Result (the placeholder text "Поиск..." means "Searching..."):

<span class="sg-text">
Поиск...
</span>

1 answer(s)

soremix, 2021-02-27
@KovalenkoA12

"need a delay for the site to load"

It doesn't work that way. A delay won't help: requests only downloads the initial HTML and never runs the page's JavaScript, while the data is loaded dynamically by additional background requests.
Open the developer tools, go to the Network tab, and find the request you need under the XHR filter. Then repeat that request from Python.
Spoiler: here it is

POST to https://znanija.com/graphql/ru
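
Repeating that POST from Python could look roughly like the sketch below. Only the endpoint URL comes from this answer; the helper names build_payload and run_query are made up for illustration, and the operation name, query text and variables are placeholders that have to be copied from the request body shown in the Network tab. The site may also expect extra headers or cookies, which this sketch does not try to guess.

```python
import requests

# Endpoint named in the answer above.
GRAPHQL_URL = "https://znanija.com/graphql/ru"

# Same browser-like user agent as in the question.
HEADERS = {
    "user-agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36"
    ),
}


def build_payload(operation_name, query, variables):
    """Assemble a JSON body in the usual GraphQL-over-HTTP shape.

    All three values are assumptions here: copy the real ones from the
    request payload that the browser sends (Network tab, XHR filter).
    """
    return {
        "operationName": operation_name,
        "query": query,
        "variables": variables,
    }


def run_query(payload, timeout=10):
    """POST the payload to the GraphQL endpoint and return the decoded JSON."""
    response = requests.post(
        GRAPHQL_URL, json=payload, headers=HEADERS, timeout=timeout
    )
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.json()
```

To use it, paste the values from the developer tools into build_payload(...), pass the result to run_query(...), and inspect the returned dictionary to find where the result count and answers sit. No BeautifulSoup is needed at that point, because the response is JSON rather than HTML.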