A
A
Alexander Kovalenko2021-02-27 23:10:50
Python
Alexander Kovalenko, 2021-02-27 23:10:50

How to parse a site that loads information later?

Hello everyone, you need to parse the site https://znanija.com/ so that for any question it sends the number of results and answer options in response, but the problem is that when you go to the site, loading occurs and in order to receive information, a delay is needed for the site to load

import requests
from bs4 import BeautifulSoup
from time import sleep

URL = 'https://znanija.com/app/ask?entry=hero&q=%D0%BE%D0%B1%D1%8C%D0%B5%D0%BC+%D0%BA%D1%83%D0%B1%D0%B0'
header = {'user-agent':
              'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36'}
sleep(2)
page = requests.get(URL, headers=header)
soup = BeautifulSoup(page.content, 'html.parser')
req = soup.find('span', {'class':'sg-text'})
print(req)


result
<span class="sg-text">
Поиск...
</span>

Answer the question

In order to leave comments, you need to log in

1 answer(s)
S
soremix, 2021-02-27
@KovalenkoA12

need a delay for the site to load

It doesn't work that way. The data is loaded dynamically with additional background queries.
Open the developer tools, the network tab and look for the request you need in XHR. Then you repeat it through python
Spoiler: here it is
603aa904a2b80968033093.jpeg
POST to https://znanija.com/graphql/ru

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question