How to slow down parsing in Selenium?

Python

dancer_and_programmer, 2020-06-09 22:08:29

Good afternoon, everyone. I have a parser in Python that uses Selenium (webdriver) to parse the Yandex messenger site (it is written in JS): https://yandex.ru/chat/#/chat, specifically the popular channels. It clicks on each channel and scrolls up (all the way to CHANNEL CREATED) to count the number of posts and the total number of views across all posts. But it scrolls too fast with this loop:

for i in range(200):
    driver.execute_script("var evt = document.createEvent('MouseEvents');evt.initEvent('wheel', true, true);evt.deltaY = -100000;document.querySelector('.yamb-conversation__content').dispatchEvent(evt);")
    html2 = driver.page_source
    soup2 = BeautifulSoup(html2, 'lxml')
    time.sleep(2) # after one scroll (that is 5-6 posts), pause for 2 seconds, but this approach does not help

As a result, the posts do not all get loaded (the JS does not have time to load them). How can I slow the scrolling down, or is there another way to wait for the JS to finish loading on the page?

1 answer(s)

Dmitry, 2020-06-10
@LazyTalent

It's better to do this:

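for i in range(200):
    # Dispatch a synthetic wheel event to scroll the conversation up.
    driver.execute_script("var evt = document.createEvent('MouseEvents');evt.initEvent('wheel', true, true);evt.deltaY = -100000;document.querySelector('.yamb-conversation__content').dispatchEvent(evt);")
    time.sleep(2)  # pause first, so the JS has time to load the older posts
    html2 = driver.page_source  # read the page only after the pause
    soup2 = BeautifulSoup(html2, 'lxml')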

In your version, you scroll, read the page content, and only then pause. Why?
In my version, you scroll first, then pause (so the content has time to load), and only then read the page content.
And a link for further reading on waits in Selenium: https://selenium-python.readthedocs.io/waits.html
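
If a fixed 2-second pause turns out to be too short or too long, one alternative is to poll until new posts actually show up, similar in spirit to the explicit waits described on that page. Below is a rough sketch of such a hand-rolled polling loop, not Selenium's own WebDriverWait. The function name scroll_until_stable and the item selector passed to it are hypothetical; only the container selector and the wheel-event script come from the code above. It also assumes the page keeps already loaded posts in the DOM, which may not hold for this site.

```python
import time

# Same synthetic wheel event as above; the container selector is passed in.
SCROLL_JS = """
var evt = document.createEvent('MouseEvents');
evt.initEvent('wheel', true, true);
evt.deltaY = -100000;
document.querySelector(arguments[0]).dispatchEvent(evt);
"""
COUNT_JS = "return document.querySelectorAll(arguments[0]).length;"


def scroll_until_stable(driver, container_sel, item_sel,
                        timeout=10.0, poll=0.5, max_scrolls=200):
    """Scroll up until no new items appear within `timeout` seconds.

    `item_sel` is a hypothetical CSS selector for a single post.
    Returns the final number of elements matching `item_sel`.
    """
    count = driver.execute_script(COUNT_JS, item_sel)
    for _ in range(max_scrolls):
        driver.execute_script(SCROLL_JS, container_sel)
        deadline = time.monotonic() + timeout
        new_count = count
        # Poll until the item count grows or the timeout expires.
        while time.monotonic() < deadline:
            new_count = driver.execute_script(COUNT_JS, item_sel)
            if new_count > count:
                break
            time.sleep(poll)
        if new_count <= count:
            break  # nothing new loaded: assume the top was reached
        count = new_count
    return count
```

It could be called as scroll_until_stable(driver, '.yamb-conversation__content', '<selector of one post>'), and page_source read once afterwards. Each scroll then waits only as long as the page needs, and the loop stops by itself once scrolling no longer loads anything.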