Python
hacker_programmer, 2020-06-08 10:26:39

How to correctly parse a page with JS loading?

Hello everybody. I wrote a Python script that scrapes Yandex Messenger (it parses the popular channels: ).
It collects the following:
1. Channel name
2. Channel description
3. Link to the channel
4. Number of subscribers
5. Last publication date
6. Number of publications
7. Total number of views
8. Total number of reactions
It writes this data to a Google spreadsheet (screenshot attached).
The problem is speed.
The script runs too fast: the number of publications, the total number of views, and the total number of reactions do not have time to reset (while scrolling up), so the numbers shown are far too large (the real values are much smaller).
Sometimes the JS also does not finish loading, and 0 subscribers is shown (although the real count is not 0).
I attached the script itself and tried to describe everything in as much detail as possible with comments in the code.
What is the best thing to do here?
Should I put time.sleep(3) somewhere (but then the program will take a very long time to parse),
or should I wait somewhere for the JS to load?
P.S. Since a question cannot be longer than 10k characters, I removed the part where the data is added to the table. The results are now just printed in PyCharm.

Code:

Since Habr does not allow attaching a notepad file or posting questions longer than 10k characters,
the full code can be copied here:

https://ru.stackoverflow.com/questions/1138040/%d0...

Table output:
5edde82309142953102787.png


1 answer(s)
Artem, 2020-06-08
@b_a_y

Hello, I advise replacing time.sleep with implicitly_wait. It lets the code continue as soon as the condition is met, so you get away from fixed delays and the code runs much more efficiently.
I also recommend checking the response status code of each request (this lets you catch 404 errors and the like). In the cases where you get no data, I think Yandex services recognize you as a bot and reject the requests.
Finally, before shutting down the driver, add a short time.sleep of your choice, and call driver.close() together with driver.quit(): close() first, then quit(), because quit() ends the session. There were already similar problems reported on the Selenium GitHub: scripts did not finish correctly without both driver.quit() and driver.close(). The developers advised using these two methods together.
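
To illustrate the idea of waiting on a condition instead of sleeping for a fixed time, here is a minimal plain-Python sketch. It is not Selenium itself: in Selenium the same role is played by driver.implicitly_wait(seconds) or by WebDriverWait(driver, timeout).until(condition). The names wait_until and read_subscribers are hypothetical and exist only for this example, and the fake counter stands in for a page element that shows 0 until the JS has loaded.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.2):
    """Poll condition() until it returns a truthy value or the timeout expires.

    Returns the truthy value, so the caller can use it directly.
    Raises TimeoutError if the value never becomes truthy in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition was not met within %.1f s" % timeout)
        time.sleep(poll)


if __name__ == "__main__":
    # Hypothetical stand-in for reading the subscriber count from the page:
    # it returns 0 for the first few reads, as if the JS had not loaded yet.
    reads = {"count": 0}

    def read_subscribers():
        reads["count"] += 1
        return 0 if reads["count"] < 4 else 1500

    # Continues as soon as a non-zero value appears, not after a fixed delay.
    print(wait_until(read_subscribers, timeout=5, poll=0.1))
```

The wait ends as soon as the value is available, so a generous timeout costs nothing on pages that load quickly, unlike a fixed time.sleep(3) after every step.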
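
The shutdown advice can be sketched as a small context manager that always calls close() and then quit(), even if the parsing code raises. This is only an illustration under assumptions: managed_driver and FakeDriver are hypothetical names, and FakeDriver merely records calls so the example runs without a browser. With real Selenium you would pass something like webdriver.Chrome as the factory.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Create a driver and guarantee close() then quit() on exit."""
    driver = factory()
    try:
        yield driver
    finally:
        try:
            driver.close()  # close the current window first
        finally:
            driver.quit()   # then end the whole session, even if close() failed


class FakeDriver:
    """Stand-in for a WebDriver; it only records which methods were called."""

    def __init__(self):
        self.log = []

    def close(self):
        self.log.append("close")

    def quit(self):
        self.log.append("quit")


if __name__ == "__main__":
    with managed_driver(FakeDriver) as driver:
        pass  # the parsing work would go here
    print(driver.log)
```

Putting the cleanup in a finally block means the browser process is not left running when the script fails halfway through a channel list.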
