Bots
R4ndolphC4rter, 2020-01-29 20:50:24

How to correctly parse site data that is loaded dynamically using Selenium?

Hello! I set myself a goal: to write a program that downloads all or selected videos from any TikTok account (the account is looked up by login). I divided the task into subtasks, and I ran into a problem with one of them, the most important one: I cannot load the site content programmatically. As an example, I took a random popular account, https://www.tiktok.com/@egorkreed. At first I tried to fetch the HTML page using the requests library together with bs4, but realized that this approach is not suitable because the page is generated dynamically. So I decided to use the Selenium library. The code for the subtask:



I have now simplified the code as much as possible:

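import time
from selenium import webdriver

URL = 'https://www.tiktok.com/@egorkreed'


def get_html(url):
    # Open the page in a Chrome window controlled by Selenium
    driver = webdriver.Chrome()
    driver.get(url)
    # Return the DOM as rendered so far; the browser window stays open
    return driver.page_source


html = get_html(URL)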


This code opens the page:
[Screenshot: 5e31e14f47942265632633.png]

Result: an endless attempt to load the videos.

But if I open the same link manually in the browser, the result is as follows:
[Screenshot: 5e31e1b7dc93d150630277.png]

I don't understand what the problem is.
Question:
How do I correctly get the page's HTML with Selenium so that the content (the clips) is displayed?

P.S.
I tried code adapted for my program from the explicit-wait example in the documentation:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get("http://somedomain/url_that_delays_loading")
try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "myDynamicElement"))
    )
finally:
    driver.quit()


Whatever timeout I set (10, 50, 100...), the program ends with an exception: the required element is not found.


4 answer(s)
riot26, 2019-05-06
@HappyMen

With each message, the bot receives a user object that has an identifier - the 'id' field.

R4ndolphC4rter, 2020-01-29
@R4ndolphC4rter

The blocks containing the videos are not loaded when the page is opened programmatically:

<div class="jsx-1410658769 video-feed-item">
...
</div>

What is causing the problem? How could it be corrected?
02/02/2020 (a palindrome date, by the way)
After digging around the site, I found that the video IDs I need, along with additional information about each video, arrive as JSON (in batches of 30) in response to a GET request to this address:
https://m.tiktok.com/share/item/list?secUid=MS4wLjABAAAAel1W8SHY_s5E-E8fS9SFwEGKTV4TqtP-GotZf737nudl9M5gm99Pk_8bp8A0UXS8&id=6568346904743116806&type=1&count=30&minCursor=0&maxCursor=0&shareUid=&lang=&_signature=N5.bMAAgEBaTTMphzSDYUTef2iAAGmv

Various parameters are passed in this link.
The important ones are:
&maxCursor=N
&_signature=LONG_STRING
If you make the request without a correct signature, the JSON response is, roughly speaking, empty: it contains no relevant information.
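To see which parameters this address carries and to swap individual ones, the query string can be taken apart with the standard library. This is only a sketch: the helper names query_params and with_params are made up for illustration, no request is sent, and the signature is copied verbatim from the link above. Whether a new _signature is required for every maxCursor value is not established in this thread.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# The address quoted above, split across lines only for readability.
ITEM_LIST_URL = (
    "https://m.tiktok.com/share/item/list"
    "?secUid=MS4wLjABAAAAel1W8SHY_s5E-E8fS9SFwEGKTV4TqtP-GotZf737nudl9M5gm99Pk_8bp8A0UXS8"
    "&id=6568346904743116806&type=1&count=30&minCursor=0&maxCursor=0"
    "&shareUid=&lang=&_signature=N5.bMAAgEBaTTMphzSDYUTef2iAAGmv"
)


def query_params(url):
    """Return the query parameters of url as a flat dict, keeping blank values."""
    pairs = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return {name: values[0] for name, values in pairs.items()}


def with_params(url, **overrides):
    """Return url with the given query parameters replaced (hypothetical helper)."""
    parts = urlsplit(url)
    params = query_params(url)
    params.update(overrides)
    return urlunsplit(parts._replace(query=urlencode(params)))


params = query_params(ITEM_LIST_URL)
print(params["count"], params["maxCursor"], params["_signature"])

# Same address with a different cursor; the old signature is carried over
# unchanged, which may well make the server return an empty result.
next_url = with_params(ITEM_LIST_URL, maxCursor="30")
print(next_url)
```

Keeping the parameters in a dict makes it easy to plug in a fresh signature later, once a way of obtaining one is known.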
So now there is another question: how to fake a TikTok signature?
But that question is outside the scope of this topic, so (and not only for that reason) I consider this topic closed. Thanks to everyone who helped.

sergey kuzmin, 2020-01-30
@sergueik

R4ndolphC4rter, I wondered what TikTok does in a regular browser that it does not do under Selenium:

window.addEventListener('load', function() {
    navigator.serviceWorker.register('/sw.js');
});

good luck
By the way, R4ndolphC4rter, from Java with Selenium 3.14 and an old Firefox (version 40), the videos load and work without any tuning at all:
Video link selected: https://www.tiktok.com/@egorkreed/video/678***************5
Video link selected: https://www.tiktok.com/@egorkreed/video/678***************7
Video link selected: https://www.tiktok.com/@egorkreed/video/678***************4
...

But with Chrome they do not (the same situation as with Python).

Dr. Bacon, 2020-01-29
@bacon

With Selenium, bs4 is not needed. Read the docs on how to search the DOM with it.
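As a rough illustration of searching the DOM through Selenium itself, the sketch below collects link addresses from the video blocks. The selector div.video-feed-item a is an assumption based on the class quoted earlier in this thread (that the blocks contain a elements is a guess), and collect_video_links is a hypothetical helper name. It only calls find_elements and get_attribute, which Selenium's driver and element objects provide, and passes the locator strategy as the plain string "css selector" (the value of By.CSS_SELECTOR), so the helper itself imports nothing.

```python
CSS = "css selector"  # same string as selenium.webdriver.common.by.By.CSS_SELECTOR


def collect_video_links(driver, selector="div.video-feed-item a"):
    """Return the unique href values of elements matching selector, in page order.

    The default selector is an assumption about the page markup.
    """
    links = []
    for element in driver.find_elements(CSS, selector):
        href = element.get_attribute("href")
        # Skip elements without an address and links already collected
        if href and href not in links:
            links.append(href)
    return links


# Possible usage with a real browser (requires selenium and a driver binary):
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   driver.get('https://www.tiktok.com/@egorkreed')
#   print(collect_video_links(driver))
#   driver.quit()
```

If the blocks never appear in the page, as reported above for Chrome, the function simply returns an empty list.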
