Python
goormany, 2021-07-30 04:29:33

Why is the HTML I parse different from what the browser shows?

Hello. I want to write a Python parser for https://csgopolygon.gg, but I have run into a problem. When I look at the page with "Inspect Element", I see these values: 610354f200af8851573881.png

When I look at the page with "View Page Source", I see different numbers and lines: 61035548aefed278753731.png

When parsing, I get the same values as in the page source. So the question is: what should I do to get the actual data?
Here is my code:

from bs4 import BeautifulSoup
import requests

URL = 'https://csgopolygon.gg'
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.164 Safari/537.36 OPR/77.0.4054.275 (Edition Yx GX)'
}

def get_html(url):
    # Fetch the page exactly as the server sends it; no JavaScript is executed.
    return requests.get(url, headers=HEADERS, timeout=10)

def get_content(html):
    # Collect the text of every <li class="ball"> inside each <ul class="balls">.
    soup = BeautifulSoup(html, 'html.parser')
    balls = []

    for ul in soup.find_all('ul', class_='balls'):
        for li in ul.find_all('li', class_='ball'):
            balls.append({
                'num': li.get_text(strip=True),
            })

    return balls


def parse():
    response = get_html(URL)
    if response.status_code == 200:
        for ball in get_content(response.text):
            print(ball)
    else:
        print(response.status_code)

parse()


3 answers
soremix, 2021-07-30
@SoreMix

Either use Selenium, or open the developer tools (F12), go to the Network tab, find the requests that return the values you need, and repeat them from your code. Most likely, though, it will not be plain XHR alone.
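A minimal sketch of "repeating" a request found in the Network tab, assuming the data arrives as JSON over plain HTTP. The name `fetch_json` and the URL `https://example.com/api/rolls` are hypothetical placeholders; the real address, headers and cookies have to be copied from the request shown in the browser. If the data arrives by some other channel, this approach will not be enough on its own.

```python
def fetch_json(session, url, headers=None):
    """Send a GET request through the given session and decode the JSON body."""
    response = session.get(url, headers=headers, timeout=10)
    # Fail loudly on 4xx/5xx instead of trying to parse an error page.
    response.raise_for_status()
    return response.json()


def main():
    import requests  # the same library the question already uses

    # Hypothetical endpoint: replace it with the URL seen in the Network tab.
    api_url = "https://example.com/api/rolls"
    headers = {"User-Agent": "Mozilla/5.0"}

    # A Session keeps cookies between calls, which matters if the site needs a login.
    with requests.Session() as session:
        print(fetch_json(session, api_url, headers=headers))
```

Calling `main()` performs the real request; `fetch_json` only needs an object with a `get` method, so it can be tried with a stand-in session first.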

UberPool, 2021-07-30
@UberPool

It is quite possible that all of the HTML is rendered by JavaScript. Try Selenium or a similar tool.
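A rough sketch of the Selenium route, assuming the `selenium` package and a local Chrome are installed. The selector `ul.balls li.ball` is taken from the class names in the question's code and may not match the live page; `make_driver` and `read_balls` are illustrative names, not part of Selenium.

```python
def make_driver(wait_seconds=10):
    """Start a Chrome session (requires the selenium package and Chrome)."""
    from selenium import webdriver  # imported here so read_balls has no hard dependency

    driver = webdriver.Chrome()
    # Let element lookups retry for up to wait_seconds while scripts fill in the page.
    driver.implicitly_wait(wait_seconds)
    return driver


def read_balls(driver, url, selector="ul.balls li.ball"):
    """Load the page in the browser and return the text of each matching element."""
    driver.get(url)
    # "css selector" is the string value behind Selenium's By.CSS_SELECTOR.
    elements = driver.find_elements("css selector", selector)
    return [element.text.strip() for element in elements]


def main():
    driver = make_driver()
    try:
        print(read_balls(driver, "https://csgopolygon.gg"))
    finally:
        # Always close the browser, even if the lookup fails.
        driver.quit()
```

Calling `main()` opens a real browser window. Because the browser runs the page's scripts, the elements read here are the ones visible in the inspector rather than the ones in the raw page source.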

Pavel Dunaev, 2021-07-30
@Pasha13666

For me, the page source, the inspector, and the page itself all show the same numbers as your page source. I assume it comes down to authorization, or rather that the numbers in the HTML the server sends (the page source) are just a stub, and the real data is delivered later via JS/XHR. Look in the inspector's Network tab for the relevant requests and try to repeat them.
The second option is Selenium, which is essentially a full-fledged browser controlled from your program.
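The "stub" idea can be illustrated offline. This is a small sketch on made-up markup, not the real page: the HTML string and its numbers are hypothetical. It uses only the standard library's `html.parser`, so it runs without extra packages, and it shows that a parser working on the server's HTML sees the placeholder values, because the script that would replace them is never executed.

```python
from html.parser import HTMLParser

# Hypothetical markup: what a server might send before any script runs.
SERVER_HTML = """
<ul class="balls">
  <li class="ball">1</li>
  <li class="ball">2</li>
</ul>
<script>
  // In a browser this would overwrite the placeholders with live data.
  document.querySelectorAll('li.ball').forEach(function (li, i) {
    li.textContent = [14, 3][i];
  });
</script>
"""


class BallCollector(HTMLParser):
    """Collects the text of every <li> whose class list contains "ball"."""

    def __init__(self):
        super().__init__()
        self.in_ball = False
        self.balls = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and "ball" in classes:
            self.in_ball = True
            self.balls.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_ball = False

    def handle_data(self, data):
        # Script text also arrives here, but in_ball is False at that point.
        if self.in_ball:
            self.balls[-1] += data.strip()


def static_balls(html):
    """Return the ball values present in the HTML as sent, with no JS executed."""
    collector = BallCollector()
    collector.feed(html)
    return collector.balls


print(static_balls(SERVER_HTML))  # prints ['1', '2'], the stub values
```

A browser, or Selenium driving one, would run the script and show the replaced values instead, which matches the difference between the page source and the inspector described in the question.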
