D
D
Danil Neskazhu2020-04-24 19:41:50
css
Danil Neskazhu, 2020-04-24 19:41:50

BS does not see classes when parsing?

I'm trying to parse a website using BS4 and requests.
From a site with a request to a search engine: https://steambuy.com/catalog/?q=PUBG
You need to get a link to page 1 of the search result. But for some reason BS does not see the right class?
What is wrong with the code or with the site?

import requests
from bs4 import BeautifulSoup as BS 
game = 'PUBG'
START='https://steambuy.com/catalog/?q='
HOST='https://steambuy.com'
URL=START+game
def get_html(url):
    search= requests.get(url)
    return search


def get_content(html):
    soup = BS(html,'html.parser')
    item = soup.find('a',class_='product-item__title-link').get('href')
    return item

def parse(url):
    html= get_html(url)
    if html.status_code == 200:
        a = get_content(html.text)
        print(a)
    else:
        pass
parse(URL)

Answer the question

In order to leave comments, you need to log in

2 answer(s)
S
Sergey Karbivnichy, 2020-04-24
@hottabxp

Added:

import requests
import json
from bs4 import BeautifulSoup as BS 

headers = {'X-Requested-With':'XMLHttpRequest'}
url = 'https://steambuy.com/ajax/_get.php?rnd=0.7101602294952999&offset=0&region_free=0&sort=cnt_sell&sortMode=descendant&view=extended&a=getcat&q=PUBG&series=&izdatel=&currency=wmr&curr=&currMaxSumm%5Bwmr%5D=3000&currMaxSumm%5Bwmz%5D=100&currMaxSumm%5Bwme%5D=70&currMaxSumm%5Bwmu%5D=1000&letter=&limit=0&page=1&minPrice=0&maxPrice=5000&minDate=0&maxDate=0&deleted=0&no_price_range=0&records=14'

response = requests.get(url,headers=headers)
soup = BS(json.loads(response.text)['html'],'html.parser')
items = soup.find_all('a',class_='product-item__title-link')
for item in items:
  print('https://steambuy.com'+item.get('href'))

At the exit:
https://steambuy.com/steam/playerunknown-s-battleg...
https://steambuy.com/steam/playerunknowns-battlegr...

Of course, you can’t find it, because the results are loaded from a different address:
https://steambuy.com/ajax/_get.php+many parameters

5ea31a5f2854d865027267.png

A
Amigun, 2020-04-24
@Amigun

I can not confirm, but perhaps the data that you need is added to the page dynamically.
The requests library receives only the page code, and cannot work with dynamic data.
How to check it? Through the same requests, download the requested page and study it.
If my theory is correct, then look towards selenium.

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question