How to parse steam prices/link?

Python

s1veme, 2020-07-31 11:20:58

Hello, world!

I started scraping the Steam market and ran into a problem, along with a big gap in my understanding.

The code:

from bs4 import BeautifulSoup
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Language': 'ru-UA,ru;q=0.9,en-US;q=0.8,en;q=0.7,ru-RU;q=0.6',
}
steam_link = 'https://steamcommunity.com/market/listings/730/AK-47%20%7C%20Phantom%20Disruptor%20%28Field-Tested%29'
print(steam_link)
full_page = requests.get(steam_link, headers=headers)
soup = BeautifulSoup(full_page.content, 'html.parser')

skins = soup.find_all('div', class_='market_listing_row')
print(skins)

for skin in skins:
  name = skin.find('span',class_='market_listing_row ').text
  counts = skin.find('span',class_='market_listing_num_listings_qty').text
  price = skin.find('span',class_='sale_price').text.replace('От','').strip() #HACK
  print(name, counts, price)

Link to what I'm parsing:
https://steamcommunity.com/market/listings/730/AK-...

How do I parse the price, the title, and the 'Buy' link?
Thank you very much for your answer :)

(I decided to learn a bit of parsing, but got stuck.)

1 answer(s)

Dmitry, 2020-07-31
@aleksegolubev

This block is filled in by JavaScript after the page loads, so it is not in the HTML that requests receives. You need to look at the requests the browser itself makes.
Open the developer tools in your browser and filter the Network tab by XHR. You will see requests to https://steamcommunity.com/market/itemordershistogram, which return JSON with the data you need.

Request URL: https://steamcommunity.com/market/itemordershistogram?country=RU&language=russian&currency=1&item_nameid=176118358&two_factor=0

country, language, and currency stay the same for every item.
item_nameid is in the page source and can be extracted with a regular expression:

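import re
# full_page is the response from the question's code; .text gives the decoded HTML
result = re.findall(r'Market_LoadOrderSpread\(\s*(\d+)\s*\)', full_page.text)
print(result[0])  # item_nameid as a string of digits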

After that, just send a request to that URL with requests and parse the JSON response.
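To make that last step concrete, here is a minimal sketch of the whole chain: pull item_nameid out of the page source, request the histogram endpoint, and read a couple of values from the JSON. The helper names (extract_item_nameid, fetch_order_histogram, summarize) and the constants are made up for illustration. The endpoint and fixed parameters come from the request URL quoted above, but the JSON keys lowest_sell_order and highest_buy_order are an assumption about the response, so print the returned dictionary first and check which fields are actually there.

```python
import re

# Endpoint and fixed query parameters, copied from the request URL quoted above.
HISTOGRAM_URL = "https://steamcommunity.com/market/itemordershistogram"
FIXED_PARAMS = {"country": "RU", "language": "russian", "currency": 1, "two_factor": 0}


def extract_item_nameid(html):
    """Return the id passed to Market_LoadOrderSpread(...) as a string, or None."""
    match = re.search(r"Market_LoadOrderSpread\(\s*(\d+)\s*\)", html)
    return match.group(1) if match else None


def fetch_order_histogram(session, item_nameid, timeout=10):
    """Request the histogram JSON.

    `session` is expected to be a requests.Session (or any object with a
    compatible .get method); it is passed in so headers can be set once.
    """
    params = dict(FIXED_PARAMS, item_nameid=item_nameid)
    response = session.get(HISTOGRAM_URL, params=params, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()


def summarize(histogram):
    """Pick two price fields; the key names are assumptions about the JSON."""
    return {
        "lowest_sell_order": histogram.get("lowest_sell_order"),
        "highest_buy_order": histogram.get("highest_buy_order"),
    }
```

With the variables from the question, this could be used roughly as: create session = requests.Session(), call session.headers.update(headers), then summarize(fetch_order_histogram(session, extract_item_nameid(full_page.text))). Using .get on the dictionary means a missing key yields None instead of raising an exception.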