Python
Vladislav Lidyaev, 2020-04-18 20:44:48

Parsing Yandex.Market: problem with switching pages?

Hi!
I need as complete a list of ASRock motherboards as possible, so I decided to use Yandex.Market as the source. I ran into bans, plus a very strange thing: starting from page 7, only 33 product cards are returned, even though the setting is still "Show 48". From page 7 onward, changing the page parameter has no effect, so the same cards are copied into the .csv file again (pages 1 to 6 work fine). What exactly is the problem?

(I use Tor and change the IP every 10 seconds, which is why there is time.sleep(10).)

I'm new to Python, so if you find the cause, please explain it in as much detail as possible. Thanks!

import csv
import socket
import time

import requests
import socks  # PySocks: pip install PySocks
from bs4 import BeautifulSoup

# Send every socket through the local Tor SOCKS5 proxy (Tor Browser listens on port 9150)
socks.set_default_proxy(socks.SOCKS5, "localhost", 9150)
socket.socket = socks.socksocket

# No page=... in the URL itself: requests appends params to the existing query string,
# so get_html(URL, params={'page': N}) would otherwise send both page=1 and page=N
URL = 'https://market.yandex.ru/catalog--materinskie-platy/55323/list?hid=91020&glfilter=4923257%3A12108404%2C12108414&glfilter=7774847%3A1&glfilter=7893318%3A762104&onstock=0&local-offers-first=0'
HEADERS = {'User-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:75.0) Gecko/20100101 Firefox/75.0','Accept': '*/*'}

HOST = 'https://market.yandex.ru'

FILE = 'AsRock.csv'

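def get_html(url, params=None):
    # Download one catalog page; the caller checks status_code
    return requests.get(url, headers=HEADERS, params=params)

def save_files(items, path):
    # Explicit encoding instead of the platform default, so Cyrillic names are saved reliably
    with open(path, 'w', newline='', encoding='utf-8') as file:
        writer = csv.writer(file, delimiter=';')
        writer.writerow(['NAME', 'LINK'])
        for item in items:
            writer.writerow([item['title'], item['link']])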

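def str_subtract(s1, s2):
    # Strip the prefix s2 from the start of s1. The old version deleted the first
    # occurrence of *each character* of s2 anywhere in s1, so a title that did not
    # start with the prefix lost letters from the model name.
    if s1.startswith(s2):
        return s1[len(s2):]
    return s1

def get_content(html):
    plate = []
    soup = BeautifulSoup(html, 'html.parser')
    items = soup.find_all('div', class_='n-snippet-card2__part n-snippet-card2__part_type_center')
    for item in items:
        title = item.find('h3', class_='n-snippet-card2__title')
        link = item.find('a', class_='link')
        if title is None or link is None:
            # Skip a card without the expected markup instead of dropping the whole page
            continue
        plate.append({
            # 'Материнская плата ASRock ' = "ASRock motherboard ", the common title prefix
            'title': str_subtract(title.get_text(strip=True), 'Материнская плата ASRock '),
            'link': HOST + link.get('href'),
        })
    # Always return a list: returning False made plate.extend() raise TypeError
    return plate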

def parse():
    html = get_html(URL)
    if html.status_code == 200:
        plate = []
        for page1 in range(1, 28):
            print(f'Parsing page {page1} of 27...')
            if page1 < 7:
                # Pages 1-6: request the page again until all 48 of its cards are collected
                # (e.g. a banned request with no cards is repeated after the 10 s pause,
                # by which time Tor has changed the IP)
                while len(plate) < page1 * 48:
                    html = get_html(URL, params={'page': page1})
                    plate.extend(get_content(html.text))
                    print(len(plate))
                    time.sleep(10)
            else:
                # Page 7 onward: only 33 cards per page were observed
                while len(plate) < 288 + (page1 - 6) * 33:
                    html = get_html(URL, params={'page': page1})
                    plate.extend(get_content(html.text))
                    print(len(plate))
                    time.sleep(10)
            save_files(plate, FILE)
    else:
        print('Error')

parse()
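A side note on building the page URL: when the base URL already contains a page=... parameter, as a link copied from the browser's address bar often does, requests.get(url, params={'page': N}) appends a second page= to the query string instead of replacing the first one, and it is then up to the server which value wins. Below is a minimal sketch of a hypothetical helper, with_page, that sets the parameter explicitly using only the standard library:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(url: str, page: int) -> str:
    """Return url with exactly one 'page' query parameter, set to `page`.

    Hypothetical helper: any existing page=... values are dropped first, and all
    other parameters (including repeated ones such as glfilter) keep their order.
    """
    parts = urlsplit(url)
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != 'page']
    query.append(('page', str(page)))
    # urlencode re-escapes ':' and ',' as %3A and %2C, as in the original link
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == '__main__':
    base = ('https://market.yandex.ru/catalog--materinskie-platy/55323/list'
            '?hid=91020&page=1&glfilter=7893318%3A762104')
    print(with_page(base, 7))
    # https://market.yandex.ru/catalog--materinskie-platy/55323/list?hid=91020&glfilter=7893318%3A762104&page=7
```

It could be used as get_html(with_page(URL, page1)) instead of passing params; the sketch makes no assumption about which of two duplicate page values Market would honour.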

3 answers
Sergey Grebennikov, 2020-04-18
@SergeyGrebennikov

a very strange thing: starting from page 7, only 33 product cards are returned, even though the setting is still "Show 48"

Was it really that hard to check manually? You would have seen that the Market finds enough products for 6 full pages (48 each), and page 7 has only 33, i.e. 6 × 48 + 33 = 321 products in total. There simply isn't any more data on the Market. Naturally, when you request page 8, it automatically sends you to the last page with products, page 7.
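Given this, a scraper doesn't need to hard-code 27 pages or the 48/33 card counts: it can stop as soon as a page brings no product links it has not already seen. A minimal sketch, assuming a hypothetical fetch_page(n) function that downloads page n (with the Tor, sleep and retry logic from the question) and returns the cards parsed by get_content. An empty page, e.g. a ban page with no cards, also stops the loop, so the retries belong inside fetch_page. The hypothetical fake_fetch only simulates the catalog described above and makes no requests:

```python
from typing import Callable, Dict, List

Card = Dict[str, str]


def collect_pages(fetch_page: Callable[[int], List[Card]], max_pages: int = 27) -> List[Card]:
    """Collect cards page by page until a page adds no new links."""
    seen = set()
    result: List[Card] = []
    for page in range(1, max_pages + 1):
        added = 0
        for card in fetch_page(page):
            if card['link'] not in seen:
                seen.add(card['link'])
                result.append(card)
                added += 1
        if added == 0:
            # Past the last page the site serves the last page again (or nothing),
            # so a page without new links means the catalog is exhausted.
            break
    return result


def fake_fetch(page: int) -> List[Card]:
    """Hypothetical stand-in: 6 pages x 48 cards, 33 on page 7, page 7 again after that."""
    page = min(page, 7)
    start = (page - 1) * 48
    count = 48 if page < 7 else 33
    return [{'title': f'board {i}', 'link': f'/product/{i}'}
            for i in range(start, start + count)]


if __name__ == '__main__':
    cards = collect_pages(fake_fetch)
    print(len(cards))  # 321
```

Deduplicating by link also means the .csv never gets the repeated page-7 cards, whatever page numbers are requested.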

Alexey Gorbunov, 2020-04-23
@leha_gorbunov

Why reinvent the wheel? Do you need a list of motherboards from Yandex.Market with links to their pages?
See ymscanner.com/items?hid=91020&brand=762104

adminfreeall, 2021-11-19
@adminfreeall

You can also view and download the data in the ApiSystem service; ask their support for test requests and see for yourself.