Python
Ivan Koryakin, 2021-01-10 12:01:32

Why does my parser scrape only the first 4 thermal pastes on OZON?

It parses only the first 4 items on each page, even though each page lists many more.

from bs4 import BeautifulSoup
import requests
import csv
import time

HOST = 'https://www.ozon.ru/'
URL = 'https://www.ozon.ru/category/termopasta-30799/'
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19041'
}


def get_html(url, params=None):
    # params holds the query string (e.g. the page number); None sends no query
    r = requests.get(url, headers=HEADERS, params=params)
    return r

def get_content(html):
    soup = BeautifulSoup(html, 'html.parser')
    # Each product card present in the HTML that was downloaded
    items = soup.find_all('div', class_='a0c6 a0d a0c9 a0c8')

    cards = []
    for item in items:
        link = item.find('a', class_='a2g0 tile-hover-target')
        if link is None:
            continue  # skip cards without a title link instead of crashing
        cards.append(
            {
                'title': link.get_text(),
                'komment': link.get('href'),  # the product link
            }
        )
    print(cards)
    return cards


def parser():
    PAGENATION = input('Enter the number of pages: ')
    PAGENATION = int(PAGENATION.strip())
    html = get_html(URL)
    if html.status_code == 200:
        cards = []
        for page in range(1, PAGENATION + 1):
            print(f'Parsing page: {page}')
            html = get_html(URL, params={'page': page})
            cards.extend(get_content(html.text))
    else:
        print('Error')

parser()

1 answer(s)
soremix, 2021-01-10
@SoreMix

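Because the data is loaded by JavaScript, so the remaining cards are not in the HTML markup that BeautifulSoup parses.
Press Ctrl+U to open the page source and look for where the thermal paste data is hidden in it.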
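
As a rough illustration of that advice, here is a minimal sketch of pulling data out of a page source when it sits inside a script tag rather than in the markup. Everything in it is an assumption made for the example: the sample HTML, the JSON layout, and the names ScriptCollector and find_embedded_json are hypothetical, and the real OZON page may store its data in a different place or format. It uses only the standard library, so it runs without installing anything.

```python
import json
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> tag in a page."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == 'script':
            self._in_script = True
            self.scripts.append('')

    def handle_endtag(self, tag):
        if tag == 'script':
            self._in_script = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append to the last entry
        if self._in_script:
            self.scripts[-1] += data


def find_embedded_json(html):
    """Return the parsed content of every <script> body that is valid JSON."""
    collector = ScriptCollector()
    collector.feed(html)
    found = []
    for body in collector.scripts:
        try:
            found.append(json.loads(body))
        except ValueError:
            continue  # ordinary JavaScript, not a JSON payload
    return found


# Hypothetical page: one card is in the markup, the full list is only in JSON
SAMPLE_HTML = '''
<html><body>
<div class="card"><a href="/product/1">Paste A</a></div>
<script>console.log("not json");</script>
<script type="application/json">
{"items": [{"title": "Paste A", "link": "/product/1"},
           {"title": "Paste B", "link": "/product/2"}]}
</script>
</body></html>
'''

payloads = find_embedded_json(SAMPLE_HTML)
titles = [item['title'] for item in payloads[0]['items']]
print(titles)
```

On a real page you would pass html.text from get_html() to find_embedded_json() and then inspect the returned objects by hand to see which one, if any, holds the product list.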
