Parsing
Daidin, 2021-12-10 21:18:31

I don't understand how to parse the products on this page.

import requests
from bs4 import BeautifulSoup

URL = 'https://www.bershka.com/by/%D0%BC%D1%83%D0%B6%D1%87%D0%B8%D0%BD%D1%8B/%D0%BE%D0%B4%D0%B5%D0%B6%D0%B4%D0%B0/%D1%82%D0%BE%D0%BB%D1%81%D1%82%D0%BE%D0%B2%D0%BA%D0%B8-c1010193244.html'

HEADERS = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36',
    'accept': '*/*',
}


def get_html(url, params=None):
    # Download the category page; returns the Response object, not the HTML text
    r = requests.get(url, headers=HEADERS, params=params)
    return r


def get_content(html):
    soup = BeautifulSoup(html, 'html.parser')
    # Collect every <p class="product-content"> element in the downloaded HTML
    items = soup.find_all('p', class_='product-content')
    print(items)


def parse():
    html = get_html(URL)
    # Only parse the page if the server answered 200 OK
    if html.status_code == 200:
        get_content(html.text)
    else:
        print("Error")


parse()


I don't really understand parsing yet, so please answer in simple terms :)
There is a site from which I need to parse the prices, pictures and names of the products. In the code above I tried to parse at least one "card", but nothing works no matter what I try. As I understand it, all the cards are in a ul list, each in its own li. I don't understand how to extract this information, and I can't find an explanation anywhere.
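For a page whose HTML really does contain the cards, the ul/li structure described above can be walked with BeautifulSoup roughly as in this sketch. The markup and the class names (`product-list`, `product-name`, `product-price`) are hypothetical stand-ins, not Bershka's real markup; inspect the actual page to find the right ones.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one <li> per product card inside a <ul>
SAMPLE_HTML = """
<ul class="product-list">
  <li>
    <img src="https://example.com/shirt.jpg">
    <p class="product-name">Basic T-shirt</p>
    <span class="product-price">19.99</span>
  </li>
  <li>
    <img src="https://example.com/hoodie.jpg">
    <p class="product-name">Hoodie</p>
    <span class="product-price">39.99</span>
  </li>
</ul>
"""


def parse_cards(html):
    """Return a list of dicts with the name, price and image URL of each card."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    # CSS selector: every <li> that is a direct child of <ul class="product-list">
    for card in soup.select("ul.product-list > li"):
        name = card.select_one(".product-name")
        price = card.select_one(".product-price")
        img = card.select_one("img")
        products.append({
            # get_text(strip=True) drops the surrounding whitespace
            "name": name.get_text(strip=True) if name else None,
            "price": price.get_text(strip=True) if price else None,
            # The picture's URL is in the src attribute, not in the text
            "image": img.get("src") if img else None,
        })
    return products


for product in parse_cards(SAMPLE_HTML):
    print(product)
```

If a function like `parse_cards` returns an empty list for the real page, the cards are simply not in the HTML that `requests` downloaded, which is the situation the answer below deals with.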

ScriptKiddo, 2021-12-10

The product information is not in the page's HTML: the page loads it with a separate request that returns JSON.
Here is an example:

import pprint

import requests

headers = {
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.55 Safari/537.36',
}

# categoryId matches the number at the end of the page URL (...-c1010193244.html);
# productIds is the list of product IDs to fetch
params = (
    ('categoryId', '1010193244'),
    ('productIds',
     '103733907,105019717,103807780,103646046,103789895,103789901,103494687,103807800,103586862,104787564,104787563,103678178,104787606,104787560,104787562,103646051,103994120,104131075,103056467,103588023,103921817,103554672,103921816,103101531,103284042,103101528,103284041,104787703,105019580,105019579,104787704,103108154,103376867,103760889,102948033,103494674,104130553,103215061,102944627,103293571'),
    ('languageId', '-20'),
)

response = requests.get('https://www.bershka.com/itxrest/3/catalog/store/45009591/40259536/productsArray',
                        headers=headers, params=params)
response.raise_for_status()  # raise an exception if the server did not answer with 2xx

# The endpoint returns JSON, so there is no HTML to parse: decode it into Python dicts/lists
data = response.json()

# Pretty-print the record of the first product
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(data['products'][0])
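The layout of the returned JSON is not described here, so instead of guessing a fixed path to the name, price and pictures, one option is to search the decoded data for a key by name. Below is a minimal sketch of a generic recursive helper; `find_values` is a hypothetical name, and the keys `name` and `price` in the sample data are assumptions, not the real Bershka schema. Print a record with `pp.pprint` first and search for the keys it actually contains.

```python
def find_values(obj, key):
    """Recursively collect every value stored under `key` in nested dicts/lists."""
    found = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                found.append(v)
            # Keep descending: the same key may also appear deeper inside v
            found.extend(find_values(v, key))
    elif isinstance(obj, list):
        for item in obj:
            found.extend(find_values(item, key))
    return found


# Made-up data shaped like a decoded JSON response (assumed keys, not the real schema)
sample = {
    "products": [
        {"id": 1, "name": "Hoodie", "detail": {"colors": [{"price": "3999"}]}},
        {"id": 2, "name": "Sweatshirt", "detail": {"colors": [{"price": "2999"}]}},
    ]
}

for product in sample["products"]:
    print(product["name"], find_values(product, "price"))
```

With the answer's code, the equivalent call would be something like `find_values(data['products'][0], 'name')`, assuming the real records use a `name` key.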
