How to parse site content in a specific location?

Python

vaneys1, 2021-11-25 14:57:07

I'm new to this. I searched on Google and found some examples, but none of them quite fits my case.
There is a certain site, and I need to get the content of a div with class='....' (it is nested inside other similar divs), but what I found doesn't quite do that.
How can I implement this so that it works?

1 answer(s)

Nickolay Tapokpy, 2021-11-25
@vaneys1

Something like this:

import requests
from bs4 import BeautifulSoup

def parsing(url):  # pass the URL of your own page
    """
    Parse the page at the given URL to get product name, price and link.
    :param url: address of the page to parse
    :return: list of (category, name, price, link) tuples
    """
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3'}  # so the site lets the request through and doesn't take us for a bot

    # fetch the page; the parser name belongs to BeautifulSoup, not to requests.get
    response = requests.get(url, headers=headers)

    soup = BeautifulSoup(response.text, features='html.parser')
    category_site = soup.find_all('span', class_="inline-title")  # category titles, found by tag and class
    all_items = soup.find_all('div', class_='porto-products wpb_content_element')  # product blocks, found by tag and class

    result = []
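    # each category title is paired with the product block at the same position
    for category, block in zip(category_site, all_items):
        db_category = category.text
        items = block.find_all('h3')  # product names
        price = block.find_all('span', class_='woocommerce-Price-amount amount')
        link = block.find_all('a', class_='product-loop-title')

        # pair each name with its price and link; zip stops at the shortest list
        for item, item_price, item_link in zip(items, price, link):
            db_item = item.text
            db_price = item_price.text.replace("руб.", "")  # drop the currency suffix ("rub.")
            db_url = item_link.get('href')
            result.append((db_category, db_item, db_price, db_url))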
    return result
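
For the case in the question (a div with a given class nested inside other similar divs), a CSS selector is often enough. Here is a minimal sketch that runs on an inline HTML string instead of a real site. The class names `card` and `content`, the sample markup and the helper `get_nested_text` are all made up for illustration, so swap in the names from your actual page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page
SAMPLE_HTML = """
<div class="card">
  <div class="content">First</div>
</div>
<div class="card">
  <div class="content">Second</div>
</div>
<div class="content">Outside any card</div>
"""


def get_nested_text(html, outer_class, inner_class):
    """Return the text of every div.inner_class located inside a div.outer_class."""
    soup = BeautifulSoup(html, "html.parser")
    # "div.a div.b" matches a div.b at any depth inside a div.a
    selector = f"div.{outer_class} div.{inner_class}"
    return [tag.get_text(strip=True) for tag in soup.select(selector)]


if __name__ == "__main__":
    print(get_nested_text(SAMPLE_HTML, "card", "content"))
```

On a live page you would pass `response.text` from `requests.get` as the `html` argument. The last div in the sample is skipped because it is not inside a `card` div.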