Alexander Kovalenko, 2021-03-20 14:38:27
Python

How do I parse the number of ads a seller has on OLX?

The task is to parse OLX search pages and get each product's name and price, then open each ad to get the link to the seller's profile, and finally use that link to get the number of ads the seller has.

from bs4 import BeautifulSoup
import requests

URL = 'https://www.olx.pt/tecnologia-e-informatica/'
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
}

offer = []
user_link = []
offer_in_user = []


def parse():
    response = requests.get(URL, headers=HEADERS)
    soup = BeautifulSoup(response.content, 'html.parser')
    # get all offers on the page
    items = soup.find_all('div', class_='offer-wrapper')

    for item in items:
        # skip ads that use OLX delivery
        olx_ship = item.find('span', class_='promo-label promo-label--ctt inlblk rel')
        if olx_ship:
            continue
        try:
            details = item.find('a', class_='marginright5 link linkWithHash detailsLink')
            title = details.get_text(strip=True)  # ad title
            price = item.find('p', class_='price').get_text(strip=True)  # price
            link = details.get('href')  # link to the ad
            city = item.find('small', class_='breadcrumb x-normal').find_next('span').get_text(
                strip=True)  # city
        except AttributeError:
            # an expected element is missing from this ad, so skip it
            continue
        comps = {
            'title': title,
            'price': price,
            'link': link,
            'city': city,
        }
        offer.append(comps)
    # collect the link to each seller's profile
    for user in offer:
        r = requests.get(user['link'], headers=HEADERS)
        soup = BeautifulSoup(r.text, 'html.parser')

        # look for the profile link; store an empty string if it is missing
        profile = soup.find('a', class_='userbox__image-link')
        userx = profile.get('href') if profile else ''
        users = {
            'user': userx
        }
        user_link.append(users)


parse()


Here is the code that parses everything except the number of ads per seller. When I try to parse that, it either raises an error (probably too many requests from one IP) or returns the total number of ads across all sellers instead of a separate count for each one.
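For reference, the per-seller step I am aiming for could look roughly like the sketch below. It is only a sketch under assumptions: the names count_offers_per_seller, fetch_html and count_offers are hypothetical, the 'div.offer-wrapper' selector is borrowed from the search page and may not match the markup of a seller's profile page, and the one-second delay is a guess rather than a known limit. The idea is to store one count per profile URL in a dict, skip sellers that were already visited, and pause between requests.

```python
import time


def count_offers_per_seller(profile_urls, fetch, count, delay=1.0, sleep=time.sleep):
    """Return {profile_url: number_of_offers}, one entry per distinct seller.

    fetch(url) must return HTML text and count(html) must return an int;
    both are supplied by the caller.
    """
    counts = {}
    for url in profile_urls:
        # Skip ads without a profile link and sellers already counted.
        if not url or url in counts:
            continue
        if counts:
            sleep(delay)  # pause between requests (the delay value is a guess)
        # Each profile page gets its own count; nothing is accumulated
        # across sellers, so totals cannot leak from one seller to the next.
        counts[url] = count(fetch(url))
    return counts


def fetch_html(url, headers=None):
    """Download a page; raises requests.HTTPError on 4xx/5xx responses."""
    import requests  # local import: only needed for real network access
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    return response.text


def count_offers(html, selector='div.offer-wrapper'):
    """Count ad containers on one page. The selector is an assumption."""
    from bs4 import BeautifulSoup  # local import, as above
    return len(BeautifulSoup(html, 'html.parser').select(selector))
```

With the lists from the code above, this would be called as count_offers_per_seller((u['user'] for u in user_link), fetch=lambda url: fetch_html(url, HEADERS), count=count_offers). It only counts the ads on the first page of a profile, so a seller whose ads span several pages would be undercounted.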
