Why does not see the tag when parsing?

A

Aibot922021-06-20 15:08:34

Python

Aibot92, 2021-06-20 15:08:34

Good day everyone
I did the site parsing

import requests
from bs4 import BeautifulSoup
from selenium import webdriver
import csv
import os
from threading import *
import requests
from concurrent.futures import ThreadPoolExecutor, wait
from time import time,sleep


URL = 'https://www.dns-shop.ru/catalog/17a8a01d16404e77/smartfony/'
HEDARS = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.3 Safari/605.1.15', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'}
FILE = 'dns.csv'
direct= os.getcwd()

def get_himl(url):
    chromedriver = direct + '/chromedriver'
    options = webdriver.ChromeOptions()
    browser = webdriver.Chrome(executable_path=chromedriver, chrome_options=options)
    browser.get(url)
    r = browser.page_source
    sleep(1)
    browser.quit()
    return r

def seve_file(item,path):
    with open(path, 'w', newline='',  encoding='utf-8') as file:
        writer = csv.writer(file, delimiter = ';')
        writer.writerow(['беренд', 'модель', 'цена', 'акции'])
        for items in item:
            writer.writerow([items['brand'], items['title'], items['prise'], items['sale']])


def get_content(html):
    soup = BeautifulSoup(html, 'html.parser')
    items = soup.find_all(class_='catalog-product ui-button-widget')
    for items in items:
        prise = items.find('div', class_='product-buy__price')
        if prise:
            prise = prise.get_text()
            prise = prise.replace('₽', ' ')
        else:
            prise = ''
        s = items.find('div', class_='vobler')
        if s:
            s = s.get_text()
        else:
            s = ''
        name = items.find('a', class_='catalog-product__name ui-link ui-link_black').get_text()
        name = name.split()
        start = 0
        stop = 0
        for i in range(len(name)):
            if name[i] == 'Смартфон':
                start = i
            if name[i] == 'ГБ':
                stop = i
        name = name[start + 1:stop + 1]
        name = ' '.join(name).lower()
        brand = name.split()[0]
        phone.append({
            'brand': brand,
            'title' : name,
            'prise' : prise,
            'sale' : s
        })
    return (phone)


def pars(num):
    a = URL + '?p=' + str(num)
    html = get_himl(a)
    phone.extend(get_content(html))
    seve_file(phone, FILE)


if __name__ == "__main__":
    phone = []
    with ThreadPoolExecutor() as executor:
        for num in range(1, 3):
            executor.submit(pars, num)
    print('Все=)')

But here's the problem

items = soup.find_all(class_='catalog-product ui-button-widget')
    for items in items:
        prise = items.find('div', class_='product-buy__price')

The result does not have this tag when parsing,
however, if you look at the page code, it is

for example
https://www.dns-shop.ru/search/?q=BQ+5047L+ looked completely at the html that selenium returns there is no given tag Tell me what's wrong?

<div class="product-buy__price">4 499 ₽</div>

Reply

Answer the question

In order to leave comments, you need to log in

1 answer(s)

F

FlashBoy, 2021-06-20
@Aibot92

There may be several options here:
1) time.sleep() comes after saving the page code, you need to rearrange it before saving (perhaps the page simply does not have time to load all JavaScript's).
2) You need to display information about the price using the (.text) method, and it is best to parse the price, and not the entire block (it is better to parse the product-buy__price using the find_all method, and then sort it out).
3) Perhaps your user agent does not fit the site or is outdated, even if not, it is better to use fake_useragent
4) A small (rather nitpicking) oversight is that from time you import time itself. What for?)