Why doesn't my parser work?
Sorry for the basic question. I'm trying to write a simple BeautifulSoup parser that extracts links from Yandex search results, and I've run into a problem. When I grab all the links on the page with the following code, I do get results (at least until Yandex starts serving a captcha):
from bs4 import BeautifulSoup
import requests
from fake_useragent import UserAgent

meme_page = 'https://www.yandex.ru/search/?text=%D0%9F%D0%BB%D1%8F%D0%B6%20%D0%B4%D0%BB%D1%8F%20%D0%BD%D1%83%D0%B4%D0%B8%D1%81%D1%82%D0%BE%D0%B2%20%D0%B2%20%D0%BC%D0%BE%D1%81%D0%BA%D0%B2%D0%B5&lr=213/'
# Send a browser-like User-Agent so Yandex doesn't reject the request outright
response = requests.get(meme_page, headers={'User-Agent': UserAgent().chrome})
soup = BeautifulSoup(response.content, 'html.parser')
for link in soup.find_all('a', href=True):
    print(link['href'])
But when I run the same code and filter by the class of the result links, the loop prints nothing:

for link in soup.find_all('a', {'class': 'link link_theme_outer path__item i-bem link_js_inited'}, href=True):
    print(link['href'])
Because those result links are generated by JavaScript after the page loads, so they simply aren't present in the raw HTML that requests downloads. Use Selenium, which drives a real browser and sees the rendered DOM.
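You can confirm this yourself by checking whether the class you're filtering on occurs anywhere in the downloaded HTML (a quick sanity check, reusing the response object from the question):

# If this prints False, the markup you see in the browser's DevTools
# was built by JavaScript and never existed in the HTML requests received.
print('link_theme_outer' in response.text)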
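Here is a minimal sketch of the Selenium route, assuming Selenium 4 (which fetches chromedriver on its own) and Chrome installed; the wait timeout and the broad a[href] selector are illustrative, not the only way to do it:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

meme_page = 'https://www.yandex.ru/search/?text=%D0%9F%D0%BB%D1%8F%D0%B6%20%D0%B4%D0%BB%D1%8F%20%D0%BD%D1%83%D0%B4%D0%B8%D1%81%D1%82%D0%BE%D0%B2%20%D0%B2%20%D0%BC%D0%BE%D1%81%D0%BA%D0%B2%D0%B5&lr=213/'

driver = webdriver.Chrome()
try:
    driver.get(meme_page)
    # Wait until at least one link shows up in the rendered DOM
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, 'a[href]'))
    )
    for a in driver.find_elements(By.CSS_SELECTOR, 'a[href]'):
        print(a.get_attribute('href'))
finally:
    driver.quit()

Keep in mind that Selenium doesn't make the captcha problem go away: Yandex will still throttle automated searches.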