How do I make the parser output only text and links, without HTML markup?

Python

Bogdan Romanov, 2021-08-13 17:51:30

Apologies in advance for the shitty code, I'm just getting started :)

import requests
import bs4
import lxml

url = '*page_link*'
r = requests.get(url=url)
soup = bs4.BeautifulSoup(r.text, 'lxml')
quotes = soup.find_all('url', class_='*class_name*')
href = soup.find_all('a', class_ = '*class_name*')
print(quotes, href)

1 answer

Sergey Karbivnichy, 2021-08-13
@Shape_e

import requests
import bs4
import lxml

url = 'https://qna.habr.com'
r = requests.get(url=url)
soup = bs4.BeautifulSoup(r.text, 'lxml')
# quotes = soup.find_all('url', class_='*class_name*')
href = soup.find_all('a', class_ = 'question__title-link')
# print(quotes, href)

for x in href:
    link = x.get('href')  # get the link URL
    text = x.text.strip()  # get the link text and strip extra spaces and line breaks
    print(text + ' - ' + link)

Output:

Как запустить ffmpeg на GPU golang? - https://qna.habr.com/q/1033160
Стенд для изучения DevOps на базе Linux-серверов. С чего начать изучение? - https://qna.habr.com/q/1033364
...
Предварительная загрузка изображений wordpress? - https://qna.habr.com/q/1033300
Не могу зарегистрировать аккаунт стим через свой домен. Что делать? - https://qna.habr.com/q/1033248
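
The reason the original code printed markup is that find_all returns tag objects, and printing a tag shows its HTML. Reading the text and the href attribute of each tag separately avoids that. Here is a rough sketch of the same idea wrapped in a function, in case it helps. The names extract_links and sample_html and the example.com address are made up for illustration, the markup is invented so it runs without a network request, and it uses the built-in 'html.parser' backend instead of 'lxml' so nothing extra has to be installed. It also skips tags that have no href and turns relative links into absolute ones, which the answer above did not need to do.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url, css_class):
    """Return (text, absolute_url) pairs for <a> tags with the given class."""
    # 'html.parser' ships with Python; 'lxml' would work here too if installed.
    soup = BeautifulSoup(html, 'html.parser')
    results = []
    for a in soup.find_all('a', class_=css_class):
        href = a.get('href')  # None when the tag has no href attribute
        if href is None:
            continue  # nothing to print a link for
        # Text of the tag and its children, surrounding whitespace removed.
        text = a.get_text().strip()
        # Resolve relative links such as "/q/1" against the page address.
        results.append((text, urljoin(base_url, href)))
    return results


# Invented sample markup (hypothetical, not taken from any real page).
sample_html = """
<ul>
  <li><a class="question__title-link" href="/q/1">  First <b>question</b>?  </a></li>
  <li><a class="question__title-link" href="https://example.com/q/2">Second question</a></li>
  <li><a class="question__title-link">No link here</a></li>
  <li><a class="other" href="/q/3">Ignored</a></li>
</ul>
"""

for text, link in extract_links(sample_html, 'https://example.com', 'question__title-link'):
    print(f'{text} - {link}')
```

For a live page, the html argument would be r.text from requests.get, and base_url would be the page address that was requested.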