Python
Bogdan Romanov, 2021-08-13 17:51:30

How do I make the parser output only the text and links, without the HTML markup?

Apologies in advance for the shitty code, I'm just getting started :)

import requests
import bs4
import lxml

url = '*page_link*'
r = requests.get(url=url)
soup = bs4.BeautifulSoup(r.text, 'lxml')
quotes = soup.find_all('url', class_='*class_name*')
href = soup.find_all('a', class_ = '*class_name*')
print(quotes, href)

1 answer(s)
Sergey Karbivnichy, 2021-08-13
@Shape_e

import requests
import bs4
import lxml

url = 'https://qna.habr.com'
r = requests.get(url=url)
soup = bs4.BeautifulSoup(r.text, 'lxml')
# quotes = soup.find_all('url', class_='*class_name*')
href = soup.find_all('a', class_ = 'question__title-link')
# print(quotes, href)

for x in href:
  link = x.get('href') # Get the link's address
  text = x.text.strip() # Get the link's text and strip extra spaces and line breaks
  print(text+' - '+link)

Output:
Как запустить ffmpeg на GPU golang? - https://qna.habr.com/q/1033160
Стенд для изучения DevOps на базе Linux-серверов. С чего начать изучение? - https://qna.habr.com/q/1033364
...
Предварительная загрузка изображений wordpress? - https://qna.habr.com/q/1033300
Не могу зарегистрировать аккаунт стим через свой домен. Что делать? - https://qna.habr.com/q/1033248
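
The original code prints markup because find_all returns tag objects, and printing a tag shows its HTML; the loop above pulls the href attribute and the text out of each one instead. If bs4 and lxml are not available, roughly the same idea can be sketched with the standard library's html.parser. This is a swapped-in technique, not what the answer uses. The class name LinkTextParser, the sample markup, and the example.com base URL are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkTextParser(HTMLParser):
    """Collects (text, href) pairs for <a> tags carrying a given class."""

    def __init__(self, class_name, base_url=""):
        super().__init__()
        self.class_name = class_name
        self.base_url = base_url
        self.links = []      # finished (text, href) pairs
        self._href = None    # href of the <a> currently open, if any
        self._chunks = []    # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self.class_name in classes and attrs.get("href"):
            # Resolve relative links against the (hypothetical) base URL.
            self._href = urljoin(self.base_url, attrs["href"])
            self._chunks = []

    def handle_data(self, data):
        # Keep text only while inside a matching <a>; nested tags are dropped.
        if self._href is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace and line breaks into single spaces.
            text = " ".join("".join(self._chunks).split())
            self.links.append((text, self._href))
            self._href = None


# Hypothetical sample markup, standing in for r.text in the answer above.
SAMPLE_HTML = """
<ul>
  <li><a class="question__title-link" href="/q/1">
        First <b>question</b> title
      </a></li>
  <li><a class="other" href="/q/2">Ignored link</a></li>
  <li><a class="question__title-link extra" href="https://example.com/q/3">Third</a></li>
</ul>
"""

parser = LinkTextParser("question__title-link", base_url="https://example.com")
parser.feed(SAMPLE_HTML)
parser.close()
links = parser.links

for text, href in links:
    print(f"{text} - {href}")
```

On a real page you would feed r.text to the parser instead of SAMPLE_HTML and pass the page URL as base_url.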
