How can I elegantly count the occurrences of a word on a web page?
I want to count how many times a particular word occurs on a web page.
I put together this code:
import requests
from bs4 import BeautifulSoup
import re
word = 'Pitton'
url = 'https://en.wikipedia.org/wiki/Joseph_Pitton_de_Tournefort'
count = 0
r = requests.get(url)
soup = BeautifulSoup(r.content, 'lxml')
# strip the HTML tags
w = re.sub(r'<[^>]+>', '', str(soup))
# replace non-word characters with spaces so that split() works correctly
w = re.sub(r'\W', ' ', w)
for i in w.split():
    if i.lower() == word.lower():
        count += 1
print(count)
There is html2text, which converts HTML to plain text.
And people also do this:
for script in soup(["script", "style"]):
    script.extract()
text = soup.get_text()
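Once you have the plain text (for example from soup.get_text() as above), the counting itself can probably be done more compactly with re.findall and collections.Counter instead of a manual loop. This is only a rough sketch: the helper names count_words and count_word and the sample string are made up for illustration, and it treats any run of word characters as a word, which mirrors the \W replacement in the question.

```python
import re
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercase word in text to its frequency."""
    # \w+ matches runs of letters, digits and underscores, so punctuation
    # and whitespace act as separators (same idea as re.sub(r'\W', ' ', ...)).
    return Counter(re.findall(r"\w+", text.lower()))


def count_word(text, word):
    """Return how many times word occurs in text, ignoring case."""
    # A Counter returns 0 for missing keys, so no extra check is needed.
    return count_words(text)[word.lower()]


# Hypothetical sample text standing in for soup.get_text()
sample = "Joseph Pitton de Tournefort. PITTON, pitton's notes; Pittonia."
print(count_word(sample, "Pitton"))  # prints 3
```

Only whole words are counted, so "Pittonia" in the sample does not match "Pitton". If you need the counts for several words, call count_words once and look each word up in the result.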