Why doesn't the parser work with the regular expression?
I am parsing Wikipedia and want to extract all links to other pages whose href starts with /wiki/. The link must not contain a colon (:). Here is my code:
from bs4 import BeautifulSoup as bs
from urllib.request import urlopen
import re
site = urlopen('https://en.wikipedia.org/wiki/Kevin_Bacon')
soup = bs(site, features='html.parser')
for i in soup.find('div', {'id': 'bodyContent'}).findAll('a', {'href': re.compile('\/wiki\/(?!:)[\w\/()%]+')}):
    if 'href' in i.attrs:
        print(i.attrs['href'])
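A likely cause, sketched without network access: the lookahead (?!:) only tests the single character right after /wiki/, and Beautiful Soup applies a compiled pattern with search(), so a partial match is enough to accept hrefs such as /wiki/Category:... The sample hrefs, the `fixed` pattern, and the `matching` helper below are assumptions for illustration, not part of the original code.

```python
import re

# Pattern from the question (written as a raw string): the lookahead
# rejects a colon only directly after "/wiki/".
original = re.compile(r'\/wiki\/(?!:)[\w\/()%]+')

# Assumed fix: anchor at the start of the href and forbid a colon
# anywhere after "/wiki/".
fixed = re.compile(r'^/wiki/(?!.*:)')

# Hypothetical hrefs of the kind found on a Wikipedia article page.
samples = [
    '/wiki/Kevin_Bacon',
    '/wiki/Category:American_male_film_actors',
    '/wiki/Wikipedia:Protection_policy',
    '#cite_note-1',
]

def matching(pattern, hrefs):
    """Return the hrefs accepted by pattern.search(), the same check
    Beautiful Soup uses for a compiled regex filter."""
    return [h for h in hrefs if pattern.search(h)]

print(matching(original, samples))  # colon links still get through
print(matching(fixed, samples))     # only the plain article link remains
```

If this matches what you are seeing, passing `fixed` as the href filter in findAll should drop the namespace links.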