Python
Alexey, 2020-07-22 20:25:41

How do I fix this Python code?

Here is code from a book on web scraping in Python.
It should display the links from a page,
but it outputs them like this:
/wiki/IMDb
/wiki/2007_Webby_Awards
/wiki/2017_Webby_Awards
/wiki/Internet_Archive
Here is the code:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import random
import re

# Seed from the system time or OS entropy. Passing a datetime object
# to random.seed() raises TypeError on Python 3.11 and later.
random.seed()


def getLinks(articleUrl):
    html = urlopen("http://en.wikipedia.org" + articleUrl)
    bsObj = BeautifulSoup(html, "html.parser")
    # Keep only article links: hrefs that start with /wiki/ and contain no colon
    content = bsObj.find("div", {"id": "mw-content-text"})
    return content.find_all("a", href=re.compile("^(/wiki/)((?!:).)*$"))


links = getLinks("/wiki/Kevin_Bacon")


while len(links) > 0:
    newArticle = links[random.randint(0, len(links) - 1)].attrs["href"]
    print(newArticle)
    links = getLinks(newArticle)

What's wrong with it?


1 answer(s)
soremix, 2020-07-22
@Barmalei76

Nothing is wrong: the code is supposed to print links like that. Hardly anyone writes the full URL in the href attribute, because relative paths are enough within a single site.
Prepend the base URL to your hrefs:
print("http://en.wikipedia.org" + newArticle)
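
If plain string concatenation feels fragile, the same idea can be sketched with urljoin from the standard library. This is only an illustration: BASE_URL and to_absolute are names made up for this example, and the base address is assumed to be the one used in the urlopen call in the question.

```python
from urllib.parse import urljoin

# Assumed base URL, taken from the urlopen() call in the question.
BASE_URL = "http://en.wikipedia.org"


def to_absolute(href, base=BASE_URL):
    """Resolve a relative href such as '/wiki/IMDb' against the base URL.

    to_absolute is a hypothetical helper for this example, not part of
    the book's code.
    """
    return urljoin(base, href)


# Sample hrefs copied from the output shown in the question.
for href in ["/wiki/IMDb", "/wiki/Internet_Archive"]:
    print(to_absolute(href))
```

In the loop from the question, this would mean printing to_absolute(newArticle) instead of newArticle. urljoin also leaves an href that is already absolute unchanged, which simple concatenation does not.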
