Page parsing task: how do I arrange it as a loop or recursion with try/except?
Please help a newbie figure this out. The task:
Given a page (in our case, Wikipedia), parse it and extract all the links, then follow the collected links and extract all the links from those pages as well. I was advised to use recursion, with a recursion depth of 3. Finally, select the '.png' links from everything collected and write them to a file.
I only managed to collect and sort everything from a single page; it doesn't work with either recursion or a loop. I keep getting either a ConnectionError or a MemoryError. I understand that I need to add try/except, but at this point I'm completely confused.
Thank you in advance!
from bs4 import BeautifulSoup, SoupStrainer
import requests


class Links:
    def get_urls(self, level: int) -> []:
        urls = []
        try:
            links_1 = []
            start_link = "https://ru.wikipedia.org/"
            links_1.append(start_link)
            for i in links_1:
                response = requests.get(i)
                soup = BeautifulSoup(response.content, "html.parser", parse_only=SoupStrainer(['a', 'img']))
                full_list = [link['href'] for link in soup if link.get('href')] + [img['src'] for img in soup if img.get('src')]
                full_list = list(set(full_list))
                for url in full_list:
                    if not url.startswith('https:/'):
                        if url.startswith('/'):
                            if url.find('.org') == -1:
                                url = start_link + url[1:]
                                full_list.append(url)
                            elif url.find('.org'):
                                url = 'https:' + url
                                full_list.append(url)
                        elif url.startswith('//'):
                            url = start_link + url[2:]
                            full_list.append(url)
                        else:
                            pass
                    elif url.startswith('https:/'):
                        full_list.append(url)
                urls.append(full_list)
                self.get_urls(level - 1)
                links_1 = full_list
            links_1 = list(set(links_1))
            return links_1
        except MemoryError as e:
            print(e)
            return urls
links = Links()
list_links = links.get_urls(level=3)
#with open('text.txt', 'w') as f:
# for x in list_links:
# if x.endswith('.png'):
# f.write('%s\n' % x)
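For what it's worth, here is a minimal sketch of one way this could be structured: the request is wrapped in try/except, and the depth is counted down on each recursive call until it reaches 0. It assumes the same requests + BeautifulSoup stack and start page as above; the function name collect_links, the seen set, the 10-second timeout, and the use of urllib.parse.urljoin (instead of prefixing URLs by hand) are my own choices, not part of the original task.

# A sketch only: recursion with the depth counted down, try/except around the request,
# and a `seen` set to avoid revisiting pages. collect_links, the timeout value and the
# urljoin approach are assumptions, not requirements from the task.
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup, SoupStrainer


def collect_links(url: str, depth: int, seen: set) -> set:
    """Return all href/src values reachable from `url` within `depth` levels."""
    if depth == 0 or url in seen:
        return set()
    seen.add(url)
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
    except requests.RequestException as e:  # covers ConnectionError, timeouts, HTTP errors
        print(f"skipping {url}: {e}")
        return set()

    soup = BeautifulSoup(response.content, "html.parser",
                         parse_only=SoupStrainer(['a', 'img']))
    found = set()
    for tag in soup.find_all(['a', 'img']):
        link = tag.get('href') or tag.get('src')
        if link:
            found.add(urljoin(url, link))  # resolves /wiki/..., //upload..., and other relative forms

    collected = set(found)
    for link in found:
        if link.startswith('http'):  # only follow web pages, not mailto:, javascript:, etc.
            collected |= collect_links(link, depth - 1, seen)
    return collected


all_links = collect_links("https://ru.wikipedia.org/", depth=3, seen=set())

with open('text.txt', 'w') as f:
    for x in sorted(all_links):
        if x.endswith('.png'):
            f.write('%s\n' % x)

Note that a full depth-3 crawl of Wikipedia visits a very large number of pages, which is the likely source of the MemoryError; in practice you would also want to cap how many links are followed per page.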