Answer the question
In order to leave comments, you need to log in
Python - referenced before assignment in the process of parsing a site, how to look for an error?
Good day to all. I am writing a parser for a website. On Django, using the requests and BeautifulSoup libraries. Simple but long lasting. Collects information, stores information through models in the database.
The crux of the matter is that in the process of work it is necessary:
- to collect a list of main objects from the first html page, go through the n-th number of pages of this first page, complete the list, collect information, save it to the database
- then go to the html page of each of the objects , collect information, save, if some conditions are met - go to the next html page, "work" there
Access to url's is done using requests. Content parsing is done through BeautifulSoup.
Sometimes the url does not provide information, requests.exceptions.ReadTimeout or ConnectionTimeout exceptions are fired.
I had to construct something like this for each request to the url:
`
import requests
from requests.exceptions import Timeout
from bs4 import BeautifulSoup as bs
MIMIC_HEADERS = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
...
read_fail = True
while read_fail:
try:
sleep(1)
response = session.get(start_url, timeout=10, headers=MIMIC_HEADERS)
html_bs = bs(response.content, 'html.parser')
except Timeout:
read_fail = True
except UnboundLocalError:
read_fail = True
finally:
read_fail = False
...(прочие действия над html_bs)
`
Answer the question
In order to leave comments, you need to log in
Didn't find what you were looking for?
Ask your questionAsk a Question
731 491 924 answers to any question