How to scrape a site?
I need to scrape tasks from the site znanija.com.
Each task has a link like https://znanija.com/task/{task_number}. I make GET requests, incrementing the task number each time, i.e.:
import requests

for i in range(2, max_i):  # start at 2, since the first task on the site has that number
    r = requests.get(f"https://znanija.com/task/{i}", headers={"user-agent": "..."})
    # ...
Most free proxies have plenty of problems, especially with anonymization: most of their users are trying to hide their identity from their ISP, not from the target server.
In addition, an outgoing request may fail with a timeout on your side while the server does not care: it received the request and instantly issued a response; the problem lies in the timeout you set.
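For example, an explicit and fairly generous timeout makes it easier to tell your own timeout from a genuine server-side failure (the URL and the timeout values below are illustrative, not taken from the question):

import requests

url = "https://znanija.com/task/2"  # illustrative task URL
try:
    r = requests.get(
        url,
        headers={"user-agent": "..."},
        timeout=(5, 30),  # (connect timeout, read timeout) in seconds
    )
except requests.exceptions.Timeout:
    # The server may well have processed the request already; only our wait expired.
    print("timed out on our side; raise the timeout or retry later")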
Personally, when faced with similar tasks, I set very long delays (up to a minute) and, after receiving a temporary ban, went quiet for three to four hours (this requires some patience, but it is the simplest solution to the problem).
In all likelihood, random delays and polling the list out of order slow down the detection of a scraper bot, but this hypothesis should be tested separately.
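A rough sketch of both ideas (large random delays with a multi-hour pause after a ban, and polling the task list out of order). The assumption that a temporary ban shows up as HTTP 403/429, as well as max_i and the delay values, are illustrative and should be checked against the real site:

import random
import time

import requests

max_i = 1000  # illustrative upper bound on task numbers
task_ids = list(range(2, max_i))
random.shuffle(task_ids)  # poll the list out of order

for task_id in task_ids:
    r = requests.get(
        f"https://znanija.com/task/{task_id}",
        headers={"user-agent": "..."},
        timeout=30,
    )
    if r.status_code in (403, 429):
        # Temporary ban (assumed status codes): go quiet for three to four hours.
        time.sleep(random.uniform(3 * 3600, 4 * 3600))
        continue
    # ... parse r.text here ...
    time.sleep(random.uniform(10, 60))  # random delay of up to a minute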
You can also try to pretend to be Google's crawler bot (though this is only an idea).
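A minimal sketch of that idea, sending Googlebot's published User-Agent string; keep in mind that a site can verify real Google crawlers via reverse DNS, so the header alone may not be enough:

import requests

# Googlebot's published User-Agent string; everything else here is illustrative.
googlebot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
r = requests.get(
    "https://znanija.com/task/2",
    headers={"user-agent": googlebot_ua},
    timeout=30,
)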
And free anonymizing HTTPS proxies that actually work and do not redirect you to a page inviting you to buy a subscription are about as rare as unicorns.