How to unobtrusively and effectively scrape websites?
Many sites ban you for making too many requests.
- Are there any statistics or generally accepted norms for the number of requests per unit of time?
- What additional information should be collected from sites so that you can quickly understand why the data could not be collected at a given moment?
- What additional information should be collected to reduce the risk of being banned by the site in the future?
Thank you.
There is a concept called throttling. It is quite applicable to your case =)
If the site starts responding more slowly, reduce the load; once it responds normally again, increase the load in small steps. If 500 errors start coming back, cut the load several times over at once.
But @L3n1n is right: my homepage will handle 10k rps without breaking a sweat, while a small blog will fall over at 300 rps. So the specific numbers are different for every site.
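Here is a minimal sketch of this adaptive-throttling idea in Python (not part of the original answer; the delay bounds, backoff multipliers, and the slow-response threshold are assumed values chosen for illustration):

```python
import time
import requests

BASE_DELAY = 1.0      # starting delay between requests, seconds (assumed)
MIN_DELAY = 0.2       # never go faster than this (assumed)
MAX_DELAY = 60.0      # never wait longer than this (assumed)
SLOW_RESPONSE = 2.0   # responses slower than this mean "back off" (assumed)

def crawl(urls):
    delay = BASE_DELAY
    for url in urls:
        start = time.monotonic()
        try:
            resp = requests.get(url, timeout=30)
        except requests.RequestException:
            # Network error: back off sharply, same as for a 5xx.
            delay = min(delay * 4, MAX_DELAY)
            continue
        elapsed = time.monotonic() - start

        if resp.status_code >= 500:
            # 500s started coming back: cut the load several times over at once.
            delay = min(delay * 4, MAX_DELAY)
        elif elapsed > SLOW_RESPONSE:
            # The site responds more slowly: reduce the load.
            delay = min(delay * 2, MAX_DELAY)
        else:
            # Responding normally: increase the load in small steps.
            delay = max(delay * 0.9, MIN_DELAY)

        yield url, resp
        time.sleep(delay)
```

The sharp multiplicative backoff on 5xx responses matches the "cut the load several times over" advice, while the gentle 0.9 factor implements "increase the load in small steps".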
- Each project has its own limits on the number of requests (see the robots.txt sketch after this list).
- A strange question; please describe it in more detail.
- In my opinion, what data you collect plays no role in the scraping itself.
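On the first point: the closest thing to a generally accepted, machine-readable statement of a site's own limits is its robots.txt. A minimal sketch using Python's standard urllib.robotparser (the domain and user-agent string here are placeholders):

```python
from urllib.robotparser import RobotFileParser

# example.com and "MyScraperBot" are placeholders.
rp = RobotFileParser("https://example.com/robots.txt")
rp.read()

agent = "MyScraperBot"
if rp.can_fetch(agent, "https://example.com/some/page"):
    # Crawl-delay / Request-rate, if declared, are the site's stated norms.
    print("crawl delay:", rp.crawl_delay(agent))     # seconds, or None
    print("request rate:", rp.request_rate(agent))   # RequestRate, or None
else:
    print("fetching this path is disallowed for", agent)
```

Both crawl_delay() and request_rate() return None when the site declares no limit, in which case adaptive throttling as in the first answer is the fallback.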