Python
Andrew, 2011-02-23 00:53:03

Scrapy - Python

When scraping a site built on Bitrix, the server starts returning a 400 error after a while. Has anyone run into this, and can you suggest how to disguise the crawler as a regular user more convincingly?
I set the interval between requests to 2 seconds, and the result is always the same: 200 items scraped, then a 400 error.


4 answer(s)
Anatoly, 2011-02-23
@taliban

It seems to me that it's not the server detecting that you're crawling. To check, try a random interval of 2 to 5 seconds; maybe the server is just buggy =)
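A minimal sketch of the random-interval idea, assuming a hand-rolled fetch loop rather than Scrapy's built-in throttling (Scrapy also has a `RANDOMIZE_DOWNLOAD_DELAY` setting that jitters `DOWNLOAD_DELAY` for you); `MIN_DELAY`, `MAX_DELAY`, and `random_delay` are illustrative names, not part of any API:

```python
import random
import time

# Illustrative bounds from the answer: pause 2-5 seconds between requests
# so the delay pattern looks less robotic than a fixed 2-second tick.
MIN_DELAY = 2.0
MAX_DELAY = 5.0

def random_delay() -> float:
    """Return a random pause length in seconds within [MIN_DELAY, MAX_DELAY]."""
    return random.uniform(MIN_DELAY, MAX_DELAY)

# In a hand-rolled crawl loop you would call time.sleep(random_delay())
# before issuing each request.
```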

bekbulatov, 2011-02-23
@bekbulatov

In addition to DOWNLOAD_DELAY and USER_AGENT, which I assume you have already changed, try lowering these settings: CONCURRENT_ITEMS, CONCURRENT_REQUESTS_PER_SPIDER, CONCURRENT_SPIDERS.
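A hypothetical `settings.py` fragment for a Scrapy project of that era, putting the named settings together; the values and the user-agent string are illustrative, not recommendations:

```python
# settings.py -- sketch of a "polite" configuration using the settings
# named in the answer (CONCURRENT_REQUESTS_PER_SPIDER and CONCURRENT_SPIDERS
# are the old pre-unification names).
DOWNLOAD_DELAY = 2                  # pause between requests, in seconds
USER_AGENT = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36'  # browser-like UA
CONCURRENT_ITEMS = 50               # items processed in parallel per response
CONCURRENT_REQUESTS_PER_SPIDER = 1  # one in-flight request at a time
CONCURRENT_SPIDERS = 1              # run spiders sequentially
```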

kmike, 2011-02-23
@kmike

Maybe the limit is not on the number of requests per minute or second, but on the number of requests per hour, for example.

Andrew, 2011-02-25
@xmdy

The problem was that Bitrix, as always, was ahead of everyone else: it stores the visit history in cookies. Once about 200 items had accumulated there, it refused to work properly. The result: cookies disabled, plus a few hours spent analyzing the whole problem)
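The fix described above is a one-line Scrapy setting; `COOKIES_ENABLED` is the real setting name, though whether it alone suffices depends on the target site:

```python
# settings.py -- disable cookie handling entirely, as the author did.
# With cookies off, the ever-growing visit-history cookie that Bitrix
# sets is never sent back, so requests stop ballooning into 400 errors.
COOKIES_ENABLED = False
```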
