Python
Bjornie, 2017-10-23 16:39:41

How do I tune download settings to speed up scraping items from Amazon with Scrapy?

I wrote a scraper with Scrapy for the first time, to analyze Amazon prices. I use MySQL, I work through paid proxies, and I have captcha solving and the other necessary libraries hooked up. In general, everything works fine and I really like the framework itself. However, one thing is not very clear to me, namely how to configure the following parameters:

CONCURRENT_REQUESTS = 16
DOWNLOAD_DELAY = 0.25
COOKIES_ENABLED = False

Separately, I have this one commented out (disabled): # AUTOTHROTTLE_ENABLED = True
I have already tried various numbers of parallel requests, both with and without a delay (DOWNLOAD_DELAY = 0). I also tried AUTOTHROTTLE_ENABLED separately.
Because I have a large number of pages, scraping speed is critical for me, but so is a "respectful" attitude towards Amazon, so as not to get banned. So I would like to hear from anyone who already has experience: which settings are safe to use without risking a ban?
Should I use AUTOTHROTTLE_ENABLED (even though it slows things down noticeably)?
I will add that each new request goes through a proxy with an automatically rotated User-Agent, and after solving a captcha I keep using the same proxy.

P.S. I forgot to add:
is it possible to run the same spider from another console, so that several instances of the program run in parallel?
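
For what it's worth, nothing stops a second `scrapy crawl` from being started in another console; the practical issue is keeping the instances from fetching the same pages. Below is a minimal sketch of one way to do that, assuming each instance is given a shard index and a shard count and only takes the URLs that hash into its shard. The function name `shard_urls`, the example URLs, and the shard parameters are all hypothetical, not part of Scrapy.

```python
import zlib


def shard_urls(urls, shard_index, shard_count):
    """Return the subset of urls that belongs to one spider instance."""
    if not 0 <= shard_index < shard_count:
        raise ValueError("shard_index must be in [0, shard_count)")
    # crc32 gives the same value in every process, unlike the built-in
    # hash() for strings, which is randomized per interpreter run.
    return [
        url
        for url in urls
        if zlib.crc32(url.encode("utf-8")) % shard_count == shard_index
    ]


# Placeholder URLs, for illustration only.
urls = [f"https://www.amazon.com/dp/EXAMPLE{i:04d}" for i in range(10)]

first = shard_urls(urls, 0, 2)   # URLs for the instance in console 1
second = shard_urls(urls, 1, 2)  # URLs for the instance in console 2
```

Each console could then start the spider with its own values through Scrapy's `-a` spider arguments, for example `scrapy crawl amazon_prices -a shard_index=0 -a shard_count=2` (the spider and argument names here are made up). Spider arguments arrive as strings, so they would need converting with `int()` before being passed to a helper like this.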

1 answer(s)
Dimonchik, 2017-10-23
@dimonchik2013

If you are going through proxies, how would they ban you?
AutoThrottle is mostly needed when you crawl without proxies. If the site has targeted protection it will not help, but otherwise it is just right.
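
If AutoThrottle does get switched on, a `settings.py` fragment along these lines might be a starting point. The setting names are Scrapy's own; every number is an illustrative assumption to be tuned against the real target, not a recommended value.

```python
# settings.py (sketch; all numeric values are illustrative assumptions)

CONCURRENT_REQUESTS = 16            # global cap on parallel requests
CONCURRENT_REQUESTS_PER_DOMAIN = 8  # per-domain cap, still applies with AutoThrottle
DOWNLOAD_DELAY = 0.25               # AutoThrottle never goes below this delay
COOKIES_ENABLED = False

AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0      # initial delay, in seconds
AUTOTHROTTLE_MAX_DELAY = 10.0       # upper bound on the delay under high latency
# Average number of requests to keep in flight per remote site;
# raising it makes the crawl faster and less polite.
AUTOTHROTTLE_TARGET_CONCURRENCY = 4.0
AUTOTHROTTLE_DEBUG = True           # log throttling stats for every response
```

With `AUTOTHROTTLE_DEBUG` on, the log shows the delay chosen for each response, which makes it easier to see whether the slowdown comes from AutoThrottle itself or from slow proxies.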
