How to change the User-Agent in Scrapy along with changing the IP (proxy) and how are errors handled?
I am writing an Amazon parser with Scrapy and want to change the User-Agent whenever the proxy changes, so that the traffic looks as natural as possible. I haven't figured out how to do it yet. Should I leave the USER_AGENT constant in settings.py commented out?
PS. One more question: how are requests that get no response usually handled (the site is down, we were banned, or something else went wrong)? I read the ASIN list from a .txt file, so the list of addresses is known in advance. If the parser stops somewhere in the middle, how do I record that and resume?
Parsing Amazon has a lot of quirks.
I usually don't change the UA and rely on IP rotation alone. If you do want to change it, you can use a ready-made middleware, for example https://github.com/alecxe/scrapy-fake-useragent.
If you want to change the UA manually when creating a Request, set the User-Agent header on that request.
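As a minimal sketch of the manual approach, the helper below ties each proxy to one User-Agent, so the two always change together. The proxy addresses, the UA strings and the name request_kwargs are placeholders made up for illustration. It relies on two Scrapy behaviours: a User-Agent header set on the request takes precedence over the USER_AGENT setting, and the built-in proxy middleware reads the proxy from request.meta["proxy"].

```python
import itertools

# Hypothetical (proxy, User-Agent) pairs; replace with your own values.
IDENTITIES = [
    ("http://10.0.0.1:8080",
     "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
     "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"),
    ("http://10.0.0.2:8080",
     "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0"),
]

# Round-robin over the pairs so a proxy is always used with the same UA.
_identity_cycle = itertools.cycle(IDENTITIES)


def request_kwargs(url):
    """Build keyword arguments for scrapy.Request with a matching proxy and UA."""
    proxy, user_agent = next(_identity_cycle)
    return {
        "url": url,
        # A per-request header overrides the USER_AGENT setting.
        "headers": {"User-Agent": user_agent},
        # Scrapy's HttpProxyMiddleware picks the proxy up from meta.
        "meta": {"proxy": proxy},
    }
```

Inside a spider this would be used as `yield scrapy.Request(callback=self.parse, **request_kwargs(url))`. With this approach USER_AGENT in settings.py only acts as a fallback for requests that carry no header of their own.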
Regarding bad responses, there are two cases.
1) Amazon serves you a captcha. You can solve it through a human-powered solving service such as anti-captcha.com, or write or use a ready-made OCR (my recognition rate is about 30%).
2) Amazon returns 503 once it has finally had enough of your IP.
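The two cases above can be told apart with a small check like the one below. This is only a sketch: the function name classify_response and the "captcha" marker string are assumptions, and the real captcha page may need a different marker. In Scrapy, a 503 reaches your callback only if you allow it, for example via handle_httpstatus_list on the spider.

```python
# Assumed marker for the captcha page; adjust to what the site really returns.
CAPTCHA_MARKERS = ("captcha",)


def classify_response(status, body_text):
    """Return 'ok', 'captcha', 'blocked' or 'error' for a fetched page."""
    if status == 503:
        # Case 2: the IP is exhausted, switch proxy and retry later.
        return "blocked"
    if status == 200:
        lowered = body_text.lower()
        if any(marker in lowered for marker in CAPTCHA_MARKERS):
            # Case 1: a captcha page served with a normal status code.
            return "captcha"
        return "ok"
    return "error"
```

In a callback you would call `classify_response(response.status, response.text)` and re-queue the ASIN with another proxy for anything other than "ok".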
Regarding saving progress with the ASINs: I took the ASIN list from MySQL and wrote the data back there, changing the status of each ASIN once it was processed successfully.
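A rough sketch of that status-tracking idea is shown below. It uses sqlite3 from the standard library as a stand-in for MySQL, and the table layout, status values and function names are invented for illustration, not taken from my actual setup.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the progress table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS asins ("
        "asin TEXT PRIMARY KEY, "
        "status TEXT NOT NULL DEFAULT 'pending')"
    )
    return conn


def add_asins(conn, asins):
    """Load ASINs; ones that already exist keep their current status."""
    conn.executemany(
        "INSERT OR IGNORE INTO asins (asin) VALUES (?)",
        [(asin,) for asin in asins],
    )
    conn.commit()


def pending_asins(conn):
    """Return every ASIN that has not been processed successfully yet."""
    rows = conn.execute(
        "SELECT asin FROM asins WHERE status != 'done' ORDER BY asin"
    )
    return [row[0] for row in rows]


def mark(conn, asin, status):
    """Record the outcome for one ASIN, e.g. 'done' or 'failed'."""
    conn.execute("UPDATE asins SET status = ? WHERE asin = ?", (status, asin))
    conn.commit()
```

After a crash or a ban, restarting the spider with `pending_asins(conn)` as the input list picks up only the ASINs that are still unfinished.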