Yandex
ks_ks, 2012-02-17 01:34:07

Scrapy - I give up!

dumpz.org/160218/ and paste.in.ua/3882/ are the results of my tinkering with scrapy.

I have three questions on this topic:

How do I send this request:

import urllib2

opener = urllib2.build_opener()
opener.addheaders = [('Cookie', cookie)]
opener.open('http://wordstat.yandex.ru/?cmd=words&page=1&text=&geo=&text_geo=&captcha_id=%s&captcha_val=' % key)

to Yandex, where key is something we already know, but the cookie is unknown?

Second, how do I make the spider's subsequent requests carry the same cookies?
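For the urllib2 approach, cookies can be captured and resent automatically with a CookieJar. A minimal sketch, shown with Python 3's urllib.request and http.cookiejar (in Python 2, the era of this thread, the same pieces live in urllib2 and cookielib):

```python
import http.cookiejar
import urllib.request

# A CookieJar stores cookies received in responses and attaches them
# to every later request made through the same opener.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(jar))

# opener.open(url) would now persist any Set-Cookie headers into `jar`
# and send them back automatically on subsequent opener.open(...) calls.
```

This sidesteps having to know the cookie value up front: the first response fills the jar, and every later request reuses it.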

And third, how do I slow it down a little?
Otherwise it runs very fast and hammers the already long-suffering Yandex. :)


3 answers
bekbulatov, 2012-02-17
@bekbulatov

You can slow it down with these settings
CONCURRENT_REQUESTS - maximum number of simultaneous requests
CONCURRENT_REQUESTS_PER_DOMAIN - maximum number of simultaneous requests per domain
DOWNLOAD_DELAY - delay between requests
Previously there was also CONCURRENT_SPIDERS (maximum number of spiders), but it has apparently been removed.
By default, Scrapy merges cookies across requests, as the dont_merge_cookies parameter suggests. The only condition is that you use the built-in Request.
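The throttling settings named above go into the project's settings.py. A minimal sketch with illustrative values (the numbers are my own assumptions, tune them to taste):

```python
# settings.py -- throttling sketch; the values here are illustrative
CONCURRENT_REQUESTS = 4             # total simultaneous requests
CONCURRENT_REQUESTS_PER_DOMAIN = 1  # at most one in-flight request per domain
DOWNLOAD_DELAY = 2.0                # seconds to wait between requests
```

With CONCURRENT_REQUESTS_PER_DOMAIN = 1 and a couple of seconds of DOWNLOAD_DELAY, the spider hits Yandex at most once every few seconds.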

demark, 2012-02-17
@demark

A little off-topic, but what are your Wordstat request volumes? It might make sense to use the Yandex.Direct API method CreateNewWordstatReport (1000 requests/day per account).

lorien, 2012-02-17
@lorien

> And the third - how to slow it down a little?
I haven't worked much with Scrapy, but I think it's trivial to call sleep inside the handler function.
Use grab :)
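The sleep idea can be sketched as a crude wrapper (the function name here is hypothetical). One caveat: inside a real Scrapy callback, time.sleep blocks the whole Twisted reactor, so the DOWNLOAD_DELAY setting from the other answer is the cleaner route:

```python
import time

def throttled(fetch, delay=1.0):
    """Call fetch(), then pause -- a crude per-request throttle.

    In a Scrapy callback this would stall the entire event loop,
    which is why DOWNLOAD_DELAY is usually the better option there.
    """
    result = fetch()
    time.sleep(delay)
    return result
```

Usage would look like `throttled(lambda: opener.open(url), delay=2.0)`.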
