Yandex
ks_ks, 2012-02-17 01:34:07

Scrapy - I give up!

dumpz.org/160218/ and paste.in.ua/3882/ are the results of my tinkering with scrapy.

I have three questions on this topic:

How do I send this request:

import urllib2

opener = urllib2.build_opener()
opener.addheaders = [('Cookie', cookie)]
opener.open('http://wordstat.yandex.ru/?cmd=words&page=1&text=&geo=&text_geo=&captcha_id=%s&captcha_val=' % key)

to Yandex, where key is something we already know, but the cookie is unknown?

Second, how do I make the spider's subsequent requests carry the same cookies?
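For the urllib2 approach, cookies can be captured and resent automatically with a CookieJar. A minimal sketch, shown with Python 3's urllib.request and http.cookiejar (in Python 2, the era of this thread, the same pieces live in urllib2 and cookielib):

```python
import http.cookiejar
import urllib.request

# A CookieJar stores cookies received in responses and attaches them
# to every later request made through the same opener.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(jar))

# opener.open(url) would now persist any Set-Cookie headers into `jar`
# and send them back automatically on subsequent opener.open(...) calls.
```

This sidesteps having to know the cookie value up front: the first response fills the jar, and every later request reuses it.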

And third, how do I slow it down a little?
Otherwise it runs very fast and hammers the already long-suffering Yandex. :)


3 answers
bekbulatov, 2012-02-17
@bekbulatov

You can slow it down with these settings
CONCURRENT_REQUESTS - maximum number of simultaneous requests
CONCURRENT_REQUESTS_PER_DOMAIN - maximum number of simultaneous requests per domain
DOWNLOAD_DELAY - delay between requests
Previously there was also CONCURRENT_SPIDERS (maximum number of spiders), but it has apparently been removed.
By default, Scrapy merges cookies across requests, as the dont_merge_cookies parameter suggests. The only condition is that you use the built-in Request.
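The throttling settings named above go into the project's settings.py. A minimal sketch with illustrative values (the numbers are my own assumptions, tune them to taste):

```python
# settings.py -- throttling sketch; the values here are illustrative
CONCURRENT_REQUESTS = 4             # total simultaneous requests
CONCURRENT_REQUESTS_PER_DOMAIN = 1  # at most one in-flight request per domain
DOWNLOAD_DELAY = 2.0                # seconds to wait between requests
```

With CONCURRENT_REQUESTS_PER_DOMAIN = 1 and a couple of seconds of DOWNLOAD_DELAY, the spider hits Yandex at most once every few seconds.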

demark, 2012-02-17
@demark

A little off-topic, but what are your Wordstat request volumes? It might make sense to use the Yandex.Direct API method CreateNewWordstatReport (1000 requests/day per account).

lorien, 2012-02-17
@lorien

> And the third - how to slow it down a little?
I haven't worked much with Scrapy, but I think it's trivial to call sleep inside the handler function.
Use grab :)
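The sleep idea can be sketched as a crude wrapper (the function name here is hypothetical). One caveat: inside a real Scrapy callback, time.sleep blocks the whole Twisted reactor, so the DOWNLOAD_DELAY setting from the other answer is the cleaner route:

```python
import time

def throttled(fetch, delay=1.0):
    """Call fetch(), then pause -- a crude per-request throttle.

    In a Scrapy callback this would stall the entire event loop,
    which is why DOWNLOAD_DELAY is usually the better option there.
    """
    result = fetch()
    time.sleep(delay)
    return result
```

Usage would look like `throttled(lambda: opener.open(url), delay=2.0)`.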
