How to parse news from news.yandex and mail.ru in PHP
How can I parse news from news.yandex.ru for my own search query? For example:
http://news.yandex.ru/yandsearch?grhow=clutop&text=какой то запрос&rpt=nnews2&p=0
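Building such a URL for an arbitrary query mostly comes down to URL-encoding the query text, which usually contains spaces and Cyrillic characters. A minimal sketch, assuming the parameters copied from the example URL above (`grhow`, `text`, `rpt`, `p`) are all that is needed and that `p` is the zero-based results page; the helper `build_search_url` is hypothetical:

```python
from urllib.parse import urlencode

BASE_URL = 'http://news.yandex.ru/yandsearch'


def build_search_url(query, page=0):
    """Build a Yandex News search URL for `query` (hypothetical helper).

    The parameter set is copied from the example URL; `p` is assumed
    to be the zero-based results page.
    """
    params = {
        'grhow': 'clutop',
        'text': query,
        'rpt': 'nnews2',
        'p': page,
    }
    # urlencode percent-encodes the text as UTF-8 and turns spaces into '+'
    return BASE_URL + '?' + urlencode(params)


if __name__ == '__main__':
    # print the URLs for the first three result pages
    for page in range(3):
        print(build_search_url('какой то запрос', page))
```

Each generated URL can then be handed to whatever does the fetching (a Scrapy `Request`, or `curl`/`file_get_contents` on the PHP side).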
I recently wrote a Scrapy spider for Yandex News and have been using it quite successfully.
Yandex sometimes answers with a captcha instead of results; since my request volume is small, I settled on entering it manually:
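```python
import os
import urllib.request

from lxml import html
from PIL import Image

CAPTCHA_FILE = 'captcha.png'  # local path for the downloaded captcha image

# inside the spider callback: `response` is the Scrapy response
body = html.fromstring(response.body)
# extract the captcha image URL and the hidden form fields
captcha = response.urljoin(body.xpath('//*[@class="b-captcha__image"]/@src')[0])  # in case src is relative
key = body.xpath('//input[contains(@name, "key")]/@value')[0]
returl = body.xpath('//input[contains(@name, "retpath")]/@value')[0]
self.retpath = returl
# download the captcha, removing any previous file first
try:
    os.remove(CAPTCHA_FILE)
except FileNotFoundError:
    pass
urllib.request.urlretrieve(captcha, CAPTCHA_FILE)
# show the captcha to the operator
img = Image.open(CAPTCHA_FILE)
img.show()
# read the captcha value typed in by hand
captcha_value = input('Put captcha in manually> ')
```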
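The manual step is only needed when Yandex returns a captcha page instead of search results, so the callback can check for that first. A minimal stdlib-only sketch, assuming the captcha image keeps the `b-captcha__image` class used in the XPath above; `CaptchaDetector` and `find_captcha` are hypothetical names. Unlike the exact `@class="..."` match in the XPath, it also matches when the element carries additional classes:

```python
from html.parser import HTMLParser


class CaptchaDetector(HTMLParser):
    """Remember the captcha image URL if the page contains one (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.captcha_src = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # a class attribute may hold several space-separated names
        classes = (attrs.get('class') or '').split()
        if 'b-captcha__image' in classes and self.captcha_src is None:
            self.captcha_src = attrs.get('src')


def find_captcha(page_html):
    """Return the captcha image URL, or None for a normal results page."""
    detector = CaptchaDetector()
    detector.feed(page_html)
    detector.close()
    return detector.captcha_src
```

If `find_captcha(response.text)` returns None, the page can be parsed for news right away; otherwise the manual prompt above runs.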
How often do you actually need to send requests? Cache the data on your side, refresh it every 5 minutes, and you will be fine.
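A minimal sketch of that caching idea, assuming a hypothetical `fetch` callable that performs the real request and parsing; results are stored per query and fetched again only once they are older than five minutes (`ttl=300` seconds). The `TimedCache` class is illustrative, and the clock is injectable so the behavior is easy to test:

```python
import time


class TimedCache:
    """Cache fetch results per key and refresh them after `ttl` seconds."""

    def __init__(self, fetch, ttl=300, clock=time.monotonic):
        self._fetch = fetch      # callable: key -> fresh data (e.g. parsed news)
        self._ttl = ttl
        self._clock = clock
        self._store = {}         # key -> (timestamp, data)

    def get(self, key):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]      # still fresh: no request is sent
        data = self._fetch(key)  # stale or missing: fetch and remember
        self._store[key] = (now, data)
        return data
```

However many readers call `cache.get(query)`, the news source sees at most one request per query every five minutes.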