Python
JRazor, 2014-04-06 12:32:23

Scrapy: Response, Request - how to get value?

Hello.
There is a site that sends requests to itself on the server. I just want to send those requests myself and capture the responses.
From the site, the request looks like this:
[screenshot: the request as it appears from the site]
I want to get the data shown in Chrome's Response and Preview tabs:
[screenshot: Chrome developer tools, Response and Preview tabs]
How can I do this?


1 answer(s)
ehles, 2014-04-08
@ehles

We make a request like this:

import json
from scrapy.contrib.spiders import CrawlSpider
from scrapy.http import Request
from scrapy.selector import HtmlXPathSelector

class MySuperSpider(CrawlSpider):
    name = "my_super_spider"
    start_urls = ["http://domain.com"]
    url = "http://domain.com/?postcode=123&sku=blablabla"

    def parse(self, response):
        # Here you can parse the response (to the request from start_urls),
        # generate new requests, or both.
        yield Request(self.url, callback=self.parse_my_url)

    def parse_my_url(self, response):
        # If the site returns JSON, do this:
        data_from_json = json.loads(response.body)
        # If the site returns HTML, use XPath. You can copy an XPath from
        # Chrome's developer tools (right-click the element), for example:
        xpath_name = '//*[@id="global"]/div/table/tbody/tr/td[%(col)s]/table/tbody/tr/td/a/text()'
        hxs = HtmlXPathSelector(response)
        column = 100500
        data_from_html = hxs.select(xpath_name % {'col': column}).extract()

        # Then "assemble" your items and save them to a DB or wherever you need.

Everything should be clear from the comments in the code.
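To see what the two branches in parse_my_url actually do without running Scrapy, here is a standard-library-only sketch. The bodies and the XPath are made-up stand-ins for response.body, and ElementTree's limited XPath support stands in for HtmlXPathSelector:

```python
import json
import xml.etree.ElementTree as ET

# Stand-ins for response.body; purely illustrative, not from the real site.
json_body = '{"postcode": "123", "available": true}'
html_body = '<table id="global"><tr><td><a>row-1</a></td><td><a>row-2</a></td></tr></table>'

# JSON branch: the same call the spider makes on response.body.
data_from_json = json.loads(json_body)
print(data_from_json["available"])       # True

# HTML branch: the same %-substitution the spider uses to pick a column.
xpath = './/td[%(col)s]/a' % {'col': 2}
root = ET.fromstring(html_body)
cell = root.find(xpath)
print(cell.text)                         # row-2
```

The %-formatting trick is what lets the spider select a different table column per request by swapping in a column number.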
P.S. This is described well in the docs: doc.scrapy.org/en/latest/topics/spiders.html
P.P.S. You don't need to emulate cookies with Scrapy; it handles them by itself (and you can, of course, access them if necessary).
