Python
JRazor, 2014-04-06 12:32:23

Scrapy: Response, Request - how to get value?

Hello.
There is a site that sends requests to itself on the server. I just want to send those requests myself and capture the responses.
From the site, the request looks like this:
[screenshot: the request as it appears from the site]
I want to get the data shown in Chrome's Response and Preview tabs:
[screenshot: Chrome developer tools, Response and Preview tabs]
How can I do this?


1 answer(s)
ehles, 2014-04-08
@ehles

We make a request like this:

import json
from scrapy.contrib.spiders import CrawlSpider
from scrapy.http import Request
from scrapy.selector import HtmlXPathSelector

class MySuperSpider(CrawlSpider):
    name = "my_super_spider"
    start_urls = ["http://domain.com"]
    url = "http://domain.com/?postcode=123&sku=blablabla"

    def parse(self, response):
        # Here you can parse the response (to the request from start_urls),
        # generate new requests, or both.
        yield Request(self.url, callback=self.parse_my_url)

    def parse_my_url(self, response):
        # If the site returns JSON, do this:
        data_from_json = json.loads(response.body)
        # If the site returns HTML, use XPath. You can copy an XPath from
        # Chrome's developer tools (right-click the element), for example:
        xpath_name = '//*[@id="global"]/div/table/tbody/tr/td[%(col)s]/table/tbody/tr/td/a/text()'
        hxs = HtmlXPathSelector(response)
        column = 100500
        data_from_html = hxs.select(xpath_name % {'col': column}).extract()

        # Then "assemble" your items and save them to a DB or wherever you need.

Everything should be clear from the comments in the code.
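To see what the two branches in parse_my_url actually do without running Scrapy, here is a standard-library-only sketch. The bodies and the XPath are made-up stand-ins for response.body, and ElementTree's limited XPath support stands in for HtmlXPathSelector:

```python
import json
import xml.etree.ElementTree as ET

# Stand-ins for response.body; purely illustrative, not from the real site.
json_body = '{"postcode": "123", "available": true}'
html_body = '<table id="global"><tr><td><a>row-1</a></td><td><a>row-2</a></td></tr></table>'

# JSON branch: the same call the spider makes on response.body.
data_from_json = json.loads(json_body)
print(data_from_json["available"])       # True

# HTML branch: the same %-substitution the spider uses to pick a column.
xpath = './/td[%(col)s]/a' % {'col': 2}
root = ET.fromstring(html_body)
cell = root.find(xpath)
print(cell.text)                         # row-2
```

The %-formatting trick is what lets the spider select a different table column per request by swapping in a column number.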
P.S. This is described well in the docs: doc.scrapy.org/en/latest/topics/spiders.html
P.P.S. You don't need to emulate cookies with Scrapy; it handles them by itself (and you can, of course, access them if necessary).
