Scrapy. How to optimize the code?


Python

gadzhi15, 2016-08-08 22:02:08

I am parsing this page with Scrapy: https://www.reformagkh.ru/myhouse/profile/view/7913930/
I wrote the following code:

def parse_item(self, response):
    hxs = HtmlXPathSelector(response)
    l = ReformaLoader(ReformaItem(), hxs)
    # Each field is located by an absolute path from the document root.
    l.add_xpath('house', '/html/body/div[1]/div[2]/h1/span[2]/span[1]/text()')
    l.add_xpath('organization', '/html/body/div[1]/div[2]/section/div[1]/table[1]/tbody/tr/td[2]/a/text()')
    l.add_xpath('year',
                '/html/body/div[1]/div[2]/div[7]/div/div/div[1]/div/div/table/tbody/tr[4]/td[2]/span/text()')
    return l.load_item()

Then I realized that the data is laid out as a table, so instead of writing out the full XPath for every field I could simply iterate over the table rows in a for loop and extract the fields I need:

titles = hxs.xpath("//table[@class='orders overhaul-services-table']//tr")
for title in titles:
    l.add_xpath(????)

But I don't understand how to access, inside the loop, the table cells I need to extract. Or have I chosen the wrong approach altogether? Can you point me in the right direction?
P.S. On the elevators tab the data table differs from house to house, so my first approach with absolute paths does not work there either.


1 answer(s)

Ilya, 2016-08-08
@glebovgin

I have never worked with Scrapy, but I think you should look at relative XPath queries, along these lines:

titles = hxs.xpath("//table[@class='orders overhaul-services-table']//tr")
for title in titles:
    item['year'] = title.xpath('./td[2]/span/text()').extract()
    item['organization'] = title.xpath('./td[2]/a/text()').extract()

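where item is a dict-like object holding your data.
Paths meant to be relative to the current row must start with a dot (./td or .//td). A path that starts with / or // is evaluated against the whole document, not the row.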
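
To make the idea of relative paths inside a loop concrete, here is a minimal sketch that runs without Scrapy, using only the standard library's xml.etree.ElementTree. The sample markup, the row labels and the helper name table_to_dict are all assumptions made up for illustration; the real page's structure may well differ. Keying each value by the label in the first cell, rather than by row position, may also help with tables that vary from house to house.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: a guess at the table's shape, not copied from the real page.
SAMPLE = """
<html><body>
  <table class="orders overhaul-services-table">
    <tr><td>Year built</td><td><span>1975</span></td></tr>
    <tr><td>Managing organization</td><td><a href="#">Example Org</a></td></tr>
    <tr><td>Floors</td><td><span>9</span></td></tr>
  </table>
</body></html>
"""


def table_to_dict(markup):
    """Map each row's label (first cell) to its value (second cell)."""
    root = ET.fromstring(markup)
    table = root.find(".//table[@class='orders overhaul-services-table']")
    fields = {}
    if table is None:
        return fields
    for row in table.findall(".//tr"):
        # Paths starting with "./" are evaluated relative to the current row.
        label = row.find("./td[1]")
        value = row.find("./td[2]")
        if label is None or value is None:
            continue  # skip header or malformed rows
        # itertext() gathers text from nested <span> and <a> tags alike.
        key = "".join(label.itertext()).strip()
        fields[key] = "".join(value.itertext()).strip()
    return fields


fields = table_to_dict(SAMPLE)
print(fields.get("Year built"))
```

In Scrapy itself the loop body would presumably use the row selector instead, for example title.xpath('./td[1]//text()').extract_first(), and the collected values could then be passed to the item loader with l.add_value(...).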