Python
gadzhi15, 2016-08-08 22:02:08

Scrapy. How to optimize the code?

I'm parsing this page with Scrapy: https://www.reformagkh.ru/myhouse/profile/view/7913930/
I wrote the following code:

def parse_item(self, response):
    hxs = HtmlXPathSelector(response)
    l = ReformaLoader(ReformaItem(), hxs)
    l.add_xpath('house', '/html/body/div[1]/div[2]/h1/span[2]/span[1]/text()')
    l.add_xpath('organization', '/html/body/div[1]/div[2]/section/div[1]/table[1]/tbody/tr/td[2]/a/text()')
    l.add_xpath('year',
                '/html/body/div[1]/div[2]/div[7]/div/div/div[1]/div/div/table/tbody/tr[4]/td[2]/span/text()')
    return l.load_item()

Then I realized that the data is laid out as a table, so there is no need to write out the full XPath for every field: I can simply loop over the table rows and extract the fields I need.
titles = hxs.xpath("//table[@class='orders overhaul-services-table']//tr")
for title in titles:
    l.add_xpath(????)

But I don't understand how to access the table elements I need from inside the loop. Or have I chosen the wrong approach altogether? Can you point me in the right direction?
P.S. On the elevators tab, the data table differs from house to house, so my first solution is no good either.


1 answer(s)
Ilya, 2016-08-08
@glebovgin

I have never worked with Scrapy, but I think you should look at relative XPath queries, along these lines:

titles = hxs.xpath("//table[@class='orders overhaul-services-table']//tr")
for title in titles:
    item['year'] = title.xpath('./td[2]/span/text()').extract()
    item['organization'] = title.xpath('./td[2]/a/text()').extract()

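where item is a dict-like container holding your data.
Relative paths must start with a dot. Without it, a path that begins with / or // is evaluated against the whole document rather than against the current row.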
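
Since the table layout differs from house to house, one option is to key each value by the label in the row's first cell instead of by its position. Below is a minimal sketch of that idea. It uses the standard library's ElementTree as a stand-in so it can be run without a crawl. The sample markup, the row labels, and the table_to_dict helper are all assumptions made for illustration, and the real page may be structured differently. In Scrapy the same relative lookups would go through title.xpath('./td[...]'), as in the snippet above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: the real page's structure and labels may differ.
SAMPLE = """
<html><body>
  <table class="orders overhaul-services-table">
    <tr><th>Field</th></tr>
    <tr><td>Management organization</td><td><a href="#">Example Housing Co.</a></td></tr>
    <tr><td>Year of commissioning</td><td><span>1975</span></td></tr>
  </table>
</body></html>
"""


def table_to_dict(root):
    """Map each row's label (first cell) to its value (second cell)."""
    data = {}
    rows = root.findall(".//table[@class='orders overhaul-services-table']//tr")
    for row in rows:
        # The leading dot keeps the lookup relative to the current row.
        cells = row.findall("./td")
        if len(cells) < 2:
            continue  # skip header rows or rows without a label/value pair
        label = "".join(cells[0].itertext()).strip()
        value = "".join(cells[1].itertext()).strip()
        data[label] = value
    return data


root = ET.fromstring(SAMPLE)
fields = table_to_dict(root)
print(fields.get("Year of commissioning"))
```

Looking fields up by label means a missing or reordered row yields None for that field instead of silently returning the wrong cell.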
