Python
Sergey Bard, 2017-09-19 16:43:42

How to process each sitemap URL separately in Scrapy?

Hello. I am working with Scrapy.
I can't figure out how to do the following. There are several sitemaps. I crawl the first one from the list (the spider is a SitemapSpider), and once it has gone through every URL from that first sitemap, I need to do extra processing on all of those URLs. I collect a title for each URL into a list, then I need to compare that list with the data in the database and fix any differences. The key point is that after each sitemap I need to do some work on the data collected for it, and only then start on the next sitemap URL.
For a single sitemap this is simple: in pipelines.py, in the close_spider function, I do what I need. But how can I do this for several sitemap URLs? The only idea that comes to mind is to launch the spider from another file containing a loop that iterates over the list of URLs and passes them to the spider one at a time. Since I'm new to Python and Scrapy, I don't know how good this idea is, so I decided to ask more experienced people.
Please tell me whether I should do it the way I described, or whether Scrapy has something built in for this.
P.S. There is an example like this in the documentation, but it doesn't suit me: it only handles different links with separate methods, and in the end everything is still merged into one crawl. I can't find anything else.

import scrapy
from scrapy.spiders import SitemapSpider

class MySpider(SitemapSpider):
    sitemap_urls = ['http://www.example.com/robots.txt']
    sitemap_rules = [
        ('/shop/', 'parse_shop'),
    ]

    other_urls = ['http://www.example.com/about']

    def start_requests(self):
        requests = list(super(MySpider, self).start_requests())
        requests += [scrapy.Request(x, self.parse_other) for x in self.other_urls]
        return requests

    def parse_shop(self, response):
        pass # ... scrape shop here ...

    def parse_other(self, response):
        pass # ... scrape other here ...
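
One possible direction, offered only as a rough sketch and not as a built-in Scrapy feature: keep a single crawl, but have the pipeline group the collected titles by the sitemap they came from, and run the database comparison once per group in close_spider. The sketch assumes every item is a dict carrying the hypothetical keys 'sitemap', 'url' and 'title' (the spider would have to fill in 'sitemap' itself), and it uses a plain dict, stored_titles, as a stand-in for the real database.

```python
from collections import defaultdict


def diff_titles(scraped, stored):
    """Return {url: title} for URLs that are new or whose title changed."""
    return {url: title for url, title in scraped.items()
            if stored.get(url) != title}


class PerSitemapPipeline:
    """Item pipeline sketch: group titles by source sitemap.

    Assumption: each item is a dict with the hypothetical keys
    'sitemap', 'url' and 'title', set by the spider.
    """

    def __init__(self, stored_titles=None):
        # Stand-in for the database: {url: title}. Hypothetical.
        self.stored_titles = stored_titles or {}
        self.titles_by_sitemap = defaultdict(dict)
        self.changes_by_sitemap = {}

    def process_item(self, item, spider):
        # Remember the title under the sitemap the URL was found in.
        self.titles_by_sitemap[item["sitemap"]][item["url"]] = item["title"]
        return item

    def close_spider(self, spider):
        # Compare each sitemap's titles with the stored ones separately.
        for sitemap, scraped in self.titles_by_sitemap.items():
            self.changes_by_sitemap[sitemap] = diff_titles(
                scraped, self.stored_titles)
```

Because Scrapy downloads pages concurrently, this does the per-sitemap work at the end of the crawl and does not wait for one sitemap to finish before starting the next. If the sitemaps really must be handled strictly one after another, the loop-in-a-separate-script idea from the question is the other route; the Scrapy documentation on running spiders from a script describes running crawls sequentially.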
