How to process each url in scrapy?
Hello. I work with scrapy.
I can't figure out how to do the following. There are several sitemaps. I crawl the first one from the list (the spider is a SitemapSpider), and once it has gone through every URL in that sitemap, I need to post-process the results: I collect a title for each URL into a list, then compare that list with the data in the database and fix any differences. The key point is that this post-processing has to run after each sitemap, before the crawl of the next sitemap starts.
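To make the comparison step concrete, here is a minimal sketch. It assumes the scraped titles and the database rows can both be represented as `{url: title}` dictionaries; the function name `diff_titles` is hypothetical and not part of Scrapy.

```python
def diff_titles(scraped, stored):
    """Compare two {url: title} mappings.

    Returns three dicts:
      added   - URLs present in the crawl but missing from the database
      removed - URLs present in the database but missing from the crawl
      changed - URLs in both, mapped to (stored_title, scraped_title)
    """
    added = {url: title for url, title in scraped.items() if url not in stored}
    removed = {url: title for url, title in stored.items() if url not in scraped}
    changed = {
        url: (stored[url], title)
        for url, title in scraped.items()
        if url in stored and stored[url] != title
    }
    return added, removed, changed
```

The three results indicate which rows would need to be inserted, deleted, or updated for the sitemap that has just been crawled.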
For a single sitemap this is simple: in pipelines.py, in the close_spider function, I do what I need. But how can I do this for several sitemap URLs? The only idea that comes to mind is to launch the spider from another file that loops over the list of URLs and passes them to the spider one at a time. Since I'm new to Python and Scrapy, I don't know how good this idea is, so I decided to ask more experienced people.
Should I go with the approach I described, or does Scrapy already have something for this?
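For reference, the "loop in another file" idea could look roughly like the sketch below. It is only one possible approach. It assumes the spider accepts a hypothetical `sitemap_url` argument (passed with the standard `-a` option of `scrapy crawl`) and that the script is started from the Scrapy project directory; the function names are made up for illustration.

```python
import subprocess


def build_command(spider_name, sitemap_url):
    """Build the `scrapy crawl` command line for a single sitemap."""
    # `sitemap_url` is a hypothetical spider argument, not a Scrapy built-in.
    return ["scrapy", "crawl", spider_name, "-a", f"sitemap_url={sitemap_url}"]


def crawl_one_by_one(spider_name, sitemap_urls, run=subprocess.run):
    """Run one crawl per sitemap, waiting for each to finish before the next."""
    for url in sitemap_urls:
        # check=True raises CalledProcessError if a crawl exits with an error,
        # which stops the loop before the next sitemap is started.
        run(build_command(spider_name, url), check=True)
```

Calling `crawl_one_by_one("single_sitemap", ["http://www.example.com/sitemap-1.xml", "http://www.example.com/sitemap-2.xml"])` would start two separate crawls in sequence (both values are placeholders). Since every crawl is its own process, the pipeline's close_spider runs once per sitemap, so the existing close_spider logic can stay as it is.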
P.S. There is an example like this in the documentation, but it doesn't suit me: it still merges everything into a single crawl and only handles different links with separate callbacks. I couldn't find anything else.
import scrapy
from scrapy.spiders import SitemapSpider


class MySpider(SitemapSpider):
    sitemap_urls = ['http://www.example.com/robots.txt']
    sitemap_rules = [
        ('/shop/', 'parse_shop'),
    ]
    other_urls = ['http://www.example.com/about']

    def start_requests(self):
        # Sitemap requests first, then the extra URLs with their own callback
        requests = list(super(MySpider, self).start_requests())
        requests += [scrapy.Request(x, self.parse_other) for x in self.other_urls]
        return requests

    def parse_shop(self, response):
        pass  # ... scrape shop here ...

    def parse_other(self, response):
        pass  # ... scrape other here ...