How to go to the next page in scrapy?
Hello, I am writing a Scrapy news parser. It needs to start from the start URL, open each news article, extract the data, then go to the next page and do the same thing. Only the first page gets parsed; it doesn't go any further.
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class GuardianSpider(CrawlSpider):
    name = 'guardian'
    allowed_domains = ['theguardian.com']
    start_urls = ['https://www.theguardian.com/world/europe-news']

    rules = (
        # Article pages: extract the data in parser_items
        Rule(LinkExtractor(restrict_xpaths=("//div[@class='u-cf index-page']",),
                           allow=(r'https://www.theguardian.com/\w+/\d+/\w+/\d+/\w+',)),
             callback='parser_items'),
        # Pagination: follow to the next listing page
        Rule(LinkExtractor(restrict_xpaths=("//div[@class='u-cf index-page']",),
                           allow=(r'https://www.theguardian.com/\w+/\w+?page=\d+',)),
             follow=True),
    )
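One likely reason only the first page gets parsed: in the second Rule's allow pattern, the `?` before `page=` is a non-greedy regex quantifier rather than a literal question mark, so pagination URLs such as https://www.theguardian.com/world/europe-news?page=2 never match the pattern and are never followed. A hedged sketch with that character escaped (the XPath restriction is kept from the code above; whether the pagination links actually sit inside that div, and the body of `parser_items`, are assumptions):

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class GuardianSpider(CrawlSpider):
    name = 'guardian'
    allowed_domains = ['theguardian.com']
    start_urls = ['https://www.theguardian.com/world/europe-news']

    rules = (
        # Article pages: hand off to the item callback
        Rule(LinkExtractor(restrict_xpaths="//div[@class='u-cf index-page']",
                           allow=r'https://www\.theguardian\.com/\w+/\d+/\w+/\d+/[\w-]+'),
             callback='parser_items'),
        # Pagination: note the escaped \? and the hyphen allowed in the section slug
        Rule(LinkExtractor(restrict_xpaths="//div[@class='u-cf index-page']",
                           allow=r'https://www\.theguardian\.com/\w+/[\w-]+\?page=\d+'),
             follow=True),
    )

    def parser_items(self, response):
        # Placeholder extraction -- the real callback isn't shown in the question
        yield {'url': response.url, 'title': response.css('h1::text').get()}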
In general, I would use a plain `scrapy.Spider` (the old `BaseSpider`) rather than `CrawlSpider`, and define the selectors for the news links and the next page manually.
Something like this:
def parse(self, response):
    # Queue up every article link on the listing page
    news_css = 'div.fc-item__container > a::attr(href)'
    for news_link in response.css(news_css).extract():
        yield response.follow(news_link, callback=self.parser_items)

    # Queue up the pagination links and parse them with this same method
    next_page_css = 'div.pagination__list > a::attr(href)'
    for nextpage_link in response.css(next_page_css).extract():
        yield response.follow(nextpage_link, callback=self.parse)
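For completeness, a minimal self-contained sketch of that approach (the CSS selectors and URLs are taken from the snippets above; the fields extracted in `parser_items` are placeholders, since that callback isn't shown in the question):

import scrapy


class GuardianSpider(scrapy.Spider):
    name = 'guardian'
    allowed_domains = ['theguardian.com']
    start_urls = ['https://www.theguardian.com/world/europe-news']

    def parse(self, response):
        # Follow every article link on the listing page
        for news_link in response.css('div.fc-item__container > a::attr(href)').extract():
            yield response.follow(news_link, callback=self.parser_items)

        # Follow the pagination links and run this same callback on each page
        for nextpage_link in response.css('div.pagination__list > a::attr(href)').extract():
            yield response.follow(nextpage_link, callback=self.parse)

    def parser_items(self, response):
        # Placeholder extraction -- adjust to whatever fields you actually need
        yield {
            'url': response.url,
            'title': response.css('h1::text').get(),
        }

Note that `response.follow` already returns a Request, so there is no need to wrap it in `scrapy.Request`; it also resolves relative links against the current page.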