How to properly set up a loop for the parser?
I'm scraping a site where product cards are deeply nested, and I've run into a problem with tree-like navigation through the pages.
An indefinite number of intermediate pages can lead to a product card:
- the link to page 1 (subcategories) is in an element with the class papaSupermarket-subcategories-grid-item,
- the link to page 2 (subcategories/subcategories) is in an element with the class papaSupermarket-subcategories-grid-item,
- the link to page 3 (subcategories/subcategories/subcategories) is in an element with the class papaSupermarket-subcategories-grid-item,
etc.
For example, on the third page I can already extract a link to the product card. I've done that and it works correctly.
How should I rewrite this page traversal function so that it handles an indefinite nesting depth?
The idea: if the page has a container with the papaSupermarket-subcategories-grid-item class, take the link to the next page from it and go there. On that page, check again for a container with the papaSupermarket-subcategories-grid-item class. If there is one, go one level deeper; if not, start parsing the page.
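The traversal described above can be sketched without Scrapy to show the recursion on its own. This is only an illustration under assumptions: the SITE dictionary, its URLs, and the find_leaf_pages helper are hypothetical stand-ins for real pages and for the check "does this page contain subcategory links?".

```python
# Hypothetical site map: URL -> subcategory links found on that page.
# An empty list means the page has no subcategory grid.
SITE = {
    "/catalog": ["/catalog/a", "/catalog/b"],
    "/catalog/a": ["/catalog/a/1"],
    "/catalog/a/1": [],
    "/catalog/b": [],
}


def find_leaf_pages(url, get_subcategory_links, seen=None):
    """Follow subcategory links to any depth and return the pages that have none."""
    if seen is None:
        seen = set()
    if url in seen:
        # Guard against categories that link back to each other.
        return []
    seen.add(url)

    links = get_subcategory_links(url)
    if not links:
        # No subcategory grid: this is a page to parse for products.
        return [url]

    leaves = []
    for link in links:
        leaves.extend(find_leaf_pages(link, get_subcategory_links, seen))
    return leaves


leaf_pages = find_leaf_pages("/catalog", lambda url: SITE[url])
print(leaf_pages)  # ['/catalog/a/1', '/catalog/b']
```

The depth is never counted anywhere: each page only decides whether to go deeper or to stop, which is what makes the nesting unbounded.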
Here is my current code:
def parse_grid_items(self, response):
    urls_grid_items = response.css('li.papaSupermarket-subcategories-grid-item > a::attr(href)').extract()
    # A plain for loop is enough here: `while urls_grid_items:` never ends,
    # because the list is never modified inside the loop
    for url_items in urls_grid_items:
        url_items = response.urljoin(url_items)
        # pass the link to the parse callback
        yield scrapy.Request(url=url_items, callback=self.parse)