Python
CreativeStory, 2018-11-12 18:27:01

How to properly set up a loop for the parser?

I'm parsing a site where the product cards are deeply nested, and I've run into a problem with tree-like navigation through the pages.
An indefinite number of pages can lead to a product card:
- the link to page 1 (subcategory) is in the papaSupermarket-subcategories-grid-item class;
- the link to page 2 (subcategory/subcategory) is in the papaSupermarket-subcategories-grid-item class;
- the link to page 3 (subcategory/subcategory/subcategory) is in the papaSupermarket-subcategories-grid-item class;
and so on.
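To show what "the link is in the papaSupermarket-subcategories-grid-item class" means in code, here is a minimal sketch that pulls those links out of a page without Scrapy, using only the standard library's `html.parser` and `urllib.parse.urljoin`. It is a stand-in for the `response.css('li.papaSupermarket-subcategories-grid-item > a::attr(href)')` selector in the code below, under stated assumptions: `SubcategoryLinkParser` and `extract_subcategory_links` are hypothetical names, and unlike the CSS `>` combinator this parser accepts any `<a>` inside the `<li>`, not only direct children.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

GRID_ITEM_CLASS = "papaSupermarket-subcategories-grid-item"


class SubcategoryLinkParser(HTMLParser):
    """Hypothetical helper: collects href values of <a> tags found inside
    <li class="papaSupermarket-subcategories-grid-item">.
    Approximates the CSS selector 'li.<class> > a' (does not enforce direct children)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._in_grid_item = False  # True while inside a matching <li>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li":
            # The class attribute may hold several classes separated by spaces
            classes = (attrs.get("class") or "").split()
            self._in_grid_item = GRID_ITEM_CLASS in classes
        elif tag == "a" and self._in_grid_item and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_grid_item = False


def extract_subcategory_links(html, base_url):
    """Return absolute URLs of all subcategory links on one page."""
    parser = SubcategoryLinkParser()
    parser.feed(html)
    parser.close()
    # Resolve relative hrefs against the page URL, as response.urljoin() does in Scrapy
    return [urljoin(base_url, href) for href in parser.links]
```

An empty result means the page has no subcategory grid, which is exactly the signal the question uses to decide that a page should be parsed instead of followed.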
For example, on the third page you can already extract the link to the product card; I've done that and everything works correctly.
How should I rewrite this page-traversal function to handle indefinite nesting?
If a page has a container with the papaSupermarket-subcategories-grid-item class, take the link to the next page from it and go there; then check whether that page also has a container with this class. If it does, go one level deeper; if it does not, start parsing the page.
Here is the code:

import scrapy

def parse_grid_items(self, response):
    urls_grid_items = response.css('li.papaSupermarket-subcategories-grid-item > a::attr(href)').extract()
    # NOTE: urls_grid_items is never updated inside the loop, so if it is non-empty this while loop never ends
    while urls_grid_items:
        for url_items in urls_grid_items:
            url_items = response.urljoin(url_items)
            # pass the link to the parse method
            yield scrapy.Request(url=url_items, callback=self.parse)
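The traversal described in the question (follow grid-item links while they exist, parse the page once they run out) is a depth-first walk over a tree of unknown depth. The `while urls_grid_items:` loop above cannot express that: the list is never updated inside it, and every request goes straight to `self.parse`, so only one level is ever descended. A minimal, framework-free sketch of the intended logic follows. It assumes a hypothetical `get_subcategory_links(url)` function that returns the absolute grid-item links of a page (for example, built on the selector above) and uses an explicit stack plus a `seen` set instead of a fixed number of nested loops.

```python
from typing import Callable, Iterator, List


def walk_catalog(
    start_url: str,
    get_subcategory_links: Callable[[str], List[str]],
) -> Iterator[str]:
    """Yield every page URL that has no subcategory grid (a page to parse).

    Depth-first traversal with an explicit stack, so the nesting depth is
    unbounded and Python's recursion limit is never an issue.
    get_subcategory_links is a hypothetical callback supplied by the caller.
    """
    stack = [start_url]
    seen = {start_url}  # guards against cycles and pages linked from several categories
    while stack:
        url = stack.pop()
        links = get_subcategory_links(url)
        if not links:
            # No papaSupermarket-subcategories-grid-item on this page: hand it to the parser
            yield url
            continue
        # Push in reverse so child pages are visited in document order
        for link in reversed(links):
            if link not in seen:
                seen.add(link)
                stack.append(link)
```

In a Scrapy spider the same idea is usually written without an explicit stack: `parse_grid_items` yields `scrapy.Request(url, callback=self.parse_grid_items)` for each subcategory link and calls `self.parse(response)` directly (for example with `yield from`) when the link list is empty. Scrapy's scheduler then plays the role of `stack`, and its default duplicate-request filter plays the role of `seen`.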


1 answer
Dimonchik, 2018-11-12
@dimonchik2013

Truly indefinite nesting rarely happens; you just don't have enough experience yet to see the structure. Think it through yourself: how did the people who built this site organize it? "Indefinite nesting" would be unmanageable.
