Python
Jumper1, 2020-02-27 21:02:41

How to get data from a page using scrapy?

The task is to parse each vacancy card, then follow the link in the card and collect additional data from the linked page, so that both sets of fields end up together in one item.
Right now I can only output them as separate items, and they appear to come out in no particular order.
I tried merging everything into a single variable, but that raised an error.
I have already tried several workarounds, without success.
What else can I try? And is this even possible?

import scrapy
from scrapy.http import HtmlResponse


class TutorSpider(scrapy.Spider):
    name = 'tutorial'
    start_urls = [
        'https://hh.ru/search/vacancy?L_is_autosearch=false&area=3&clusters=true&enable_snippets=true&text=Python&page=0',
    ]

    def parse(self, response: HtmlResponse):
        """vacancy_href = response.xpath('//a[@class="bloko-link HH-LinkModifier"]/@href')
        for href in vacancy_href:
            yield response.follow(href, callback=self.parse_vacancy)"""

        # Follow the pagination links
        next_page = response.xpath('//a[@class="bloko-button HH-Pager-Controls-Next HH-Pager-Control"]')
        for page in next_page:
            yield response.follow(page, callback=self.parse)

        # Parse the vacancy card
        for card_vacancy in response.xpath('//div[@class="vacancy-serp-item "]'):
            yield {
                'title': card_vacancy.xpath('.//a[@class="bloko-link HH-LinkModifier"]/text()').get(),
                'salary': card_vacancy.xpath('.//span[@class="bloko-section-header-3 bloko-section-header-3_lite"]/text()').get(),
                'employer': card_vacancy.xpath('.//a[@class="bloko-link bloko-link_secondary"]/text()').get(),
            }

        # Follow each vacancy link to the page with tags and address
        for page_with_details in response.xpath('//a[@class="bloko-link HH-LinkModifier"]'):
            yield response.follow(page_with_details, self.parse_vacancy_details)


    def parse_vacancy_details(self, response: HtmlResponse):
        yield {
            'place': response.xpath('//span[@data-qa="vacancy-view-raw-address"]/text()').get(),
            'tags': response.xpath('//span[@data-qa="bloko-tag__text"]/text()').getall(),
            'url': response.url,
        }
