How can I parse reviews from Yandex.Maps?


Python

Pavel, 2021-07-29 09:52:31

Good day, everyone.

I've hit a problem while writing a parser that I can't solve on my own; perhaps you can help.
The script parses the reviews on this page ( https://yandex.ru/maps/org/epilium_clinic/19126582... ), 50 items per request, but I can't figure out how to move on to the following pages.

The response contains a link to the reviews, but it only returns the first 50:
https://yandex.ru/maps/api/business/fetchReviews?a...

When you scroll to the end of the page, the next batch is loaded by an AJAX request with &page=2.
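The page-by-page loading described above could be driven by a loop like the minimal sketch below. It assumes a hypothetical fetch_page(page) callable that sends the fetchReviews request with &page=<page> (including the still unresolved s parameter) and returns the parsed list of reviews. Stopping at the first empty page is also an assumption about how the endpoint behaves, not something confirmed here.

```python
from typing import Callable, Iterator, List


def iter_reviews(fetch_page: Callable[[int], List[dict]], max_pages: int = 1000) -> Iterator[dict]:
    """Yield reviews page by page until a page comes back empty.

    fetch_page(page) is a hypothetical callable that performs the
    fetchReviews request with &page=<page> and returns a list of reviews.
    """
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:
            break  # assumed: an empty page means there are no more reviews
        yield from batch
```

max_pages is only a safety cap so a misbehaving endpoint cannot cause an endless loop.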

All the parameters the request needs (tokens, etc.) are stored in a JS script on the page, except one that I can't find: s=1010565311.
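One way to check whether the value of s is shipped to the browser at all is to search the downloaded HTML (and any JS bundles the page loads) for the literal value seen in devtools. This is a sketch; find_value is a hypothetical helper, not part of the original script.

```python
import re


def find_value(text, value, context=40):
    """Return snippets of text surrounding each occurrence of value.

    Useful for checking whether a parameter observed in devtools
    (e.g. s=1010565311) appears anywhere in the downloaded page or scripts.
    """
    snippets = []
    for match in re.finditer(re.escape(value), text):
        start = max(match.start() - context, 0)
        end = min(match.end() + context, len(text))
        snippets.append(text[start:end])
    return snippets
```

For example, find_value(get_response(base_url).text, "1010565311"). The value may change between page loads (an assumption), so search a page fetched in the same session as the captured request. If nothing turns up, s is probably computed in the browser rather than embedded in the page, though that is an inference, not something confirmed here.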

Please tell me how to get the trailing &s=??? parameter. Or perhaps there is some other approach to get all the reviews?

Thanks in advance!

import json

import requests
from bs4 import BeautifulSoup


headers = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0"}

session = requests.Session()  # Create a session
session.headers.update(headers)  # Send these headers with every request

base_url = "https://yandex.ru/maps/org/epilium_clinic/191265823168/reviews"


def get_contents(response):
    """Extract the JSON config embedded in the page."""
    soup = BeautifulSoup(response.text, "lxml")
    content = soup.find("script", {"class": "config-view"}).contents
    return json.loads(content[0])


def get_response(url):
    """Perform the GET request."""
    response = session.get(url=url)
    if response.status_code != 200:
        print(f"Request failed: status code {response.status_code}, expected 200")
    return response


def get_params(data):
    """Collect the request parameters from the page config."""
    csrf_token = data.get("csrfToken")
    company_id = data.get("query").get("orgpage").get("id")
    session_id = data.get("counters").get("analytics").get("sessionId")
    req_id = data.get("orgpagePreloadedResults").get("requestId")
    s = None  # ??? the &s= value (e.g. 1010565311) that I can't find

    print(csrf_token)
    print(company_id)
    print(session_id)
    print(req_id)

    print(data)


def main():
    response = get_response(base_url)
    content = get_contents(response)
    get_params(content)


if __name__ == '__main__':
    main()
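As a side note on the script itself: the chained .get() calls in get_params raise AttributeError as soon as an intermediate key is missing, because None has no .get(). A small hypothetical helper such as dig, sketched below, returns a default instead.

```python
def dig(data, *keys, default=None):
    """Walk nested dicts key by key, returning default if any key is missing.

    Safer than chaining .get(): data.get("query").get("orgpage") raises
    AttributeError when "query" is absent.
    """
    for key in keys:
        if not isinstance(data, dict) or key not in data:
            return default
        data = data[key]
    return data
```

In get_params this would read, e.g., company_id = dig(data, "query", "orgpage", "id").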


1 answer

Michael, 2021-07-29
@moonz

For tasks like this, I highly recommend Selenium.
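The answer gives no code, so here is a minimal sketch of the Selenium approach, assuming Selenium 4 and that the reviews list loads more items when its last card is scrolled into view (the same lazy loading the question describes). scroll_until_stable is a hypothetical helper, and the CSS selector for a review card is a placeholder you would have to look up in devtools. The string "css selector" is the value of Selenium's By.CSS_SELECTOR.

```python
import time


def scroll_until_stable(driver, item_selector, pause=2.0, max_rounds=100):
    """Scroll the last matching element into view until no new ones appear.

    driver is assumed to be a Selenium 4 WebDriver; item_selector is a CSS
    selector for one review card (hypothetical, check it in devtools).
    """
    count = 0
    for _ in range(max_rounds):
        items = driver.find_elements("css selector", item_selector)
        if len(items) == count:
            break  # nothing new loaded since the last scroll
        count = len(items)
        # Bringing the last card into view should trigger the page's own AJAX load
        driver.execute_script("arguments[0].scrollIntoView();", items[-1])
        time.sleep(pause)  # give the next batch time to arrive
    return driver.find_elements("css selector", item_selector)
```

Usage would be along the lines of driver = webdriver.Firefox(), driver.get(base_url), then cards = scroll_until_stable(driver, "<review card selector>") and reading card.text from each card. Scrolling the last element into view, rather than the window, avoids having to know which inner container on the page actually scrolls.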