How can I parse reviews from Yandex.Maps?
Good day, dear experts.
I've run into a problem while writing a parser that I can't solve on my own; perhaps you can help.
The script parses the reviews at this link ( https://yandex.ru/maps/org/epilium_clinic/19126582... ), 50 items per request, but I can't get it to move on to the following pages.
The response contains a link to the reviews, but it only returns the first 50 of them:
https://yandex.ru/maps/api/business/fetchReviews?a...
In the browser, the next batch (&page=2) is loaded by an AJAX request when you scroll to the end of the page.
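The page-by-page loading described above can be sketched as a loop that keeps requesting the next `page` until one comes back empty. This is a minimal sketch, not a working client: `fetch_page` is a hypothetical callable standing in for the real `fetchReviews` request (which would still need the `s` parameter), and the 50-per-page size is taken from the question.

```python
from typing import Callable, Iterator, List


def iter_reviews(fetch_page: Callable[[int], List[dict]], max_pages: int = 100) -> Iterator[dict]:
    """Yield reviews page by page until an empty page is returned.

    fetch_page is hypothetical: it takes a 1-based page number (the value
    that would go into &page=) and returns the list of reviews on that page.
    max_pages is a safety limit so a misbehaving endpoint can't loop forever.
    """
    for page in range(1, max_pages + 1):
        reviews = fetch_page(page)
        if not reviews:  # an empty page means we went past the last one
            break
        yield from reviews


if __name__ == "__main__":
    # Fake data source: 120 reviews served 50 per page (pages 1-3, page 4 empty).
    data = [{"id": i} for i in range(120)]
    fake_fetch = lambda page: data[(page - 1) * 50:page * 50]
    print(len(list(iter_reviews(fake_fetch))))  # prints 120
```

Passing the request function in as a parameter keeps the paging logic separate from the unsolved part (building a valid `fetchReviews` URL), so it can be tested offline.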
All the parameters we need (tokens, etc.) are stored in a JS script on the page, except one that I can't find: s=1010565311.
Please tell me how to obtain the trailing &s=??? parameter, or whether there is some other approach to get all the reviews.
Thanks in advance!
import json

import requests
from bs4 import BeautifulSoup

headers = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0"}
session = requests.Session()  # Create a session so cookies persist between requests
session.headers.update(headers)  # Add our User-Agent to the session's default headers

base_url = "https://yandex.ru/maps/org/epilium_clinic/191265823168/reviews"


def get_contents(response):
    """Extract the JSON config embedded in the page."""
    soup = BeautifulSoup(response.text, "lxml")
    # The page keeps its state as JSON inside <script class="config-view">
    content = soup.find("script", {"class": "config-view"}).contents
    return json.loads(content[0])


def get_response(url):
    """Send a GET request through the shared session."""
    response = session.get(url=url)
    if response.status_code != 200:
        print(f"Request failed: status code {response.status_code}, expected 200")
    return response


def get_params(data):
    """Collect the request parameters from the page config."""
    csrf_token = data.get("csrfToken")
    company_id = data.get("query").get("orgpage").get("id")
    session_id = data.get("counters").get("analytics").get("sessionId")
    req_id = data.get("orgpagePreloadedResults").get("requestId")
    s = None  # Unknown: this is the parameter the question is about
    print(csrf_token)
    print(company_id)
    print(session_id)
    print(req_id)
    print(data)
    return csrf_token, company_id, session_id, req_id, s


def main():
    response = get_response(base_url)
    content = get_contents(response)
    get_params(content)


if __name__ == '__main__':
    main()
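One hypothesis worth testing is that `s` is not stored anywhere but is a checksum that the page's JavaScript computes over the other query parameters just before sending the request. The sketch below assumes a djb2-style 32-bit hash (`n = 33 * n ^ charCode`, starting from 5381) over the alphabetically sorted, URL-encoded parameters. The hash function, the sorting, and the helper names `djb2_xor_hash` and `build_query` are all assumptions, not confirmed Yandex behaviour; the real function would have to be located in the page's JS bundle (for example by finding where `s` is appended in the browser's developer tools).

```python
from urllib.parse import urlencode


def djb2_xor_hash(text: str) -> int:
    """Hypothetical signature: 32-bit djb2 variant, n = (33 * n) ^ char.

    Masking to 32 bits after every step mirrors JavaScript's int32 bitwise
    arithmetic followed by a final `>>> 0` (unsigned result). This matches JS
    charCodeAt only for ASCII input, which urlencode output always is.
    """
    n = 5381
    for ch in text:
        n = ((n * 33) ^ ord(ch)) & 0xFFFFFFFF
    return n


def build_query(params: dict) -> str:
    """Hypothetical: sort params by key, urlencode them, append s=hash(query)."""
    query = urlencode(sorted(params.items()))
    return f"{query}&s={djb2_xor_hash(query)}"


if __name__ == "__main__":
    # Placeholder values; the real parameter set is in the truncated fetchReviews URL.
    print(build_query({"page": 2, "csrfToken": "TOKEN"}))
```

To check the hypothesis, copy a real `fetchReviews` query string from the browser's Network tab, remove `&s=...`, and see whether hashing the remainder reproduces the observed value (such as 1010565311). If it doesn't, try the parameters in their original order rather than sorted; if that also fails, the assumed hash is wrong.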