Python
ParnishkaSPB, 2020-06-19 12:16:54

Why can't the parser parse the page I give it?

import requests
from bs4 import BeautifulSoup
import csv

# URL = 'https://101hotels.com/recreation/russia/sankt-peterburg/points#page=2'  (the spaces are intentional)
FILE = 'Par.csv'



def get_html(url):
    r = requests.get(url)
    return r

def get_content(html):
    soup = BeautifulSoup(html, 'html.parser')
    items = soup.find_all('li', class_='item')

    objects = []
    for item in items:
        try:
            objects.append({
                'title': item.find('div', class_='item-name').text,
                'address': item.find('span', class_='item-address').text,
                'p': item.find('div', class_='item-description').text.replace('\xa0',''),
            })
        except:
            pass
    return objects


def save_file(items, path):
    with open(path, 'w', newline='', encoding='utf-8') as file:
        writer = csv.writer(file, delimiter=';')
        writer.writerow(['Объект', 'Адрес', 'Описание'])
        for item in items:
            writer.writerow([item['title'], item['address'], item['p']])


def parse():
    URL = input('Введите URL: ')
    URL = URL.strip()
    html = get_html(URL)
    try:
        objects = []
        objects.extend(get_content(html.text))
        save_file(objects, FILE)
    except:
        print('Error')




parse()


I wanted to make a parser of interesting places in St. Petersburg. Everything seemed fine, but there are a lot of pages, and parsing only produced the results of 1 page out of 14, even though it kept running. I thought I had messed something up, so I removed the page counter and entered the URLs by hand. But even when I give it the URL of page 2 and onward, nothing changes: the result is still page 1. Could you help me solve the problem with parsing the remaining pages?
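A likely reason the hand-entered URLs behaved the same, sketched below: everything after # in a URL is a fragment. The browser hands it to the page's JavaScript, but it is never included in the HTTP request itself, so requests keeps fetching the same first page of markup no matter which page number is in the fragment. (The URL is just the one from the question.)

from urllib.parse import urlsplit

url = 'https://101hotels.com/recreation/russia/sankt-peterburg/points#page=2'
parts = urlsplit(url)
print(parts.path)      # /recreation/russia/sankt-peterburg/points -- what the server actually receives
print(parts.fragment)  # page=2 -- handled client-side by JS, never sent in the request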

1 answer
Matvey Istomin, 2020-06-19
@Minute

The site loads its data with JavaScript, which sends requests to the server. You can watch which requests are made (in Firefox, Ctrl+Shift+E opens the Network panel), and you can see that when you switch to another page, a request like this one is executed:

GET - https://101hotels.com/api/facility/search
Query string:
r=0.0000530041150925655330.07492892309472692
params={"city_ids":[13],"category_url":"points"}
page=2

Make the same request without the r parameter, and everything works:
import requests
import json

def gen_params(page, city_ids=[13]):
    # Build the same query string the site's JS sends: the page number plus a
    # JSON-encoded "params" field (city_ids=[13] matches the Saint Petersburg listing).
    return {
        'page': page,
        'params': json.dumps({
            'city_ids': city_ids,
            'category_url': 'points'
        })
    }

data = []
for page in range(1, 6):
    # Call the API directly for each page; the 'response' field holds the list of places.
    r = requests.get("https://101hotels.com/api/facility/search", params=gen_params(page))
    data.extend(r.json()['response'])

print(json.dumps(data[-1], indent=4, ensure_ascii=False, sort_keys=True))
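If you then want the same CSV output as in the original script, here is a minimal follow-up sketch. The exact keys inside each JSON object are not shown above, so instead of guessing field names it uses the union of keys actually present in the data as the header row:

import csv

# Save the collected JSON objects to CSV without assuming specific field names.
if data:
    fieldnames = sorted({key for item in data for key in item})
    with open('Par.csv', 'w', newline='', encoding='utf-8') as file:
        writer = csv.DictWriter(file, fieldnames=fieldnames, delimiter=';')
        writer.writeheader()
        writer.writerows(data)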
