Python
ryndenkov, 2022-03-21 00:05:48

How to write parsed data to json?

Good evening. I have two questions.
First, how do I write the parsed data to a JSON file?
Second, how do I go to the next page when the site has no ?page=* parameter and the page links look like https://pentaschool.ru/trainer/p/*

import requests
from bs4 import BeautifulSoup
import json

JSON = 'trainers.json'
HOST = 'https://pentaschool.ru/'
URL = 'https://pentaschool.ru/trainer'
HEADERS = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.74 Safari/537.36'
}

def get_html(url, params=''):
    r = requests.get(url, headers=HEADERS, params=params)
    return r

def get_content(html):
    soup = BeautifulSoup(html, 'html.parser')
    items = soup.find_all('div', class_='trainers-card-col')
    trainers = []
    
    for item in items:
        trainers.append(
            {
                'trainers-card_name':item.find('p', class_='trainers-card_name').get_text(strip=True),
                'trainers-card_title-course-list-ov':item.find('div', class_='trainers-card_title-course-list-ov').get_text(strip=True)
            }
        )
    return trainers

def parser():
    PAGENATION = input('Enter the number of pages to parse: ')
    PAGENATION = int(PAGENATION.strip())
    html = get_html(URL)
    if html.status_code == 200:
        trainers = []
        for page in range(1, PAGENATION + 1):  # +1 so the last requested page is included
            print(f'Parsing page: {page}')
            html = get_html(URL, params={'page': page})
            trainers.extend(get_content(html.text))
        print(trainers)
    else:
        print('Error')
        
parser()


1 answer(s)
ScriptKiddo, 2022-03-21
@ryndenkov

How to write parsed data to JSON:

import json

result = [1, 2, 3, 'Тестовый текст']

with open('result.json', 'w', encoding='utf8') as f:
    json.dump(result, f, ensure_ascii=False)
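
Applied to the list of dictionaries that get_content() builds in the question, the same idea might look like the sketch below. The helper names save_trainers and load_trainers are hypothetical and not part of the original code; indent=4 is only there to make the file easier to read.

```python
import json
from pathlib import Path


def save_trainers(trainers, path='trainers.json'):
    """Write a list of dicts to a UTF-8 JSON file and return its path."""
    path = Path(path)
    with path.open('w', encoding='utf-8') as f:
        # ensure_ascii=False keeps Cyrillic text readable instead of \uXXXX escapes
        json.dump(trainers, f, ensure_ascii=False, indent=4)
    return path


def load_trainers(path='trainers.json'):
    """Read the list back, e.g. to check what was written."""
    with Path(path).open(encoding='utf-8') as f:
        return json.load(f)
```

In the question's parser(), the call print(trainers) could then be replaced with save_trainers(trainers, JSON).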

How to go to the next page for parsing:
Append the page number to the link, either by string concatenation or with an f-string. For example:
https://pentaschool.ru/trainer/p/4
page = 1
test = f'test/page/{page}'
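
For the URL from the question, a minimal sketch could look like this. The helper names page_url and page_urls are hypothetical, and it is an assumption that the first page is also reachable as /trainer/p/1; if it is not, request the plain URL for page 1.

```python
URL = 'https://pentaschool.ru/trainer'


def page_url(page, base=URL):
    """Build the address of one listing page, e.g. .../trainer/p/4."""
    return f"{base.rstrip('/')}/p/{page}"


def page_urls(count, base=URL):
    """Addresses for pages 1..count inclusive."""
    return [page_url(page, base) for page in range(1, count + 1)]
```

Inside the loop in parser(), the request would then be get_html(page_url(page)) rather than get_html(URL, params={'page': page}), since the page number is part of the path and not a query parameter.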
