Parser in Python - how to load new posts via AJAX?

Python

AlessandrIT, 2019-04-13 12:36:57

Good afternoon. I am working with a site where new posts are loaded with the "Load more" button at the end of the page. The XHR tab of the browser console shows that the button sends a POST request to the address used in my code.

My implementation:

import requests

r = requests.post('https://www.biz-cen.ru/load/', data={
    "search_params": {"metro": {"lines": {}}},
    "was_ra": ",",
    "limit": 200,
    "to_ra": 0,
    "bc_in_fav": [],
    "office_in_fav": [],
    "bcs_in_view_start": [],
    "tolim": 21,
    "num_in": 3,
    "typeof_search": 4,
    "show_fav": 0,
    "bcs_in_view": ["272", "254", "1737", "2207", "87"],
    "was_bc_loaded": 0,
})

Most of the numbers have been cut for the sake of compactness.

However, the parser does not see the new posts, only the ones that were loaded initially.
Where did I make a mistake?


2 answer(s)

AWEme, 2019-04-13
@AlessandrIT

For me, new posts come back on each request only when it is done like this:

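import json

import requests
from bs4 import BeautifulSoup

load_url = 'https://www.biz-cen.ru/load/'

# The search parameters go into a single form field as a JSON string.
search_params = {"metro": {"lines": {}}, "was_ra": ",", "limit": 20, "to_ra": 0,
                 "tolim": 20, "bc_in_fav": [], "office_in_fav": [],
                 "bcs_in_view_start": [], "num_in": 3, "typeof_search": 5,
                 "show_fav": 0}
data = {'search_params': json.dumps(search_params), 'was_bc_loaded': 0}

session = requests.Session()
seen = set()

def parse(response):
    """Return the post links found in one response."""
    soup = BeautifulSoup(response.text, 'lxml')
    table = soup.find('ul', id='bObjDataList')
    # Use ul#bObjDataList when it is present; otherwise fall back to every <li>.
    lis = table.find_all('li') if table else soup.find_all('li')
    # Skip list items that carry no link.
    anchors = (li.find('a') for li in lis)
    return [a.get('href') for a in anchors if a and a.get('href')]

while len(seen) < 200:
    response = session.post(load_url, data=data)
    before = len(seen)
    seen.update(parse(response))
    print(len(seen))
    if len(seen) == before:
        break  # nothing new came back, so stop instead of looping forever
    # Ask for the next batch of 20 posts.
    search_params['limit'] += 20
    data['search_params'] = json.dumps(search_params)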

The parse function extracts the direct link to each post; on this site these links are unique.
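One visible difference from the code in the question is that search_params is passed through json.dumps before it goes into data. The sketch below shows why that may matter: it builds both request bodies with requests without sending anything and compares them. The shortened params dict and the form_body helper are made up for illustration, and whether the server really expects a JSON string is an assumption based on the working code above.

```python
import json
from urllib.parse import parse_qs

import requests

url = 'https://www.biz-cen.ru/load/'
# Shortened, illustrative parameters (not the full set the site uses).
params = {"metro": {"lines": {}}, "limit": 20}

def form_body(data):
    """Return the form-encoded body requests would build, without sending it."""
    return requests.Request('POST', url, data=data).prepare().body

# Nested dict passed as-is: requests iterates over it, so only its keys survive.
nested_body = form_body({'search_params': params, 'was_bc_loaded': 0})

# Nested dict serialized first: the whole structure travels as one string field.
json_body = form_body({'search_params': json.dumps(params), 'was_bc_loaded': 0})

print(nested_body)
print(json_body)

# Decoding the second body recovers the original structure.
decoded = json.loads(parse_qs(json_body)['search_params'][0])
print(decoded == params)
```

In the first body the nested values are lost, which would explain why the server keeps returning the initial set of posts.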

Andrey_Dolg, 2019-04-13
@Andrey_Dolg

I honestly think the mistake is in the logic itself, although if I am wrong I will be glad to hear it.
First, if you are not using browser emulation, the parser works on a copy of the page that was fetched once.
Any new data has to come from somewhere else: take it directly from the response to that same POST request, request the page again with new parameters if that is possible, or use selenium and drive the UI instead of using requests.
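The first option can be sketched roughly as follows: the links are taken from the body of the POST response itself and merged with the ones from the initial page. The two HTML fragments and the /bc/... paths are invented for illustration, and the standard library's HTMLParser is used here instead of BeautifulSoup only to keep the example self-contained.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href values of <a> tags that sit inside <li> elements."""

    def __init__(self):
        super().__init__()
        self.li_depth = 0
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'li':
            self.li_depth += 1
        elif tag == 'a' and self.li_depth > 0:
            href = dict(attrs).get('href')
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == 'li' and self.li_depth > 0:
            self.li_depth -= 1

def extract_links(html_text):
    """Return post links from a full page or from a bare fragment of <li> items."""
    collector = LinkCollector()
    collector.feed(html_text)
    return collector.links

# Hypothetical content: the page fetched once, and the body of a later POST response.
initial_page = '<ul id="bObjDataList"><li><a href="/bc/1">one</a></li></ul>'
post_response = '<li><a href="/bc/2">two</a></li><li><a href="/bc/3">three</a></li>'

seen = set(extract_links(initial_page))
# The new posts exist only in the POST response, so that is what gets parsed.
seen.update(extract_links(post_response))
print(sorted(seen))
```

Re-parsing initial_page would never show the new links, because that copy does not change after it has been downloaded.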