Python
Pol1na, 2021-05-10 12:04:10

Parsing a dynamic site in Python?

There is a website: https://www.ifema.es/en/fitur/exhibitors-catalogue
When you scroll the grid, more data is loaded automatically. That data needs to be parsed.
In F12 -> Network, a POST request is sent to https://api.swapcard.com/graphql
How do you interact with something like this, and is it possible to form a request that loads all the data in the table?
Help me out:


2 answer(s)
soremix, 2021-05-10
@Pol1na

Regular GraphQL queries. The viewId appears to always be the same,
and an endCursor is returned with every response.

import requests

data = [{"operationName":"EventExhibitorList","variables":{"viewId":"RXZlbnRWaWV3XzE1MjUyMA==","search":"","selectedFilters":[{"mustEventFiltersIn":[]}]},"extensions":{"persistedQuery":{"version":1,"sha256Hash":"ee232939a5b943c0d87a4877655179bc2e5c73472ff99814119deddb34e0a3b6"}}}]

response = requests.post('https://api.swapcard.com/graphql', json=data).json()
# parse the data you need here

end_cursor = response[0]['data']['view']['exhibitors']['pageInfo'].get('endCursor')

while end_cursor:
    data = [{"operationName":"EventExhibitorList","variables":{"viewId":"RXZlbnRWaWV3XzE1MjUyMA==","search":"","selectedFilters":[{"mustEventFiltersIn":[]}],"endCursor":end_cursor},"extensions":{"persistedQuery":{"version":1,"sha256Hash":"ee232939a5b943c0d87a4877655179bc2e5c73472ff99814119deddb34e0a3b6"}}}]
    
    response = requests.post('https://api.swapcard.com/graphql', json=data).json()

    # parse the data you need here

    end_cursor = response[0]['data']['view']['exhibitors']['pageInfo'].get('endCursor')

Vindicar, 2021-05-10
@Vindicar

Look in the same place at the exact structure of the request (which fields are sent and what their values are). Then form the same request, say via requests, and send it. You will also need to figure out how to parse the response: is it JSON, a fragment of HTML, or something else?
The details depend on the specific site, so writing the specific code is already freelance work.
Just keep in mind that sites may not like being actively scraped. Their countermeasures can include:

  • Cookie checks - just load the main page first in the same requests session so that the cookies are in place.
  • A special session key for requests - find out how it is obtained and make that request before you start.
  • Limits on the number of requests from one host - use a proxy, or better, throttle the script's request rate.
  • Client validation - set a plausible User-Agent; you can even copy the request headers wholesale.
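The checklist above can be sketched in requests. The header values, the warm-up URL, and the 1-second interval are illustrative assumptions, not anything the site documents:

```python
import time

import requests


def make_session(warmup_url=None):
    """Build a requests.Session with a browser-like User-Agent.

    If warmup_url is given, fetch it first so any cookies the site
    sets are stored on the session (point 1 in the list above).
    The header values are placeholders -- copy real ones from DevTools.
    """
    s = requests.Session()
    s.headers.update({
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "Accept": "application/json",
    })
    if warmup_url:
        s.get(warmup_url)
    return s


def polite_post(session, url, json_body, min_interval=1.0, _state={"t": 0.0}):
    """POST through the session, never sending requests closer together
    than min_interval seconds (point 3: throttle instead of hammering)."""
    wait = _state["t"] + min_interval - time.monotonic()
    if wait > 0:
        time.sleep(wait)
    _state["t"] = time.monotonic()
    return session.post(url, json=json_body)
```

Usage would be `session = make_session("https://www.ifema.es/en/fitur/exhibitors-catalogue")` followed by `polite_post(session, 'https://api.swapcard.com/graphql', data)` inside the pagination loop from the first answer.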
