How to parse data loaded by scrolling the page?

Alexander Beltipeterov, 2021-07-17 06:49:06

Python

Good day!
I recently started learning Python and dug up a roughly two-year-old course on parsing. The first lesson used the Wordpress.org site as an example and everything worked: the site has changed since then, but the changes affected content more than structure (layout). The next lesson covers parsing tabular data using coinmarketcap.com, whose structure has changed, as far as I can tell: in the video this code retrieves all 100 values, but now it gets only the first 10 and then raises AttributeError: 'NoneType' object has no attribute 'text'. I think I understand why the error occurs: the program "does not see" the required data, because it seems to be loaded as the page is scrolled toward the end.
How can I change this code so that it processes all 100 rows from the first page of this site?
P.S. The list of tr elements has length 100, so rows 11 to 100 are present but come back empty.

import requests
from bs4 import BeautifulSoup

def get_html(url):
  r = requests.get(url)
  return r.text

def get_page_data(html):
  soup = BeautifulSoup(html, 'lxml')

  trs = soup.find('table').find('tbody').find_all('tr')
  print(len(trs))

  for tr in trs:
    tds = tr.find_all('td')
    name = tds[2].find('a').find('p').text
    print(name)

def main():
  url = 'https://coinmarketcap.com'
  get_page_data(get_html(url))

if __name__ == '__main__':
  main()

I hope to hear the advice of seasoned programmers and thanks for your attention!

1 answer(s)

Nadim Zakirov, 2021-07-17
@zkrvndm

Open the Network tab in the browser's developer tools and you will see that the data is loaded from these URLs:
First page: https://api.coinmarketcap.com/data-api/v3/cryptocu...
Second page: https://api.coinmarketcap.com/data-api/v3/cryptocu...
Third page: https://api.coinmarketcap.com/data-api/v3/cryptocu...
Request those URLs directly and parse the responses as JSON instead of scraping the HTML. I think the way the URLs are built from one page to the next is intuitive.
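
To illustrate the idea, here is a minimal sketch, not a definitive implementation. The URLs above are truncated, so everything specific in it is an assumption: API_URL is a hypothetical placeholder, the start/limit parameter names are guesses at how the pages are addressed, and the {"data": {"items": [...]}} payload shape is invented for the example. Copy the real URL from the Network tab and adjust the keys to whatever the actual response contains. The sketch uses only the standard library; requests.get(url, params=params).json() would do the same job as fetch_json.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical placeholder: replace with the full URL copied from the Network tab.
API_URL = "https://example.invalid/listing"


def page_params(page, page_size=100):
    """Build query parameters for a 1-based page number.

    The names "start" and "limit" are assumptions; use whatever the real URL shows.
    """
    return {"start": (page - 1) * page_size + 1, "limit": page_size}


def fetch_json(url, params):
    """Request one page and decode the response body as JSON."""
    request = Request(f"{url}?{urlencode(params)}",
                      headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        return json.load(response)


def extract_names(payload):
    """Collect names from an assumed payload shape: {"data": {"items": [{"name": ...}]}}."""
    items = payload.get("data", {}).get("items", [])
    # Skip entries without a name instead of failing on them.
    return [item["name"] for item in items if "name" in item]


# Offline demonstration with a made-up payload, so no network access is needed.
sample = {"data": {"items": [{"name": "Bitcoin"}, {"name": "Ethereum"}]}}
print(extract_names(sample))  # ['Bitcoin', 'Ethereum']
```

With the real URL in place, a loop such as "for page in (1, 2, 3): print(extract_names(fetch_json(API_URL, page_params(page))))" would walk through the pages. Since the JSON already contains every row, the AttributeError caused by rows that are empty in the HTML does not come up.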