Python
Be3yxa, 2021-11-03 11:55:42

Delay for a site parser?

Good afternoon. I ran into the following problem: the site https://www.coinglass.com/pro/cme/cftc used to load its data right away (the values were on the page immediately), but recently that changed, and if you refresh the page you can see that for the first second every value on the page is 0. The BS4 library ends up parsing these zeros. Is there any way to add a delay, so that the page loads first and only then gets parsed?

I'm new to programming; I tried adding a timer before requests.get, but it didn't help.

html = requests.get(URL, headers=HEADERS)
time.sleep(3)
soup = BeautifulSoup(html.text, 'lxml')
cells = soup.find_all('table', class_='code133741')[1].find_all('td')
long_inst = cells[25].text
long_inst_changes = cells[42].text
short_inst = cells[26].text
short_inst_changes = cells[43].text
long_funds = cells[28].text
long_funds_changes = cells[45].text
short_funds = cells[29].text
short_funds_changes = cells[46].text
date = soup.find('div', class_='bybt-box').find('div').text[6:]


2 answer(s)
soremix, 2021-11-03
@Be3yxa

It's not about the delay; the data is loaded dynamically. On parsing dynamic sites, see:
https://qna.habr.com/q/1038438#answer_2008702
I don't know exactly what you are collecting, but most likely all the data you need is here:
https://fapi.coinglass.com/api/cme/cot/report
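
If that endpoint does return the report as JSON, you can query it directly with requests and skip BeautifulSoup entirely. A minimal sketch, assuming the endpoint answers a plain GET request and returns JSON (the structure of the response is an assumption, so print it once and pick out the fields you need):

# Hypothetical direct API call instead of scraping the rendered page.
import requests

API_URL = 'https://fapi.coinglass.com/api/cme/cot/report'   # endpoint from the link above
HEADERS = {'User-Agent': 'Mozilla/5.0'}                      # some APIs reject requests without a User-Agent

resp = requests.get(API_URL, headers=HEADERS, timeout=10)
resp.raise_for_status()
data = resp.json()    # parsed JSON instead of HTML that still contains zeros
print(data)           # inspect the structure, then pull out the fields you need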

Alexander, 2021-11-03
@shabelski89

Most likely it's the JS that introduces the delay, so I suggest this solution:

from bs4 import BeautifulSoup
from selenium import webdriver

url = "http://legendas.tv/busca/walking%20dead%20s03e02"
browser = webdriver.PhantomJS()   # headless browser that actually executes the page's JavaScript
browser.get(url)                  # load the page and let its scripts run
html = browser.page_source        # the HTML after JavaScript has rendered the values
soup = BeautifulSoup(html, 'lxml')
a = soup.find('section', 'wrapper')
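
Note that webdriver.PhantomJS() was removed in Selenium 4, so with a current Selenium the same idea would use a headless Chrome plus an explicit wait. A sketch under those assumptions (the wait condition and the table selector are placeholders to adjust against the real coinglass page):

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

URL = 'https://www.coinglass.com/pro/cme/cftc'

options = Options()
options.add_argument('--headless')             # run Chrome without opening a window
browser = webdriver.Chrome(options=options)
try:
    browser.get(URL)
    # Wait up to 10 seconds for at least one table to appear in the rendered DOM.
    # Waiting on a more specific selector (or on non-zero cell text) would be
    # more robust -- adjust after inspecting the real page.
    WebDriverWait(browser, 10).until(
        EC.presence_of_element_located((By.TAG_NAME, 'table'))
    )
    soup = BeautifulSoup(browser.page_source, 'lxml')
    cells = soup.find_all('table')[0].find_all('td')
    print([c.text for c in cells[:5]])   # sanity check: the values should no longer be zeros
finally:
    browser.quit()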
