Python
Be3yxa, 2021-11-03 11:55:42

Delay for a site parser?

Good afternoon. I ran into the following problem: the site https://www.coinglass.com/pro/cme/cftc used to load its data right away (the values were on the page immediately), but recently that changed, and if you refresh the page you can see that for the first second every value on the page is 0. The BS4 library ends up parsing these zeros. Is there any way to add a delay, so that the page loads first and only then gets parsed?

I'm new to programming; I tried adding a timer before requests.get, but it didn't help.

html = requests.get(URL, headers=HEADERS)
time.sleep(3)
soup = BeautifulSoup(html.text, 'lxml')
cells = soup.find_all('table', class_='code133741')[1].find_all('td')
long_inst = cells[25].text
long_inst_changes = cells[42].text
short_inst = cells[26].text
short_inst_changes = cells[43].text
long_funds = cells[28].text
long_funds_changes = cells[45].text
short_funds = cells[29].text
short_funds_changes = cells[46].text
date = soup.find('div', class_='bybt-box').find('div').text[6:]


2 answer(s)
soremix, 2021-11-03
@Be3yxa

It's not about the delay; the data is loaded dynamically. On parsing dynamic sites, see:
https://qna.habr.com/q/1038438#answer_2008702
I don't know exactly what you are collecting, but most likely all the data you need is here:
https://fapi.coinglass.com/api/cme/cot/report
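
If that endpoint does return the report as JSON, you can query it directly with requests and skip BeautifulSoup entirely. A minimal sketch, assuming the endpoint answers a plain GET request and returns JSON (the structure of the response is an assumption, so print it once and pick out the fields you need):

# Hypothetical direct API call instead of scraping the rendered page.
import requests

API_URL = 'https://fapi.coinglass.com/api/cme/cot/report'   # endpoint from the link above
HEADERS = {'User-Agent': 'Mozilla/5.0'}                      # some APIs reject requests without a User-Agent

resp = requests.get(API_URL, headers=HEADERS, timeout=10)
resp.raise_for_status()
data = resp.json()    # parsed JSON instead of HTML that still contains zeros
print(data)           # inspect the structure, then pull out the fields you need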

Alexander, 2021-11-03
@shabelski89

Most likely it's the JS that introduces the delay, so I suggest this solution:

from bs4 import BeautifulSoup
from selenium import webdriver

url = "http://legendas.tv/busca/walking%20dead%20s03e02"
browser = webdriver.PhantomJS()   # headless browser that actually executes the page's JavaScript
browser.get(url)                  # load the page and let its scripts run
html = browser.page_source        # the HTML after JavaScript has rendered the values
soup = BeautifulSoup(html, 'lxml')
a = soup.find('section', 'wrapper')
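
Note that webdriver.PhantomJS() was removed in Selenium 4, so with a current Selenium the same idea would use a headless Chrome plus an explicit wait. A sketch under those assumptions (the wait condition and the table selector are placeholders to adjust against the real coinglass page):

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

URL = 'https://www.coinglass.com/pro/cme/cftc'

options = Options()
options.add_argument('--headless')             # run Chrome without opening a window
browser = webdriver.Chrome(options=options)
try:
    browser.get(URL)
    # Wait up to 10 seconds for at least one table to appear in the rendered DOM.
    # Waiting on a more specific selector (or on non-zero cell text) would be
    # more robust -- adjust after inspecting the real page.
    WebDriverWait(browser, 10).until(
        EC.presence_of_element_located((By.TAG_NAME, 'table'))
    )
    soup = BeautifulSoup(browser.page_source, 'lxml')
    cells = soup.find_all('table')[0].find_all('td')
    print([c.text for c in cells[:5]])   # sanity check: the values should no longer be zeros
finally:
    browser.quit()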
