How do I get around a site blocking my parser?
I need to parse this site: https://runcsgo.org.
The site is protected, so I use fake-useragent to get around the block.
The request seems to go through, but what I get back is completely different from what I see when I open the site in a browser.
Here is my code:
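import requests as req
from bs4 import BeautifulSoup as BS
from fake_useragent import UserAgent

# Send a Chrome-like User-Agent string generated by fake-useragent.
html = req.get("http://csgorun.org", headers={'User-Agent': UserAgent().chrome})
soup = BS(html.text, features="html.parser")
# Prints the Response object (for example its status code), not the parsed page.
print(html)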
Who told you the site is blocking you?
1) Some of the data is loaded via XHR.
2) The data on the site is also updated over a WebSocket.
websockets.readthedocs.io
PyPI websockets 8.1
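To make the two points above concrete, here is a minimal sketch of reading such data directly instead of parsing the initial HTML. Everything site-specific in it is an assumption: `XHR_URL` and `WS_URL` are placeholders (the real addresses have to be looked up in the browser's DevTools Network tab), and the code assumes both channels carry JSON, which may not be true for this site. The XHR part uses the standard library; the WebSocket part uses the `websockets` package linked above.

```python
import json
import urllib.request

# Placeholders only: replace with the addresses seen in DevTools (Network tab).
XHR_URL = "https://example.com/api/state"  # hypothetical XHR endpoint
WS_URL = "wss://example.com/socket"        # hypothetical WebSocket endpoint


def build_request(url, user_agent):
    """Build a GET request that carries a browser-like User-Agent header."""
    headers = {"User-Agent": user_agent, "Accept": "application/json"}
    return urllib.request.Request(url, headers=headers)


def fetch_json(url, user_agent, timeout=10.0):
    """Call an XHR endpoint directly and decode its body, assuming it is JSON."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def decode_message(raw):
    """Decode one WebSocket message, assuming the server sends JSON text."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    return json.loads(raw)


async def listen(url, limit=5):
    """Collect up to `limit` messages from a WebSocket and return them decoded."""
    import websockets  # third-party: pip install websockets

    messages = []
    async with websockets.connect(url) as socket:
        while len(messages) < limit:
            messages.append(decode_message(await socket.recv()))
    return messages
```

With real addresses filled in, `fetch_json(XHR_URL, "Mozilla/5.0 ...")` would return the decoded response, and `asyncio.run(listen(WS_URL))` would return the first few live updates. Some sites wrap their socket traffic in a protocol such as Socket.IO, in which case `decode_message` would need to be adapted.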
Here is my own answer as well.
The site does not block me as long as I send a User-Agent. Unfortunately, I could not get the whole page with requests and BS4, so I used ChromeDriver instead, running in the background (headless).
Here's the resulting code:
from time import sleep

from bs4 import BeautifulSoup as BS
from selenium import webdriver

# Run Chrome without a visible window.
options = webdriver.ChromeOptions()
options.add_argument('--headless')
browser = webdriver.Chrome(options=options)
browser.get('https://csgorun.org/')
# Give the page's scripts time to load their data (the delay is a guess; tune it).
sleep(3)
soup = BS(browser.page_source, "html.parser")
browser.quit()
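Because the page fills in its data with scripts, `page_source` can be read before that data has arrived. One way to handle this, offered as a sketch and not as part of the answer above, is to poll until some marker shows up in the rendered HTML. The helper name `wait_until` and the marker text used below are hypothetical.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Call predicate() repeatedly until it returns True or `timeout` seconds pass.

    Returns True if the predicate succeeded, False if the timeout was reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_until(lambda: "some-marker" in browser.page_source)` would block until the hypothetical text `some-marker` appears in the page or ten seconds pass. Selenium also ships `WebDriverWait` for the same purpose.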