Python
tem12qaz, 2021-05-28 22:35:39

How to bypass a website's Selenium detection?

Previously, I could parse the site using requests.
Now it returns an almost empty HTML page containing a single JS script.
I tried Selenium and got the same result.

If you open the page in regular Chrome, it redirects to the desired URL.
If you open it in Chrome driven by Selenium, nothing happens.

On Stack Overflow I read that the site is probably detecting Selenium.
The suggested workaround is to open chromedriver.exe in a hex editor
and change every "cdc_" to a different string.
I tried it and it did not help. I also tried a proxy and fake-useragent together with Selenium and the modified chromedriver.
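
For reference, the hex-editor step described above can be scripted. This is only a minimal sketch: the function names and the replacement string "xyz_" are made up for illustration, and the replacement is kept at the same length as "cdc_" so that offsets inside the binary do not shift. As noted above, this change alone was not enough for this site.

```python
from pathlib import Path

MARKER = b"cdc_"


def patch_cdc(data: bytes, replacement: bytes = b"xyz_") -> bytes:
    """Return a copy of data with every b"cdc_" swapped for replacement."""
    if len(replacement) != len(MARKER):
        # A different length would shift everything after it in the binary.
        raise ValueError("replacement must be exactly 4 bytes long")
    return data.replace(MARKER, replacement)


def patch_file(src: str, dst: str) -> int:
    """Write a patched copy of src to dst; return how many markers were replaced."""
    original = Path(src).read_bytes()
    Path(dst).write_bytes(patch_cdc(original))
    return original.count(MARKER)
```

Calling patch_file("chromedriver.exe", "chromedriver_patched.exe") would leave the original file untouched and write the modified copy next to it (both file names are placeholders).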

Is there a way to bypass this block?

The HTML itself does not fit in the post.
Thanks for the help.

UPD:
I use the following settings:

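from fake_useragent import UserAgent  # assumed source of user_agent.random (fake-useragent package)
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--disable-blink-features")
# Turn off the Blink feature that marks the browser as automation-controlled.
options.add_argument("--disable-blink-features=AutomationControlled")
# Drop the "enable-automation" switch and the automation extension.
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("useAutomationExtension", False)
options.add_argument("start-maximized")

# Selenium 3 style; Selenium 4 takes the driver path through a Service object instead.
driver = webdriver.Chrome(
    executable_path=r"C:\Users\User\Desktop\project_parse_v3\chromedriver.exe",
    options=options,
)

# Hide navigator.webdriver in the currently loaded document.
driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
# Replace the User-Agent with a random one.
user_agent = UserAgent()
driver.execute_cdp_cmd("Network.setUserAgentOverride", {"userAgent": user_agent.random})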
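
One caveat about the settings above: execute_script only changes the document that is loaded at that moment, so the navigator.webdriver patch is lost on the next navigation. A possible variation, not something the original post used, is to register the same script through the Chrome DevTools Protocol command Page.addScriptToEvaluateOnNewDocument so that it runs before each page's own scripts. The helper name below is hypothetical.

```python
HIDE_WEBDRIVER_JS = (
    "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"
)


def hide_webdriver_on_every_page(driver):
    """Register the navigator.webdriver patch for every new document.

    driver is expected to be a Selenium Chrome driver, i.e. an object
    that provides execute_cdp_cmd(command, params).
    """
    return driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument",
        {"source": HIDE_WEBDRIVER_JS},
    )
```

It would be called once, right after the driver is created and before the first driver.get(...).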


SOLUTION:
Maxim's answer helped:
https://pypi.org/project/selenium-stealth/
I used this package together with the Chrome settings above.
I also used the chromedriver modified in the hex editor; I don't know whether it works with the original one.
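
A rough sketch of what the selenium-stealth part might look like. The argument values are the ones shown in the package's README example, not the ones used in this project (those were not posted); make_stealth_driver and STEALTH_SETTINGS are hypothetical names, and chromedriver is assumed to be discoverable on PATH.

```python
# Values taken from the selenium-stealth README example; adjust them to the
# browser profile you want to imitate.
STEALTH_SETTINGS = {
    "languages": ["en-US", "en"],
    "vendor": "Google Inc.",
    "platform": "Win32",
    "webgl_vendor": "Intel Inc.",
    "renderer": "Intel Iris OpenGL Engine",
    "fix_hairline": True,
}


def make_stealth_driver():
    """Start Chrome and apply selenium-stealth before the first page load."""
    # Imported here so the settings above can be inspected on a machine
    # that has no browser or Selenium installed.
    from selenium import webdriver
    from selenium_stealth import stealth

    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option("useAutomationExtension", False)

    driver = webdriver.Chrome(options=options)
    stealth(driver, **STEALTH_SETTINGS)
    return driver
```

stealth() has to run before driver.get(...), because it installs its patches for documents loaded afterwards.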


2 answer(s)
maksam07, 2021-05-29
@tem12qaz

https://pypi.org/project/selenium-stealth/
https://pypi.org/project/undetected-chromedriver/
One of these will probably help.

Uno, 2021-06-07
@Noizefan

For the future: instead of panicking, start a sniffer, make the same request both ways (one that succeeds and one that fails), and study both captures in full: everything that is sent to the host, including the User-Agent and the rest of the fingerprint. Then work out logically what the host does not like. In the lion's share of cases, all that is left is to fix some trivial detail in a line and a half instead of bolting extra libraries onto an already cumbersome Selenium. You may even be able to "downgrade" back to requests. It is extremely unlikely that the site blocked you because of the WebGL vendor, right? Especially considering that there was no "anti-fraud" there at all before. Otherwise it will break again later, and it will turn out that the library's author did not account for everything in the latest update x)
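
As an illustration of the comparison step, here is a small sketch that diffs the headers of a request that works against one that does not. The header values are invented for the example; in practice they would come from the sniffer captures, and the same idea applies to cookies or any other part of the request.

```python
def diff_headers(working: dict, failing: dict) -> dict:
    """Compare two header sets, ignoring the case of header names.

    Returns {header: (working_value, failing_value)} for every header that
    is missing on one side or has a different value.
    """
    ok = {name.lower(): value for name, value in working.items()}
    bad = {name.lower(): value for name, value in failing.items()}
    return {
        name: (ok.get(name), bad.get(name))
        for name in sorted(ok.keys() | bad.keys())
        if ok.get(name) != bad.get(name)
    }


# Hypothetical captures: a normal browser session versus the script.
browser = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept": "text/html",
}
script = {
    "user-agent": "python-requests/2.25.1",
    "Accept": "text/html",
}

for name, (good, bad) in diff_headers(browser, script).items():
    print(f"{name}: browser={good!r} script={bad!r}")
```

Each line that gets printed is a candidate for what the host is objecting to, and can be tested by changing one header at a time.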
