Python
Asya, 2021-04-13 17:25:47

How to bypass a captcha in Selenium?


Some time after I start scraping, a captcha appears, and neither changing the IP nor changing the user-agent helps to get past it.
In a similar question, people suggest solving the captcha manually, but that option doesn't suit me.
Are there ways to bypass a captcha automatically in Python?

2 answers
Fenix957, 2021-04-13
@asyaevloeva

https://rucaptcha.com/demo/recaptcha-v2
Not free, but fairly cheap. There is an API and detailed instructions.
160 rubles per 1000 regular captchas.

Sergey Karbivnichy, 2021-04-13
@hottabxp

First of all, you need to understand what Selenium is.
Selenium WebDriver is primarily software for automated testing of web applications. Nobody forbids using it for scraping, and many people do, but that is a secondary use. Depending on the server WebDriver connects to, that server may run software that detects browser automation. Disguising Selenium on large sites is difficult, and sometimes impossible.

neither changing the IP nor changing the user-agent helps to get past it
A client can be identified by several hundred parameters. So if you change only the IP and the user-agent, your browser fingerprint changes by something like 0.0002%. It's like pouring 100 grams of white paint into a gray sea: the sea won't turn white.
How to bypass a captcha in Selenium?
If you do not use third-party services, then something like this:
1) Hire a mathematician (mandatory! Or maybe two).
2) Hire programmers.
3) Buy expensive hardware (with the main focus on GPUs).
4) Download a dataset (a couple of TB should be enough to start) of pictures of hydrants, traffic lights, boats, etc.
Profit! The mathematician(s) design the algorithm for training the neural network, and the programmers turn it into Python code. If you know Python yourself, great: you can save money on programmers.
How not to run into a captcha:
There is a person who runs a legitimate scraping company. Here is a summary of what he says:
They scrape everything, from small sites to Wildberries, Ozon, etc. They have a lot of servers (VDS), connected captcha-solving services, and proxies (paid ones, obviously). Even so, they rarely run into a captcha. The approach is roughly this: scrapers run on many servers and receive tasks. Each scraper requests one product from a given site every 9-25 seconds. But the scrapers don't sit idle during those pauses: in the meantime, they scrape products from other sites. As a result, the scrapers work around the clock without stopping, don't overload any site, and don't attract attention. In other words, they don't hammer a single site from a single server at 200 requests per second.
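A minimal sketch of the scheduling idea described above, assuming a single worker and a hypothetical `fetch(site, item)` callback that performs the actual request (for example with Selenium). Only the 9-25 second window comes from the answer; the function name `run_interleaved` and its parameters are illustrative. Each site's next allowed contact time is kept in a heap, so the worker switches to another site instead of waiting, and only sleeps when every site is still cooling down.

```python
import heapq
import random
import time
from collections import deque
from typing import Callable, Dict, List, Optional, Tuple


def run_interleaved(
    queues: Dict[str, deque],
    fetch: Callable[[str, str], None],
    min_delay: float = 9.0,
    max_delay: float = 25.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> None:
    """Process every queued item, contacting each site at most once per
    random min_delay..max_delay interval and never idling while another
    site is ready. `fetch` is a hypothetical caller-supplied callback."""
    rng = rng or random.Random()
    # Heap of (earliest time the site may be contacted again, site name).
    ready: List[Tuple[float, str]] = [
        (clock(), site) for site, items in queues.items() if items
    ]
    heapq.heapify(ready)
    while ready:
        next_time, site = heapq.heappop(ready)
        wait = next_time - clock()
        if wait > 0:
            # The earliest-ready site is still cooling down, so all of them are.
            sleep(wait)
        fetch(site, queues[site].popleft())
        if queues[site]:
            # Reschedule this site after a randomized pause; other sites
            # are served in the meantime.
            delay = rng.uniform(min_delay, max_delay)
            heapq.heappush(ready, (clock() + delay, site))
```

With real sites you pass your own `fetch`; `clock` and `sleep` are parameters only so the schedule can be checked with a fake clock instead of waiting in real time. This only spaces out requests per site; it does nothing about browser fingerprinting.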
