How do I scrape sites correctly so as not to trigger a captcha?
I understand that for "correct scraping" the bot's behavior needs to resemble a human's. This can be done by adding headers and proxies in the code.
Are there other ways to reduce the risk of captchas or other blocking systems?
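For reference, here is a minimal sketch of the headers-and-proxy approach mentioned in the question, using the `requests` library. The User-Agent string and the proxy address are placeholders, not real values:

```python
import requests

session = requests.Session()
session.headers.update({
    # Present a regular browser UA instead of the default python-requests one
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept-Language": "en-US,en;q=0.9",
})

# Route traffic through a proxy (hypothetical address)
proxies = {"http": "http://127.0.0.1:8080", "https": "http://127.0.0.1:8080"}

response = session.get("https://example.com/page/1", proxies=proxies, timeout=10)
print(response.status_code)
```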
Contact the site owners for proper access to the data, through an API.
If they won't grant such access, then don't go pilfering change from other people's pockets; find yourself a worthier occupation.
In the general case, the appearance of a captcha cannot be prevented. You need to understand that captchas are not shown only to bots: they are shown to any visitor of the site when certain conditions are met. It is simply harder for a human to hit those conditions in normal use of the site, and even when a captcha does appear, a human solves it easily, whereas for a bot it is a real obstacle.
For example, I once scraped a site that showed a captcha after exactly 500 pages. Most likely, if I had sat in a browser and clicked through 500 pages in half an hour, I would have seen the captcha too.
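One way to stay under thresholds like that is simply to pace requests the way a human would. This is a minimal sketch under the assumption of a per-session page limit like the 500 pages in the example above; the URL, delays, and the 400-page checkpoint are illustrative, not known values:

```python
import random
import time
import requests

session = requests.Session()

for page in range(1, 1001):
    resp = session.get(f"https://example.com/catalog?page={page}", timeout=10)
    # ... parse resp.text here ...

    time.sleep(random.uniform(2, 6))   # irregular delay between pages
    if page % 400 == 0:                # long break before the observed limit
        time.sleep(random.uniform(300, 600))
```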
In any case, the captcha will most likely appear from time to time. But that does not really matter, because there are heaps of services that solve them for pennies. Usually that is exactly what people do.
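The usual pattern with such services looks roughly like the sketch below: detect the captcha page, send the challenge to the service, submit the answer. Here `solve_captcha` is a hypothetical stand-in for whatever client your chosen service actually provides, and the URLs and detection logic are assumptions for illustration:

```python
import requests

def solve_captcha(image_bytes: bytes) -> str:
    """Placeholder: upload the image to a solving service, poll for the answer."""
    raise NotImplementedError("wire up your service's client here")

session = requests.Session()
resp = session.get("https://example.com/page/501", timeout=10)

if "captcha" in resp.text.lower():  # crude detection of the captcha page
    img = session.get("https://example.com/captcha.png", timeout=10).content
    answer = solve_captcha(img)     # the service solves it for pennies
    resp = session.post("https://example.com/captcha",
                        data={"answer": answer}, timeout=10)
```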
I reason like this:
1. If the site was designed from the start with an API (or another system for obtaining its content), paid or free, for its users, then use it! (See the sketch after this list.)
2. If the site offers nothing of the kind and, moreover, actively tries to protect its content, then why are you poking around there at all? People trying to ride into paradise on someone else's back are a dime a dozen.
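For point 1, a minimal sketch of what "use the API" means in practice, under the assumption of a documented REST endpoint with token authentication; the endpoint, token, and response shape are hypothetical:

```python
import requests

API_URL = "https://example.com/api/v1/items"  # hypothetical documented endpoint
TOKEN = "your-api-key"                        # issued by the site owners

resp = requests.get(
    API_URL,
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"page": 1},
    timeout=10,
)
resp.raise_for_status()
for item in resp.json().get("items", []):
    print(item)
```

No headers to fake, no captchas to dodge: with sanctioned access the whole problem from the question disappears.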