SoulHunter033, 2021-05-15 17:37:10

How to bypass protection against bots on the site?

Hello everyone! I am new to Python programming, and I decided to start by making a Telegram bot that sends new listings from a site (I live in Turkey, so the site is the Turkish https://www.sahibinden.com). I started studying web scraping and ran into a problem.

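import requests
from bs4 import BeautifulSoup as Bs  # imported for later parsing; not used yet

url = "https://www.sahibinden.com"
# Browser-like User-Agent so the request does not announce itself as python-requests
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.105 YaBrowser/21.3.3.230 Yowser/2.5 Safari/537.36'}

response = requests.get(url, headers=headers, timeout=10)
html = response.text
# Save the returned page to inspect what the site actually sent back
with open('test.html', 'w', encoding='utf-8') as fl:
    fl.write(html)
print(response.status_code, 'status_code')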

I wrote this code to check how the site responds to the connection, and saved the resulting HTML page to a file. But instead of the real page, the request returns this:
<html>
    <head>
        <script type="text/javascript">
         window.location.href = "https://www.sahibinden.com/olagan-disi-kullanim?c";
        </script>
    </head>
    <body>
    </body>
</html>

Does the site have some kind of protection? The request is redirected to the page https://www.sahibinden.com/olagan-disi-kullanim?c , where the site decides that I am a bot. Can anyone help with this, friends?

4 answer(s)
Softer, 2018-11-23
@lexa_gorchakov19

// "ПРАВДА" is Russian for "TRUE", "ЛОЖЬ" for "FALSE"
if ($array['result'])
  echo "ПРАВДА";
else
  echo "ЛОЖЬ";

Alexander, 2018-11-23
@Hkr

echo $array['result'] ? 'ПРАВДА' : 'ЛОЖЬ';  // 'TRUE' : 'FALSE' in Russian

Dimonchik, 2021-05-15
@dimonchik2013

You are on the right track, but your disguise as a browser is too weak.
Start with Postman, without Python, and get the correct responses there first.

MinTnt, 2021-05-17
@MinTnt

I decided to check. The protection is triggered because certain headers are missing from the request. Add all the headers a real browser sends, then discard them one by one, keeping only those that affect the result.
Also, your request does not send important cookies, for example. And since listings on this site are not available without logging in, it is not clear what exactly you are trying to achieve with an "empty" request.
Update: in general, to bypass the protection, just add this to headers:
'Upgrade-Insecure-Requests': '1'
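
The "add everything, then discard step by step" approach above could be sketched roughly as follows. This is only an illustration under assumptions: the helper names (looks_blocked, minimal_headers, fetch_ok) are made up for this example, the block-page check relies solely on the redirect path shown in the question, and whether this one header is still enough today is not guaranteed.

```python
def looks_blocked(html):
    """Heuristic taken from the question: the block page redirects to this path."""
    return "olagan-disi-kullanim" in html


def minimal_headers(headers, is_ok):
    """Drop headers one at a time, keeping only those whose removal makes is_ok fail.

    headers: dict of candidate request headers (e.g. copied from a real browser).
    is_ok:   callable taking a headers dict and returning True if the site
             answered with a normal page.
    """
    kept = dict(headers)
    for name in list(headers):
        trial = {k: v for k, v in kept.items() if k != name}
        if is_ok(trial):
            # The site still answers normally without this header, so discard it.
            kept = trial
    return kept


def fetch_ok(headers, url="https://www.sahibinden.com"):
    """Hypothetical checker: performs one real request with the given headers."""
    import requests  # third-party; imported here so the helpers above work without it

    response = requests.get(url, headers=headers, timeout=10)
    return response.ok and not looks_blocked(response.text)


# Example usage (performs real requests, so it is left commented out):
# browser_headers = {
#     "User-Agent": "Mozilla/5.0 ...",   # the User-Agent string from the question
#     "Accept": "text/html",             # assumed value, copy the real one from a browser
#     "Upgrade-Insecure-Requests": "1",  # the header named in this answer
# }
# print(minimal_headers(browser_headers, fetch_ok))
```

Passing the checker in as a function keeps the elimination loop independent of the network, so it can be tried first with a fake checker before sending any requests to the site.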
