Why do bots use the HTTP/1.1 protocol?
Hi all.
It turns out that looking through raw web server logs can reveal a lot of interesting things :)
I noticed by chance that all bots (search, spam and click bots alike) stubbornly send their requests as "GET / HTTP/1.1", even though the server and nginx are configured to serve HTTP/2.0.
And this makes it very easy to separate bots from real visitors in the stream of requests to the site.
It was also funny to watch bots emulate visits using the whole range of browsers and operating systems, spoofing every header, yet get caught out by such a seemingly trivial detail :)
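A minimal sketch of that filter in Python, assuming nginx's default "combined" log format, in which the quoted request line looks like "GET / HTTP/1.1" (nginx writes HTTP/2 requests as "HTTP/2.0"). The function names are illustrative and not from the original post.

```python
import re
from collections import Counter
from typing import List, Optional, Tuple

# Matches the quoted request line of nginx's default "combined" format,
# e.g. "GET / HTTP/1.1". Assumption: the log uses that default format.
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<proto>HTTP/[\d.]+)"')


def protocol_of(line: str) -> Optional[str]:
    """Return the protocol token ('HTTP/1.1', 'HTTP/2.0', ...) or None."""
    match = REQUEST_RE.search(line)
    return match.group("proto") if match else None


def split_by_protocol(lines: List[str]) -> Tuple[List[str], List[str], List[str]]:
    """Split log lines into (http2, http1, unparsed) lists."""
    http2, http1, unparsed = [], [], []
    for line in lines:
        proto = protocol_of(line)
        if proto is None:
            unparsed.append(line)  # malformed requests, TLS garbage, etc.
        elif proto.startswith("HTTP/2"):
            http2.append(line)
        else:
            http1.append(line)  # HTTP/1.0 and HTTP/1.1
    return http2, http1, unparsed


def protocol_counts(lines: List[str]) -> Counter:
    """Count requests per protocol; unparsable lines are counted as 'unparsed'."""
    return Counter(protocol_of(line) or "unparsed" for line in lines)
```

Feeding it the lines of the access log yields the HTTP/1.x requests as a separate list that can then be reviewed by hand.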
Most libraries for working with HTTP still can't speak anything other than HTTP/1.1.
Take the popular requests library for Python: its authors have no plans to add HTTP/2 support in the near future. And the only functional alternative, pycurl, is not very convenient to use (although it does support HTTP/2).
The second reason is that HTTP/2 is still not that widespread: most non-IT sites still serve content only over HTTP/1.1. And when a site has elaborate protection, it is faster to bypass it with Selenium than to adapt the bot to that particular site's protection.
P.S. As for bots being on the cutting edge of technology, you're mistaken. Most of these bots are written by hapless freelancers who watched a YouTube tutorial on "how to write a site parser in %languageName%".
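To see what a tutorial-level parser actually sends, here is a small self-contained check using only Python's standard library: a local http.server records the protocol version from the request line, and http.client (which urllib.request is built on) makes the request. It is a demonstration under those assumptions and does not exercise requests or pycurl themselves; the function name probe_stdlib_client is hypothetical.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

seen_versions = []  # protocol versions observed by the local server


class RecordingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # request_version is taken from the client's request line, e.g. "HTTP/1.1".
        seen_versions.append(self.request_version)
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def probe_stdlib_client() -> str:
    """Make one request with http.client and return the version the server saw."""
    server = HTTPServer(("127.0.0.1", 0), RecordingHandler)  # port 0: pick a free port
    thread = threading.Thread(target=server.handle_request)  # serve exactly one request
    thread.start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    conn.request("GET", "/")
    conn.getresponse().read()
    conn.close()
    thread.join()
    server.server_close()
    return seen_versions[-1]
```

Calling probe_stdlib_client() returns "HTTP/1.1": the standard client has no way to ask for HTTP/2 at all, which is exactly the request line that shows up in the logs.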
Why would they use HTTP/2.0? Bots work on the principle of "connect, fetch one page, disconnect" and don't download the page's accompanying resources such as scripts and images (they can, but rarely do). So the perks of the second version of the protocol, such as multiplexing many resource requests over a single connection, aren't worth rewriting the bots for.
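The "one page, no resources" behavior described above is itself a signal that can be read from the same log. A rough sketch, assuming the default "combined" format and a hypothetical list of asset extensions: it lists client IPs that requested pages but never any static asset. It is only a heuristic, since browsers with a warm cache or users behind a shared NAT address can look the same.

```python
import re
from collections import defaultdict
from typing import List

# Client IP at the start of the line, then the quoted request line.
LINE_RE = re.compile(r'^(?P<ip>\S+) .*?"[A-Z]+ (?P<path>\S+) HTTP/[\d.]+"')

# Hypothetical set of "accompanying resource" extensions; tune for your site.
ASSET_SUFFIXES = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif",
                  ".svg", ".webp", ".woff2", ".ico")


def clients_without_assets(lines: List[str]) -> List[str]:
    """Return IPs that fetched pages but no static assets, sorted."""
    pages = defaultdict(int)
    assets = defaultdict(int)
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        # Drop the query string so "/app.css?v=3" still counts as CSS.
        path = match.group("path").split("?", 1)[0].lower()
        if path.endswith(ASSET_SUFFIXES):
            assets[match.group("ip")] += 1
        else:
            pages[match.group("ip")] += 1
    return sorted(ip for ip in pages if assets.get(ip, 0) == 0)
```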
After posting this question and reading the community's answers, I started checking nginx's raw logs regularly, and I can say with 99% confidence that bots (especially if the site is promoted through paid systems such as Yandex Direct and Google AdWords) use HTTP/1. So here's a handy life hack: modern website visitors use HTTP/2, because that is how the server is configured and their browsers use it accordingly. Now I wish I could figure out what to do with HTTP/1 clients, or rather bots, and what content to serve them...
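One non-destructive option, sketched here as a Python WSGI middleware, is to tag HTTP/1.x requests rather than block them, since search crawlers (which the question says also use HTTP/1.1) and some real users behind proxies may arrive that way. Because nginx talks to upstream servers over HTTP/1.0 or 1.1 (proxy_http_version), the application cannot see the client's protocol on its own; the sketch assumes nginx forwards it with "proxy_set_header X-Client-Protocol $server_protocol;". The header name X-Client-Protocol and the environ key protocol_tag.suspected_bot are hypothetical.

```python
# Hypothetical header set by nginx:
#   proxy_set_header X-Client-Protocol $server_protocol;
# WSGI exposes request headers as HTTP_<NAME> keys in environ.
PROTOCOL_KEY = "HTTP_X_CLIENT_PROTOCOL"
FLAG_KEY = "protocol_tag.suspected_bot"  # hypothetical environ key


class ProtocolTagMiddleware:
    """Marks requests that reached nginx over HTTP/1.x, without blocking them."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        proto = environ.get(PROTOCOL_KEY, "")
        # Flag only when the header is present and reports HTTP/1.0 or HTTP/1.1.
        environ[FLAG_KEY] = proto.startswith("HTTP/1.")
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    """Example downstream app: decides what to serve based on the flag."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    text = "bot?" if environ[FLAG_KEY] else "human?"
    return [text.encode()]
```

The downstream application then decides what to do with flagged requests (log them, serve a lighter page, skip analytics), which keeps that policy out of the nginx config.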
Maybe someone will find this useful.