Crawling
Yaroslav, 2020-10-02 14:30:52

How to open www.dhl.ru with curl?

It is very easy to open www.dhl.ru (or https://www.dhl.ru) in a browser. (It then redirects to another site, but that doesn't matter here.)

But if you run curl http://www.dhl.ru/, nothing happens (it just hangs until you press Ctrl-C):

/tmp $ curl https://www.dhl.ru/
^C


The same happens with www.dhl.com.

This is probably protection against bots and scraping, which is fine, I understand. But how does it work? I tried replacing the User-Agent and other headers and tried --http2; in general, I think I tried almost everything to pass as a real browser used by a human, and nothing helps.

The task has no practical value; I stumbled upon it by accident. But I really want to understand how this is done and how it can be bypassed (curl, wget, python requests, etc.).

2 answer(s)
Dasha Tsiklauri, 2020-10-02
@xenon

curl 'http://dhl.com/' \
  -H 'Connection: keep-alive' \
  -H 'Upgrade-Insecure-Requests: 1' \
  -H 'User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36' \
  -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9' \
  -H 'Accept-Language: ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7' \
  --compressed \
  --insecure -I

HTTP/1.0 301 Moved Permanently
Location: http://www.dhl.com/
Server: BigIP
Connection: Keep-Alive
Content-Length: 0

Kirill, 2020-10-02
@init0

curl -s -I \
-H "User-Agent: Mozilla/5.0 (Windows NT 10.0;) Firefox/80" \
-H "Accept: */*" \
-H "Accept-Encoding: *" \
https://www.dhl.ru
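
Since the question also mentions Python, here is a rough equivalent of the curl command above using only the standard library. It is a sketch, not a verified bypass: it assumes the three headers from the curl command are what the server checks, which has not been confirmed here. The names BROWSER_HEADERS, build_head_request and fetch_status are made up for illustration. The timeout is there so the script returns instead of hanging the way plain curl does.

```python
import urllib.error
import urllib.request

# Headers copied from the curl command above (assumed to be the ones that matter).
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0;) Firefox/80",
    "Accept": "*/*",
    "Accept-Encoding": "*",
}


def build_head_request(url):
    """Build a HEAD request (like curl -I) that carries the headers above."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS, method="HEAD")


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code, or None if the server does not answer.

    urlopen follows redirects, so the status is that of the final response.
    """
    try:
        with urllib.request.urlopen(build_head_request(url), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx answers still prove that the server responded.
        return err.code
    except OSError:
        # Covers timeouts, refused or dropped connections, and URLError.
        return None


# Example: print(fetch_status("https://www.dhl.ru"))
```

If fetch_status returns None, the server still did not answer within the timeout, and the headers alone are probably not what it is looking at.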
