linux
Andrew Lays, 2015-04-06 22:41:27

How to wget a site behind Cloudflare?

I make a wget request to kinogo.net, a site behind Cloudflare, and the log looks like this:

---request begin---
GET / HTTP/1.1
User-Agent: Wget/1.13.4 (linux-gnu)
Accept: */*
Host: kinogo.net
Connection: Keep-Alive
---request end---

---response begin---
HTTP/1.1 403 Forbidden
Date: Sun, 05 Apr 2015 02:37:09 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
Set-Cookie: __cfduid=d581a7377798cae42e6d530c858ab03921428201429; expires=Mon, 04-Apr-16 02:37:09 GMT; path=/; domain=.kinogo.net; HttpOnly
Cache-Control: max-age=15
Expires: Sun, 05 Apr 2015 02:37:24 GMT
X-Frame-Options: SAMEORIGIN
Server: cloudflare-nginx
CF-RAY: 1d21de956c42158f-FRA
---response end---

I get 403 Forbidden, probably because of the headers.
Here are the headers captured from my browser:
---request begin---
GET / HTTP/1.1
Host: kinogo.net
Connection: keep-alive
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8
---request end---

---response begin---
HTTP/1.1 200 OK
Date: Sun, 05 Apr 2015 02:48:43 GMT
Content-Type: text/html
Transfer-Encoding: chunked
Connection: keep-alive
Set-Cookie: __cfduid=d81a82368470347d5786e6d6f3b31781b1428202123; expires=Mon, 04-Apr-16 02:48:43 GMT; path=/; domain=.kinogo.net; HttpOnly
Set-Cookie: PHPSESSID=pf6ot2v375oceh1krjjtjtebf2; path=/; domain=.kinogo.net; HttpOnly
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Set-Cookie: dle_user_id=deleted; expires=Sat, 05-Apr-2014 02:45:56 GMT; path=/; domain=.kinogo.net; httponly
Set-Cookie: dle_password=deleted; expires=Sat, 05-Apr-2014 02:45:56 GMT; path=/; domain=.kinogo.net; httponly
Set-Cookie: dle_hash=deleted; expires=Sat, 05-Apr-2014 02:45:56 GMT; path=/; domain=.kinogo.net; httponly
Server: cloudflare-nginx
CF-RAY: 1d21ef8690ef01b1-FRA
Content-Encoding: gzip
---response end---

I gave wget the same headers:
wget -d --header="Cache-Control: max-age=0" --header="Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8" --header="User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36" --header="Accept-Encoding: gzip, deflate, sdch" --header="Accept-Language: en-US,en;q=0.8" http://kinogo.net

Sending:
---request begin---
GET / HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Host: kinogo.net
Connection: Keep-Alive
Cache-Control: max-age=0
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8
---request end---

---response begin---
HTTP/1.1 404 Not Found
Date: Sun, 05 Apr 2015 02:56:49 GMT
Content-Type: text/html
Transfer-Encoding: chunked
Connection: keep-alive
Set-Cookie: __cfduid=d073eddf5fef12567851e1e473830867c1428202609; expires=Mon, 04-Apr-16 02:56:49 GMT; path=/; domain=.kinogo.net; HttpOnly
Server: cloudflare-nginx
CF-RAY: 1d21fb675ffa01b1-FRA
Content-Encoding: gzip
---response end---
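
For comparison, roughly the same request as the wget command above can be assembled with Python 3's standard urllib.request (the question mentions Python 3). This is a sketch, not a fix: the header values are copied from the browser capture, `build_browser_like_request` and `BROWSER_HEADERS` are hypothetical names, and Accept-Encoding is deliberately left out because urllib does not decompress gzip responses. Note that urllib normalizes header names with str.capitalize(), so "User-Agent" is stored as "User-agent".

```python
import urllib.request

# Header values copied from the browser capture in the question.
BROWSER_HEADERS = {
    "Cache-Control": "max-age=0",
    "Accept": ("text/html,application/xhtml+xml,application/xml;q=0.9,"
               "image/webp,*/*;q=0.8"),
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36"),
    "Accept-Language": "en-US,en;q=0.8",
}


def build_browser_like_request(url):
    """Hypothetical helper: build (but do not send) a request with the browser's headers."""
    # Accept-Encoding is left out on purpose: urllib does not decompress
    # gzip bodies, so asking for gzip would hand back compressed bytes.
    return urllib.request.Request(url, headers=BROWSER_HEADERS, method="GET")


if __name__ == "__main__":
    req = build_browser_like_request("http://kinogo.net/")
    # urllib stores names via str.capitalize(), e.g. "User-agent".
    for name, value in req.header_items():
        print(f"{name}: {value}")
```

Passing the request to `urllib.request.urlopen(req, timeout=10)` would actually send it; urllib fills in the Host header itself at that point.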

So we send identical headers (the order does not matter, right?) and get a different response, 404 Not Found, but at least not 403 Forbidden.
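
Whether the header sets really are identical can at least be checked mechanically. HTTP header names are case-insensitive, and RFC 7230 (section 3.2.2) says the order of fields with different names is not significant, although nothing stops a server's bot detection from looking at it anyway. A minimal sketch, assuming the request blocks are pasted as plain text from `wget -d` or a similar log; `parse_request` and `header_diff` are hypothetical helpers, and the two samples are abbreviated copies of the browser and wget captures (User-Agent omitted for brevity).

```python
BROWSER = """
---request begin---
GET / HTTP/1.1
Host: kinogo.net
Connection: keep-alive
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8
---request end---
"""

WGET = """
GET / HTTP/1.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Host: kinogo.net
Connection: Keep-Alive
Cache-Control: max-age=0
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8
"""


def parse_request(block):
    """Return (request_line, {lower-cased header name: value})."""
    lines = [ln.strip() for ln in block.splitlines()]
    # Drop blank lines and wget's "---request begin/end---" markers.
    lines = [ln for ln in lines if ln and not ln.startswith("---")]
    request_line, headers = lines[0], {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:  # ignore anything that is not "Name: value"
            headers[name.strip().lower()] = value.strip()
    return request_line, headers


def header_diff(a, b):
    """Compare two header dicts, ignoring order and name case."""
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    changed = sorted(k for k in a.keys() & b.keys() if a[k] != b[k])
    return only_a, only_b, changed


if __name__ == "__main__":
    line_a, browser = parse_request(BROWSER)
    line_b, wget = parse_request(WGET)
    print(line_a == line_b, header_diff(browser, wget))
```

On these samples the only reported difference is `connection` (`keep-alive` vs `Keep-Alive`), because values are compared literally even though Connection options themselves are case-insensitive.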
I also have Selenium WebDriver running on the server, so I tried it with Chrome 41 and Python 3 (I tried Firefox too):
---request begin---
GET http://kinogo.net/ HTTP/1.1
Host: kinogo.net
Proxy-Connection: keep-alive
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8
---request end---

---response begin---
HTTP/1.1 404 Not Found
Date: Sun, 05 Apr 2015 01:37:56 GMT
Content-Type: text/html
Transfer-Encoding: chunked
Connection: keep-alive
Set-Cookie: __cfduid=d0b1d19d69c52f3f9f38eb968211bda211428203483; expires=Mon, 04-Apr-16 01:37:56 GMT; path=/; domain=.kinogo.net; HttpOnly
Server: cloudflare-nginx
CF-RAY: 1d2187d5ff401583-FRA
Content-Encoding: gzip
Connection: close
---response end---

The only differences are in these three lines:
GET http://kinogo.net/ HTTP/1.1
Proxy-Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36

The last one can be ignored, since it is just an ordinary user agent string and differs from user to user anyway. The response is still the same 404. I'm out of ideas: I could put a proxy in front of the browser that Selenium opens and inject my own headers there, sending Connection instead of Proxy-Connection, but would that make any difference? If anyone has a guess, please tell me.
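
The first two differences are what a browser sends when it talks to an HTTP proxy: the request line carries the absolute URI (RFC 7230 requires this absolute-form for requests sent to a proxy), and Proxy-Connection is a non-standard header that some browsers add in that case. A forwarding proxy normally turns this back into an ordinary origin request before passing it on. The hypothetical helper `to_origin_form` below sketches only that text rewrite (absolute URI to path, Proxy-Connection to Connection); it is not a proxy, and whether the rewrite changes Cloudflare's answer is exactly the open question.

```python
from urllib.parse import urlsplit

SELENIUM = """
GET http://kinogo.net/ HTTP/1.1
Host: kinogo.net
Proxy-Connection: keep-alive
Accept-Language: en-US,en;q=0.8
"""


def to_origin_form(block):
    """Hypothetical helper: rewrite a proxy-style request as an origin server would receive it."""
    lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
    method, target, version = lines[0].split()
    parts = urlsplit(target)
    if parts.scheme and parts.netloc:
        # absolute-form "http://kinogo.net/" becomes origin-form "/"
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
    out = [f"{method} {target} {version}"]
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "proxy-connection":
            # replace the non-standard header with the standard Connection header
            out.append("Connection: " + value.strip())
        else:
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    print(to_origin_form(SELENIUM))
```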


1 answer(s)
human_child, 2015-04-06

Your "Host" header is missing. For me, a request like this works just fine:

---request begin---
GET / HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36
Accept: */*
Host: kinogo.net
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK

As far as I know, Cloudflare checks the User-Agent and the reputation of the IP address. If something looks wrong, it returns a 403 and prompts the user with a captcha.
A request without a "Host" header will return a 404.
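
For reference, HTTP/1.1 itself makes the Host header mandatory: RFC 7230, section 5.4, says a client MUST send it in every HTTP/1.1 request. A minimal sketch that serializes the working request from this answer by hand, so it is obvious which headers actually go on the wire; `build_request` and `ANSWER_HEADERS` are hypothetical names, and the helper simply refuses to build an HTTP/1.1 request without Host.

```python
CRLF = "\r\n"

# Headers of the request that returned 200 OK in this answer.
ANSWER_HEADERS = [
    ("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36"),
    ("Accept", "*/*"),
    ("Host", "kinogo.net"),
    ("Connection", "Keep-Alive"),
]


def build_request(path, headers, version="HTTP/1.1"):
    """Hypothetical helper: serialize a GET request, refusing HTTP/1.1 without Host."""
    names = {name.lower() for name, _ in headers}
    if version == "HTTP/1.1" and "host" not in names:
        raise ValueError("an HTTP/1.1 request must carry a Host header")
    lines = [f"GET {path} {version}"]
    lines += [f"{name}: {value}" for name, value in headers]
    # An empty line (CRLF CRLF) ends the header section; this GET has no body.
    return (CRLF.join(lines) + CRLF + CRLF).encode("ascii")


if __name__ == "__main__":
    print(build_request("/", ANSWER_HEADERS).decode("ascii"))
```

To actually send it, one could write the bytes to a plain TCP socket, e.g. `socket.create_connection(("kinogo.net", 80)).sendall(raw)`, and read the reply; what comes back depends on Cloudflare's rules at the time.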
