Python
coderisimo, 2019-10-21 20:27:42

Why does Scrapy return gibberish, as if I had been "detected", and how do I get the actual HTML of the page (not an easy question)?

I send a request to booking (https://www.booking.com). In response I get strange gibberish, like this:

[sV7eV>Td{TZ'7UO_/ ϟU9/PDK4kE
i6lLu̒CspPL FRٺ

The full text (for those who are not afraid to face the unimaginable) is here: https://drive.google.com/open?id=1xeGxThHw919zk3l1...
The responses are different every time, and strange constructs such as <html></html> appear inside them, and so on. There are two problems.
1) I can't turn the response back into normal text. Maybe booking has detected me and is trolling me, but then why send such fragments instead of simply returning a server error?
2) Exactly the same request from Postman gets a normal 302 redirect, and if you open the link in its header, the site opens without problems.
Important: the cause is not JS (Postman doesn't execute JS, yet everything works in it and the site loads), not the headers (in Postman I deliberately don't set any and don't pretend to be a browser), and not the IP: Postman and Scrapy send requests from the same IP, but with different results.
What is going on here?
Thanks


3 answer(s)
coderisimo, 2019-10-22
@coderisimo

The answer was suggested on Stack Overflow.
Bottom line: among the headers I had copied without thinking there was one after which I, of course, received compressed content. Hence the incomprehensible gibberish. As soon as I removed this header, everything worked fine.
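
To illustrate why compressed content looks like gibberish, here is a minimal sketch using only the Python standard library. The thread does not say which header or which compression was involved; gzip and the helper name decode_body are assumptions made purely for illustration.

```python
import gzip
import zlib


def decode_body(body: bytes, content_encoding: str = "") -> str:
    """Hypothetical helper: decode an HTTP body given its Content-Encoding value."""
    encoding = content_encoding.strip().lower()
    if encoding == "gzip":
        body = gzip.decompress(body)
    elif encoding == "deflate":
        try:
            body = zlib.decompress(body)  # zlib-wrapped deflate
        except zlib.error:
            body = zlib.decompress(body, -zlib.MAX_WBITS)  # raw deflate stream
    # Any other encoding (for example "br") needs a third-party decoder
    # and is passed through unchanged in this sketch.
    return body.decode("utf-8", errors="replace")


html = "<html><body>Hello</body></html>"
compressed = gzip.compress(html.encode("utf-8"))  # what a server might send

garbled = decode_body(compressed)            # read as plain text: unreadable
restored = decode_body(compressed, "gzip")   # decompressed first: original HTML
```

Printing garbled shows the same kind of unreadable characters as in the question, while restored is the original markup.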

Headballz, 2019-10-21
@Headballz

Have you tried writing with codecs.open(file_name, 'w',"utf-8")?

Timokins, 2019-10-21
@timokins

Perhaps the problem is simply in the headers that Scrapy sends by default, for example user-agent: scrapybot (on a completely clean setup).
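
If the default headers are the suspect, one way to test it is to override them in the project settings. The fragment below is only a sketch: USER_AGENT and DEFAULT_REQUEST_HEADERS are standard Scrapy settings, but the values are illustrative assumptions and nothing in the thread confirms that they would help with booking.

```python
# settings.py of a Scrapy project; all values are illustrative assumptions.

# Browser-like user agent instead of the Scrapy default.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/77.0 Safari/537.36"
)

# Headers added to every request. No Accept-Encoding is set by hand here,
# so no compression is requested explicitly from this dictionary.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
}
```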
