Python
Nikolai, 2020-10-20 12:33:31

How to get data from the site page for subsequent parsing?

I want to parse data about discounted products on galaxystore. I ran into a problem at the stage of getting the HTML page. I use the classic requests library:

import requests

url = 'https://galaxystore.ru/discount/'
response = requests.get(url).text  # response body decoded as text
print(response)

When I send the request, I get HTML in this format [screenshot: 5f8ead7127bd8173760289.jpeg]. Whatever headers I substitute, the response still comes back in this encrypted form.

What I tried:

  1. urllib
  2. As far as I understand, the required content is loaded via AJAX, so I used the developer tools to find the relevant request, but when I called it directly I received the same encrypted content
  3. Changed the User-Agent
  4. I tried writing the response to a file (using with) and then opening it for further parsing
  5. People online recommend Selenium. I did try it and managed to get to the desired content, but it is a testing tool, rather slow, and not suitable for my task.


I'm still learning, and this will be my first parsing project, so I want to understand and solve this problem properly.


1 answer(s)
shurshur, 2020-10-20
@shurshur

This site sets a cookie using JavaScript and then redirects to itself; once the correct cookie is present, it serves the normal content.
The main logic is here:

salt="1524556899";
document.cookie="ipp_sign="+e+"_"+salt+"_"+md5(e+salt)+"; expires=Tue, 31 Dec 2030 23:59:59 GMT; path=/;";
ipp.setCookie();
window.location.href = "https://galaxystore.ru/discount/?utm_referrer=" + window.location.hash;

So the site computes a fingerprint (e in the code above) that identifies the user, then salts it and takes the MD5 hash. You can try taking some random value that resembles this fingerprint and reproducing the same logic. But if this page gets active traffic, they may start fighting it: banning by IP, making minor changes to the algorithm, and so on.
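The cookie logic above could be reproduced in Python roughly as follows. This is only a sketch under several assumptions: that the site's md5() returns a lowercase hex digest, that a random 32-character hex string is accepted in place of the real fingerprint, and that the salt is still the one shown in the snippet. The function names build_ipp_sign, random_fingerprint and fetch_discount_page are made up for illustration, and whatever ipp.setCookie() does is not shown in the snippet and is not reproduced here.

```python
import hashlib
import secrets

# Salt copied from the JS snippet; the site may change it at any time.
SALT = "1524556899"


def build_ipp_sign(fingerprint: str, salt: str = SALT) -> str:
    """Mirror the JS expression: e + "_" + salt + "_" + md5(e + salt)."""
    # Assumption: the site's md5() yields a lowercase hex digest.
    digest = hashlib.md5((fingerprint + salt).encode("utf-8")).hexdigest()
    return f"{fingerprint}_{salt}_{digest}"


def random_fingerprint() -> str:
    """Hypothetical stand-in for the real fingerprint: 32 random hex chars."""
    return secrets.token_hex(16)


def fetch_discount_page(url: str = "https://galaxystore.ru/discount/") -> str:
    """Request the page with the ipp_sign cookie already set."""
    # Third-party dependency, imported lazily so the helpers above need only the stdlib.
    import requests

    session = requests.Session()
    session.cookies.set("ipp_sign", build_ipp_sign(random_fingerprint()))
    return session.get(url, timeout=10).text
```

If the returned HTML is still the obfuscated page, the fingerprint format or the salt most likely differs from what is assumed here, and the current JS on the site would need to be checked again.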
PS: On the ethical side. We all like to get something cheaper sometimes, and that's no sin. But there are people who want to buy cheap by the dozen, then resell at a higher price to some suckers and profit from it. They are the ones who like to make such parsers. It's ugly, and I would refuse such an order.
