Is Selenium the only way to parse dynamic web pages?
Hello!
I have a task: parse an online store. The catch is that a GET request to the URL of the product I need returns a JS script rather than an HTML page with the product information. That script then issues further requests, and the HTML page is built from their responses.
I work in Python and, to be honest, I'm not very good at JavaScript, so it's hard for me to work out exactly how the HTML I need gets assembled.
So far I have come to this solution: start a browser in headless mode through Selenium and reload the page on a timeout until the elements I rely on appear in the DOM. This is very slow and not very reliable, and the service I am building has to return data on request immediately after the user enters a product link, so a solution using plain HTTP requests to the store's server would be ideal here, rather than wrestling with Selenium.
Here is an example of a product from the store I need: https://shop.nordstrom.com/s/natori-pure-luxe-unde...
Note that the store is not accessible from many countries, so it's better to go through an American or Singaporean proxy.
My goal is to get the finished HTML with the product information and then parse it. What do you advise? Can this be done without Selenium?
P.S. I have already searched Google (for 2 days now, so far to no avail). Please tell me how you solve dynamic page parsing problems.
You need to work out which request actually returns the data you need and execute it yourself. If that request carries a special session parameter calculated in JS (the incoming data may also be encrypted), then Selenium is the only option.
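To illustrate the first case, here is a minimal sketch of replaying a data request with plain HTTP once it has been found in the browser's Network tab. The endpoint URL, the headers, and the `name`/`price` fields are all placeholders I made up, since I can't see the real store's requests; the proxy argument is there only because the question mentions one.

```python
import json
import urllib.request

# Hypothetical endpoint: substitute the URL seen in the browser's Network tab.
API_URL = "https://example.com/api/product/12345"


def build_request(url):
    """Build a request that imitates the browser's background (XHR) call."""
    headers = {
        "User-Agent": "Mozilla/5.0",   # assumed; copy the real one from the browser
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers)


def fetch_json(url, proxy=None, timeout=10.0):
    """Execute the request, optionally through a proxy, and decode the JSON body."""
    handlers = []
    if proxy:
        # Route both schemes through the same proxy, e.g. "http://host:port".
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    opener = urllib.request.build_opener(*handlers)
    with opener.open(build_request(url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_product(payload):
    """Pick out the fields of interest; the key names here are assumptions."""
    return {"name": payload.get("name"), "price": payload.get("price")}


# Usage (performs a real network call, so it is left commented out):
# product = extract_product(fetch_json(API_URL, proxy="http://127.0.0.1:8080"))
```

If the response turns out to be HTML rather than JSON, the same `fetch` step applies and only the parsing changes.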
In theory, the JS may be simple enough to port to Python, partially or completely, so that you can generate the necessary keys and decrypt the data yourself, but that takes some serious thinking.
The link gives me "Access Denied", so I can't say which case applies here.
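As a purely illustrative sketch of what such a port can look like: suppose the site's JS signs each request with an HMAC-SHA256 over the product ID and a timestamp. That scheme, the function name `sign_request`, and the secret are invented for this example; the real store may do something entirely different.

```python
import hashlib
import hmac


def sign_request(product_id, timestamp, secret):
    """Python port of a hypothetical JS signing routine.

    Assumed JS logic: hex(HMAC_SHA256(secret, productId + ":" + timestamp)).
    """
    message = f"{product_id}:{timestamp}".encode("utf-8")
    return hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()


# Usage: the resulting token would be sent along with the data request,
# e.g. as a header or query parameter, exactly where the browser sends it.
token = sign_request("12345", 1700000000, "secret-found-in-the-js")
```

The way to check a port like this is to compare its output against a token captured from the browser for the same inputs.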