How to parse a product card and return the result in real time?
Task description: a web page has a URL input field and a button; after submitting, it shows the parsing result (name, price, size/color options) for the entered address. For example, parsing a product card on the Taobao website.
Additional task: get the source of the page with the already executed js.
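To make the additional task concrete, here is a minimal sketch of waiting for a JS-rendered element and then reading the page source. It assumes a Selenium-style driver object (anything with `get`, `execute_script` and `page_source`); the function name `rendered_source`, the CSS selector and the 5-second default are illustrative assumptions, not something taken from a working setup. It is only useful if `driver.get` returns early, for example when the driver is created with a non-blocking page load strategy.

```python
import time


def rendered_source(driver, url, selector, timeout=5.0, poll=0.1):
    """Load `url`, wait until `selector` exists in the DOM, return the HTML.

    `driver` is assumed to behave like a Selenium WebDriver.
    `selector` is a hypothetical CSS selector for the block that the
    page's JavaScript fills in (for example, the price element).
    """
    driver.get(url)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Ask the browser whether the element has been rendered yet.
        found = driver.execute_script(
            "return document.querySelector(arguments[0]) !== null;", selector
        )
        if found:
            # page_source reflects the DOM after scripts have run.
            return driver.page_source
        time.sleep(poll)
    raise TimeoutError("element %r not rendered within %.1f s" % (selector, timeout))
```

Waiting for one specific element instead of the full page load is a design choice: it may cut the wait when the page keeps loading ads or trackers long after the product data is already in the DOM.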
Platform: Ubuntu 16.04 in VirtualBox (4 GB of RAM, maximum CPU allocation, KVM virtualization) on a home PC (100 Mbps Internet connection).
Attempted implementations: PhantomJS, Selenium (with various drivers), Scrapy, Beautiful Soup (with and without PyQt4 + xvfb).
Test sites: Taobao.com and Dns-shop.ru
Results: It works, but the same script with the same URL can take anywhere from 5 seconds to 5 minutes to return results. This happens not only with Taobao product addresses, but also with DNS-shop and other Russian sites.
A 5-second run can still be tolerated, but anything longer makes the whole thing pointless. Yes, I know that Taobao has an API, but if that option were available to me, I would not have turned to parsing.
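One way to express the 5-second limit in code is to give every parse a hard time budget and report a timeout instead of waiting. This is only a sketch using the standard library; `parse_with_budget` and the `parse` callable are hypothetical names, and the sketch does not make the parsing itself faster.

```python
import concurrent.futures as cf

# Shared worker pool; the size 4 is an arbitrary assumption.
_POOL = cf.ThreadPoolExecutor(max_workers=4)


def parse_with_budget(parse, url, budget=5.0):
    """Run parse(url) in a worker thread and give up after `budget` seconds."""
    future = _POOL.submit(parse, url)
    try:
        return {"ok": True, "data": future.result(timeout=budget)}
    except cf.TimeoutError:
        # cancel() only helps if the task has not started yet; a task that
        # is already running keeps going in the background until it ends.
        future.cancel()
        return {"ok": False, "error": "timed out after %.1f s" % budget}
```

The web page can then show an error or a retry button after 5 seconds instead of hanging for minutes.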
How can I overcome such long delays? Or what other options are there?
I tried sending a direct request to the address from which Taobao loads the price/options data I need, but I could not figure it out (not enough brains). When I access this URL, I get a 403 error.
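For reference, a direct request of this kind can be sketched as follows. A 403 on such endpoints is sometimes caused by missing browser-like headers, so the sketch sets `User-Agent` and `Referer`; whether that is enough for Taobao is an assumption I cannot confirm, since the server may also require cookies or signed parameters. The function name and the URLs passed to it are placeholders.

```python
import urllib.request

# A generic desktop browser string; the exact value is an assumption.
DEFAULT_UA = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko)"


def build_data_request(data_url, product_url, user_agent=DEFAULT_UA):
    """Build (but do not send) a request for the data endpoint.

    `data_url` is the address the product page loads its price/options
    from; `product_url` is the page itself, sent as the Referer.
    """
    headers = {
        "User-Agent": user_agent,
        "Referer": product_url,
        "Accept": "*/*",
    }
    return urllib.request.Request(data_url, headers=headers)
```

The request can then be sent with `urllib.request.urlopen(req, timeout=5)`, which also keeps the call inside the 5-second limit.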
will be executed before receiving the results both 5 seconds and up to 5 minutes