Python
maxvik, 2016-10-24 19:02:51

How to parse dynamic web sites using Python 3?

Task: parse websites and collect information about the availability and price of a particular product model.
Goal: produce a CSV file with the fields Seller \ Brand \ Model \ Price.
For the most part I have managed the task, but one problem came up. For some sites,
requests.get('url')
does not return the full source of the page. As I understand it, part of the page is generated by JavaScript.
How do I get the full page source?
If possible, please provide code examples that retrieve the complete source of a web page.
PS: Please help, this is my fourth night in a row spent with Google :(

4 answer(s)
DannyFork, 2016-12-18
@DannyFork

I use Selenium together with PhantomJS or Chrome.
The page is rendered in a real browser first, which lets you parse any dynamic page.
Example of parsing YouTube:

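from selenium import webdriver
from selenium.webdriver.common.by import By

# Start a real Chrome instance so the page's JavaScript is executed
driver = webdriver.Chrome()
try:
    driver.get("http://www.youtube.com/results?search_query=" + "guitar+lessons")

    # Each search result sits in its own container div
    results = driver.find_elements(By.XPATH, '//div[@class="yt-lockup-content"]')
    print(len(results))

    for result in results:
        # The title link carries both the video title and its URL
        video = result.find_element(By.XPATH, './/h3/a')
        title = video.get_attribute('title')
        url = video.get_attribute('href')
        print("{} ({})".format(title, url))
finally:
    # Close the browser even if parsing fails
    driver.quit()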

Result:
Guitar Lessons for Beginners in 21 days #1 | How to play guitar for beginners (https://www.youtube.com/watch?v=orp7WHibnaU)
GuitarLessons.com (https://www.youtube.com/user/guitarlessonscom)
Play TEN guitar songs with two EASY chords | Beginners first guitar lesson (https://www.youtube.com/watch?v=Jg-BRpn38L8)
....more
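
Rows collected this way can be written to the CSV file the question asks for. Below is a minimal sketch using the standard csv module. The column names follow the fields named in the question; the helper name write_rows and the sample values are made up for illustration.

```python
import csv
import io

# Column names follow the question; everything else here is illustrative.
FIELDS = ["Seller", "Brand", "Model", "Price"]


def write_rows(rows, out):
    """Write a header line plus one CSV line per (seller, brand, model, price) row."""
    writer = csv.writer(out)
    writer.writerow(FIELDS)
    writer.writerows(rows)


# Demo with an in-memory buffer and made-up sample data.
buffer = io.StringIO()
write_rows([("Example Shop", "Acme", "X100", "199.00")], buffer)
print(buffer.getvalue())
```

To write a real file, pass a file object opened with open("prices.csv", "w", newline="", encoding="utf-8") instead of the buffer; newline="" lets the csv module control the line endings. The file name is only an example.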

sim3x, 2016-10-24
@sim3x

You don't need to get the whole page.
Find out how the data you need is loaded and request it directly.
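
A rough sketch of this approach: once you know which request delivers the data, you can repeat that request yourself and skip the HTML entirely. The endpoint, the query parameter, and the JSON layout (an "items" list with "brand", "model", and "price" keys) are all hypothetical here and must be replaced with whatever the real site uses. The sketch sticks to the standard library; with requests, requests.get(url, params=params).json() would play the same role as fetch_json.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def fetch_json(api_url, params, timeout=10):
    """GET api_url with the given query parameters and decode the JSON body."""
    request = Request(
        api_url + "?" + urlencode(params),
        headers={"Accept": "application/json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def to_rows(payload, seller):
    """Flatten the (assumed) payload into Seller/Brand/Model/Price rows."""
    return [
        [seller, item["brand"], item["model"], item["price"]]
        for item in payload["items"]
    ]


# Possible usage; the URL and the parameter are placeholders, not a real API:
#   data = fetch_json("https://shop.example.com/api/products", {"model": "X100"})
#   rows = to_rows(data, "Example Shop")
```

Keeping the network call (fetch_json) apart from the reshaping step (to_rows) means the reshaping can be checked against a saved response without hitting the site again.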

Dimonchik, 2016-10-24
@dimonchik2013

Use Scrapy if you want to do it properly; otherwise requests, as sim3x advises, is enough.

asd111, 2016-10-25
@asd111

There are two ways.
1. In Chrome, press F12, open the Network tab, reload the page, and see which requests the data is loaded from.
2. Use Selenium for Python with PhantomJS as the driver: stackoverflow.com/questions/13287490/is-there-a-wa...
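
For the second route, one practical detail is that the scripts may still be running when the page is first read. The sketch below, which is only one possible way to handle this, polls the browser's current source until a marker string shows up. The function name rendered_html, the marker, and the URL in the usage comment are made up for illustration; the only driver features relied on are get() and page_source.

```python
import time


def rendered_html(driver, url, marker, timeout=15.0, poll=0.5):
    """Open url and return the page source once `marker` appears in it.

    `driver` is any started Selenium WebDriver; only its get() method and
    its page_source attribute are used.
    """
    driver.get(url)
    deadline = time.monotonic() + timeout
    while True:
        html = driver.page_source  # the page as the browser currently has it
        if marker in html:
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError("marker {!r} not found in {}".format(marker, url))
        time.sleep(poll)


# Possible usage (needs Selenium and a browser driver installed):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   try:
#       html = rendered_html(driver, "https://shop.example.com/item/1", 'class="price"')
#   finally:
#       driver.quit()
```

The returned string can then go to whatever parser was already handling the requests.get() output. Selenium also ships WebDriverWait with expected_conditions, which is the more usual tool when waiting for a specific element rather than a piece of text.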
