Python
hardwellZero, 2015-05-13 19:20:36

How to parse multiple pages?

Hello.
Please tell me how to extract specific data from an HTML page (I know the element's selector) when there are 100+ pages to go through, like Google search results.


4 answer(s)
lPolar, 2015-05-14
@lPolar

IMHO, urllib/requests/bs4 are last-century tools.
Use Grab instead: it has excellent Russian documentation and a user-friendly interface.

Roman Kitaev, 2015-05-13
@deliro

requests + BeautifulSoup
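
A minimal sketch of that combination, assuming the page URLs are known in advance (for example a `?page=N` pattern) and that the data can be reached with a CSS selector. The function names, the URL pattern, and the selector in the comments are hypothetical placeholders, not anything from the question.

```python
import requests
from bs4 import BeautifulSoup


def extract(html, selector):
    """Return the stripped text of every element matching a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [el.get_text(strip=True) for el in soup.select(selector)]


def scrape_pages(urls, selector, fetch=None):
    """Fetch each URL in turn and map it to the text found on that page."""
    if fetch is None:
        session = requests.Session()  # one session reuses connections

        def fetch(url):
            response = session.get(url, timeout=10)
            response.raise_for_status()  # fail loudly on 4xx/5xx responses
            return response.text

    return {url: extract(fetch(url), selector) for url in urls}


# Hypothetical usage; substitute your own URL pattern and selector:
# urls = [f"https://example.com/list?page={n}" for n in range(1, 101)]
# results = scrape_pages(urls, "div.item > h2")
```

The `fetch` parameter is only there so the parsing can be tried on saved HTML without touching the network; by default the pages are downloaded with requests.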

Dmitry, 2015-05-13
@trec

urllib2 + BeautifulSoup
The algorithm is as follows (using Google search results as the example), in pseudo-code:
look at the results page,
take all 10 site URLs,
go through them all, opening each one and extracting the necessary information with BeautifulSoup,
look at the address of the next Google results page,
feed it back in at the start of the program.
Continue like this either until all the found pages are exhausted or until the desired viewing depth is reached.
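
The steps above could be sketched roughly like this in Python 3, where `urllib.request` takes the place of urllib2. The three selectors (result links, "next page" link, data to extract) are assumptions you would have to work out for the actual site, and `crawl_results` is a made-up name.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch(url):
    """Download a page and return its HTML as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl_results(start_url, result_sel, next_sel, data_sel,
                  max_pages=5, fetch=fetch):
    """Walk results pages via the 'next' link, scraping every linked site."""
    collected = {}
    page_url = start_url
    for _ in range(max_pages):  # the desired viewing depth
        listing = BeautifulSoup(fetch(page_url), "html.parser")
        # Open every result on this page and pull out the wanted elements.
        for link in listing.select(result_sel):
            href = link.get("href")
            if not href:
                continue
            site_url = urljoin(page_url, href)
            site = BeautifulSoup(fetch(site_url), "html.parser")
            collected[site_url] = [
                el.get_text(strip=True) for el in site.select(data_sel)
            ]
        # Find the address of the next results page, if there is one.
        next_link = listing.select_one(next_sel)
        if next_link is None or not next_link.get("href"):
            break  # all found pages are exhausted
        page_url = urljoin(page_url, next_link["href"])
    return collected
```

`urljoin` is used because "next" links are often relative; `max_pages` plays the role of the viewing depth.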

Roman, 2015-05-15
@skipirich

To iterate over this matryoshka of pages nested inside pages, you need recursion.
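
A rough sketch of the recursive idea, reading "matryoshka" as pages that link to further pages. The function name and parameters are hypothetical; `fetch` is any callable that returns the HTML for a URL. A depth limit and a set of visited URLs keep the recursion from running forever on pages that link back to each other.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def crawl(url, fetch, data_sel, link_sel, depth, seen=None):
    """Scrape `url`, then recurse into the pages it links to."""
    seen = set() if seen is None else seen
    if depth < 0 or url in seen:
        return {}  # too deep, or already visited
    seen.add(url)

    soup = BeautifulSoup(fetch(url), "html.parser")
    found = {url: [el.get_text(strip=True) for el in soup.select(data_sel)]}

    # Recurse into each linked page with one less level of depth to spend.
    for link in soup.select(link_sel):
        href = link.get("href")
        if href:
            child = urljoin(url, href)
            found.update(
                crawl(child, fetch, data_sel, link_sel, depth - 1, seen)
            )
    return found
```

For a long chain of 100+ "next" pages a plain loop may be the safer choice, since CPython limits recursion depth (1000 frames by default).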
