P
P
pproman2015-04-24 11:25:08
Parsing
pproman, 2015-04-24 11:25:08

How to implement parsing with enumeration of addresses?

Hello,
tell me how to implement:
There is a link like aaa.com/pic-123400000
on the received page there is a link to the author Serhii
Save all authors in the file by going through the last 5 digits of the link. (from 123400000 to 123500000) I
just started learning Python, I will be grateful for your help

Answer the question

In order to leave comments, you need to log in

3 answer(s)
A
Andrey Dugin, 2015-04-24
@adugin

1) Loop through all pages
2) Download each page
3) Parse the necessary data from there
4) Save to file

P
pproman, 2015-04-24
@pproman

on the received page there is a link to the author Serhii -
a href="/gallery-17656594590p1.html" itemprop="author">Serhii

A
Alexander, 2015-04-24
@NeiroNx

try:
  from urllib.request import Request, urlopen  # Python 3
except:
  from urllib2 import Request, urlopen  # Python 2
import os,re, base64
autors = {}
BROWSER = "Mozilla/5.0 Gecko/20100101 Firefox/36.0"
for i in range(123400000,123500000):
  s="http://aaa.com/pic-%d"%i
  autors[i] = re.findall(r'itemprop=\s?["\']?author["\']?\s?>(.+)<',urlopen(Request(s,None,{"User-Agent":BROWSER})).read())

In general, regular expressions can be tested online https://regex101.com/#python
I also recommend adding random delays, otherwise some sites may stop giving content considering this a DDoS attack.

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question