Answer the question
In order to leave comments, you need to log in
How to implement parsing with enumeration of addresses?
Hello,
tell me how to implement:
There is a link like aaa.com/pic-123400000
on the received page there is a link to the author Serhii
Save all authors in the file by going through the last 5 digits of the link. (from 123400000 to 123500000) I
just started learning Python, I will be grateful for your help
Answer the question
In order to leave comments, you need to log in
1) Loop through all pages
2) Download each page
3) Parse the necessary data from there
4) Save to file
on the received page there is a link to the author Serhii -
a href="/gallery-17656594590p1.html" itemprop="author">Serhii
try:
from urllib.request import Request, urlopen # Python 3
except:
from urllib2 import Request, urlopen # Python 2
import os,re, base64
autors = {}
BROWSER = "Mozilla/5.0 Gecko/20100101 Firefox/36.0"
for i in range(123400000,123500000):
s="http://aaa.com/pic-%d"%i
autors[i] = re.findall(r'itemprop=\s?["\']?author["\']?\s?>(.+)<',urlopen(Request(s,None,{"User-Agent":BROWSER})).read())
Didn't find what you were looking for?
Ask your questionAsk a Question
731 491 924 answers to any question