How is re different from replace in Python 3?
My code uses a regular expression and is supposed to save 12 images. On every run it saves some arbitrary number of images instead, but never 12. Why?
If I do the same thing with the replace method, everything works correctly.
Here is a link to the code
https://gist.github.com/kirussian911/8a14ab685b10e...
import urllib.request
import re

page_number = 1


def load_source(website):
    site = urllib.request.urlopen(website)
    read_site = site.read()
    return read_site


def parse_img(source):
    links = []
    t = str(source)
    pattern = r'<img width="\d+" height="\d+" src="'
    result = re.split(pattern, t)
    # working variant using replace
    # t = str(source).replace('550', ' ').replace('375', ' ').split('<img width=" " height=" " src="')
    for i in result:
        r = str(i).split('""')
        links.append(r[0])
    return links


def download(links):
    name = 1
    for i in links:
        try:
            v = urllib.request.urlopen(i)
            f = open('Стр' + str(page_number) + 'номер' + str(name) + '.jpg', 'wb')
            f.write(v.read())
            f.close()
            name += 1
        except:
            pass


def main():
    print('start page: ')
    print()
    source = load_source('https://aliholic.com/shop/')
    links = parse_img(source)
    download(links)
    print('Tnx')


if __name__ == '__main__':
    main()
>>> text = 'xxx <img width="550" height="550" src="link1" > yyy <img width="550" height="550" src="link2" /> zzz'
>>> re.findall(r'<img [^>]*src="([^"]*)"', text)
['link1', 'link2']
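Applied to the question's code, the same pattern could take the place of the split-based parsing. This is only a minimal sketch: it keeps the name parse_img from the question, assumes the page is UTF-8 encoded, and uses a made-up sample string rather than the real page.

```python
import re

# Capture whatever sits between the quotes of src="..." inside an <img> tag.
IMG_SRC = re.compile(r'<img [^>]*src="([^"]*)"')


def parse_img(source):
    """Return the src value of every <img> tag found in the page source."""
    # urlopen(...).read() returns bytes. Decoding (UTF-8 is an assumption here)
    # avoids str(source), which yields the "b'...'" repr with escaped characters.
    if isinstance(source, bytes):
        source = source.decode('utf-8', errors='replace')
    return IMG_SRC.findall(source)


# Made-up sample, for illustration only.
sample = ('xxx <img width="550" height="550" src="link1" > yyy '
          '<img width="550" height="375" src="link2" /> zzz')
links = parse_img(sample)
print(links)
```

With findall there is no leftover chunk of text before the first image and nothing trailing after each link, so the result can be passed to download as it is.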
re is an entire module of the standard library, whereas replace is a single string method that only substitutes literal text. re can do almost anything, but it is not easy to get on good terms with.
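To make the difference concrete, here is a small sketch that runs both approaches from the question on the same input. The sample markup and the variable names are invented for illustration; the real page may look different.

```python
import re

# Made-up markup: two images with different sizes.
text = ('<img width="550" height="375" src="a.jpg">'
        '<img width="640" height="480" src="b.jpg">')

# str.replace only matches the literal substring it is given, so every
# possible width and height would have to be listed by hand (and a '550'
# inside a URL would be replaced as well).
literal = text.replace('550', ' ').replace('375', ' ')
parts_literal = literal.split('<img width=" " height=" " src="')

# re.split takes a pattern: \d+ matches any run of digits.
parts_regex = re.split(r'<img width="\d+" height="\d+" src="', text)

print(parts_literal)  # the 640x480 image is not split off
print(parts_regex)    # ['', 'a.jpg">', 'b.jpg">']
```

Note that in both cases the first element is whatever precedes the first match, not a link, and every other element still carries the text that follows the URL.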