How is re different from replace in Python 3?

Python

kirussian, 2018-08-26 03:56:04

I use a regular expression in my code, and 12 images should be saved. Instead, every run saves some arbitrary number of images, never 12. Why?
If I do the same thing with the replace method, everything works correctly.
Here is a link to the code:
https://gist.github.com/kirussian911/8a14ab685b10e...

import urllib.request
import re

page_number = 1
def load_source(website):
    site = urllib.request.urlopen(website)
    read_site = site.read()
    return read_site


def parse_img(source):
    links = []
    t = str(source)
    pattern = '<img width="\d+" height="\d+" src="'
    result = re.split(pattern, t)
    # working version using replace
    # t = str(source).replace('550', ' ').replace('375', ' ').split('<img width=" " height=" " src="')

    for i in result:
        r = str(i).split('""')
        links.append(r[0])
    return links

def download(links):
    name = 1
    for i in links:
        try:
            v = urllib.request.urlopen(i)
            f = open('Стр' + str(page_number) + 'номер' +  str(name) +  '.jpg', 'wb')
            f.write(v.read())
            f.close()
            name += 1
        except:
            pass

def main():
    print('start page: ')
    print()
    source = load_source('https://aliholic.com/shop/')
    links = parse_img(source)
    download(links)
    print('Tnx')


if __name__=='__main__':
    main()

2 answer(s)

lega, 2018-08-26
@lega

>>> import re
>>> text = 'xxx <img width="550" height="550" src="link1" > yyy <img width="550" height="550" src="link2" /> zzz'
>>> re.findall(r'<img [^>]*src="([^"]*)"', text)
['link1', 'link2']
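
For comparison, here is a minimal sketch of how the findall approach might slot into the question's parse_img, next to what re.split returns for the same input. The sample string and the IMG_SRC name are made up for illustration; the real page markup may differ. Note that re.split always returns the text before the first match as its first element, so a split-based list starts with an entry that is not a link.

```python
import re

# Hypothetical name; the pattern is the one from the findall call above.
IMG_SRC = re.compile(r'<img [^>]*src="([^"]*)"')


def parse_img(html):
    """Return the src value of every <img ...> tag found in html."""
    return IMG_SRC.findall(html)


# Made-up sample markup, not taken from the real page.
sample = ('xxx <img width="550" height="550" src="link1" > yyy '
          '<img width="375" height="375" src="link2" /> zzz')

# re.split cuts the text at every match and keeps the leading chunk,
# so element 0 is whatever preceded the first <img ...> tag.
parts = re.split(r'<img width="\d+" height="\d+" src="', sample)

print(parts[0])            # leading text, not a link
print(parse_img(sample))   # only the captured src values
```

If the page source arrives as bytes, as it does from urlopen(...).read(), it would need to be decoded to str first; which encoding to use is an assumption that depends on the site.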

Barafu_Albino_Cheetah, 2018-08-26
@Barafu_Albino_Cheetah

re is an entire module of the standard library. It can do almost anything, but it is not easy to get to grips with.
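
To make the difference concrete, here is a small sketch comparing str.replace with re.sub. The two tag strings are invented for illustration. str.replace only swaps fixed substrings, so the question's replace('550', ...).replace('375', ...) variant touches only images with exactly those sizes, while a \d+ pattern matches an image of any size. That may be why the two versions find a different number of links, though this is a guess without seeing the page.

```python
import re

# Invented sample tags; real markup on the page may differ.
known_size = '<img width="550" height="375" src="a.jpg">'
other_size = '<img width="640" height="480" src="b.jpg">'


def blank_literal(tag):
    # str.replace: fixed substrings only, no pattern syntax.
    return tag.replace('550', ' ').replace('375', ' ')


def blank_pattern(tag):
    # re.sub: any run of digits inside a width/height attribute.
    return re.sub(r'(width|height)="\d+"', r'\1=" "', tag)


print(blank_literal(known_size))  # sizes blanked
print(blank_literal(other_size))  # unchanged: 640 and 480 are not listed
print(blank_pattern(other_size))  # sizes blanked regardless of value
```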