Why don't class methods work in a loop?

Python

Uno di Palermo, 2021-09-10 11:31:33

Greetings. I would be grateful for any help. In short: there is a list of, say, 5 links, and for each link three methods of a class must be run in a loop: 1. create a folder for the art movement (if it does not exist yet) with a folder named after the artist inside it; 2. create a .log file with links to the paintings

logging.basicConfig(filename=f'{path}/{school}/{artist_name}/list_of_all_works_of_{artist_name}.log', level=logging.INFO, format=FORMAT)

3. download the pictures whose links are recorded in that file.

Like this:

for i in ar_deco:
    w = Wikiart()
    w.create_folder(i)
    w.get_list_of_all_works(i)
    w.download_images(i)
    del w

If the list contains one link, everything works. If it contains more, the second link fails with an error:

Traceback (most recent call last):
  File "G:\Desktop\py\wikiart\wikiart.py", line 285, in <module>
    w.download_images(i)
  File "G:\Desktop\py\wikiart\wikiart.py", line 227, in download_images
    f = open(f'{path}/{school}/{artist_name}/list_of_all_works_of_{artist_name}.log', 'r').readlines()
FileNotFoundError: [Errno 2] No such file or directory: 'G:/Desktop/py/wikiart/Экспрессионизм/Erin Hanson/list_of_all_works_of_Erin Hanson.log'

That is, the w.create_folder(i) and w.download_images(i) methods are executed, but w.get_list_of_all_works(i) apparently is not.

The code itself:

import requests
from bs4 import BeautifulSoup as bs
import re, os, sys
import logging
from wget import download

ar_deco = [
    "https://www.wikiart.org/ru/francois-pompon/all-works/text-list",
    "https://www.wikiart.org/ru/aleksandra-ekster/all-works/text-list"
]

FORMAT = '%(message)s'

path = os.path.abspath(os.path.dirname(sys.argv[0])).replace('\\', '/')

BASE_URL = 'https://wikiart.org'

headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36"
}

class Wikiart:

    def get_soup(self, url):
        session = requests.Session()
        r = session.get(url, headers=headers)
        soup = bs(r.content, 'html.parser')

        return soup

    def get_artist_name(self, url):
        soup = self.get_soup(url)
        artist_name = soup.find('a', class_='artist-href').text.replace(':', '')

        return artist_name.strip()

    def get_school(self, url):
        soup = self.get_soup(url)
        school = soup.find('div', class_='wiki-breadcrumbs-links').find_all('a')[2].text.strip()

        return school.strip()

    def create_folder(self, url):
        artist_name = self.get_artist_name(url)
        school = self.get_school(url)

        if os.path.exists(f'{path}/{school}/{artist_name}'):
            print(f'Exists: {path}/{school}/{artist_name}')
            sys.exit()

        if not os.path.exists(f'{path}/{school}'):
            os.mkdir(f'{path}/{school}')
            print(f'Created: {path}/{school}')

        if not os.path.exists(f'{path}/{school}/{artist_name}'):
            os.mkdir(f'{path}/{school}/{artist_name}')
            print(f'Created: {path}/{school}/{artist_name}')


    def get_list_of_all_works(self, url):
        artist_name = self.get_artist_name(url)
        school = self.get_school(url)

        logging.basicConfig(
            filename=f'{path}/{school}/{artist_name}/list_of_all_works_of_{artist_name}.log', 
            level=logging.INFO, 
            format=FORMAT
        )
        
        soup = self.get_soup(url)
        arts = soup.find_all('li', class_='painting-list-text-row')

        for link in arts:
            img = BASE_URL + link.a['href']
            title = link.text.replace(', ?', '')
            logging.info(img)

        return

    def download_images(self, url):
        artist_name = self.get_artist_name(url)
        school = self.get_school(url)
        
        f = open(f'{path}/{school}/{artist_name}/list_of_all_works_of_{artist_name}.log', 'r').readlines()

        num_of_lines = sum(1 for _ in f)
        n = 0

        forbidden_symbols = ('*,<>:\'\\"/\\|?=')

        try: 
            for _ in f:
                soup = self.get_soup(_.strip())
            
                try:
                    img = soup.find('img', itemprop='image')['src']
                except:
                    pass

                try:
                    title = soup.find('div', class_='wiki-breadcrumbs wiki-breadcrumbs-artwork'). \
                                 find_all('a')[5].text. \
                                 replace('"', '_')
                except:
                    pass

                session = requests.Session()
                try:
                    img_r_ = session.get(img)
                except Exception as e:
                    print(e)
                    continue 

                con = img_r_.content

                file_name = f'{path}/{school}/{artist_name}/{title}_{n}.jpg'
                
                try:
                    outf = open(file_name, "wb")
                    outf.write(con)
                    outf.close()
                except:
                    pass

                print(f'{img} : {title} ({n} from {num_of_lines})') 
                
                n += 1

        except Exception as e:
            raise(e)
            pass 


for i in ar_deco:
    w = Wikiart()
    
    w.create_folder(i)
    w.get_list_of_all_works(i)
    w.download_images(i)

    del w


2 answer(s)

Vindicar, 2021-09-10
@roger

Uno di Palermo, the key word is "should".
Logging was a poor choice for this purpose. For example, it may not create the file at all if no entries were actually written to the log.
If you used the ordinary open() and wrote the lines yourself, it would work much more reliably.
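A minimal sketch of the plain open() approach, assuming the links have already been collected into a list of strings. The helper name save_links and its parameters are hypothetical and do not appear in the original code.

```python
import os


def save_links(links, log_path):
    """Write one link per line to log_path (hypothetical helper)."""
    folder = os.path.dirname(log_path)
    if folder:
        # Create the school/artist folders if they are missing.
        os.makedirs(folder, exist_ok=True)
    # Mode "w" always creates the file, even when links is empty.
    with open(log_path, "w", encoding="utf-8") as f:
        for link in links:
            f.write(link + "\n")
    return log_path
```

Because the file is opened explicitly for every artist, each call produces its own file regardless of what happened on earlier iterations of the loop.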
And that is without mentioning that the file may fail to open:

because the disk is full,
because the directory is write-protected,
because path, school or artist_name contains characters that are not allowed in a path,
because some other program deleted the file between its creation and opening,
because some other program opened the file in exclusive mode,
and for many other reasons.

So it is better to simply accept that opening a file can fail in any case, and to write the program accordingly. A try/except block catching OSError (IOError is an alias of it in Python 3) comes to the rescue.
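A sketch of what such handling might look like when reading the list of links back. The function name read_links is hypothetical, and returning an empty list on failure is only one possible policy (skipping that artist and moving on).

```python
def read_links(log_path):
    """Return the non-empty lines of log_path, or [] if it cannot be read."""
    try:
        with open(log_path, "r", encoding="utf-8") as f:
            return [line.strip() for line in f if line.strip()]
    except OSError as e:
        # FileNotFoundError and PermissionError are subclasses of OSError.
        print(f"Cannot read {log_path}: {e}")
        return []
```

With this in place, a missing file for one artist is reported and skipped instead of stopping the whole loop with a traceback.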
Also, why re-create Wikiart() on every iteration of the loop?

Wispik, 2021-09-10
@Wispik

Traceback (most recent call last):
  File "G:\Desktop\py\wikiart\wikiart.py", line 285, in <module>
    w.download_images(i)
  File "G:\Desktop\py\wikiart\wikiart.py", line 227, in download_images
    f = open(f'{path}/{school}/{artist_name}/list_of_all_works_of_{artist_name}.log', 'r').readlines()
FileNotFoundError: [Errno 2] No such file or directory: 'G:/Desktop/py/wikiart/Экспрессионизм/Erin Hanson/list_of_all_works_of_Erin Hanson.log'

The error is that you are trying to open for reading a file that does not exist.
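A likely reason the file is missing: logging.basicConfig() does nothing if the root logger already has handlers, so in the loop only the first call takes effect and the .log file for the second artist is never created. The sketch below reproduces this in a temporary folder; the function name demo_basic_config and the file names are made up for illustration.

```python
import logging
import os


def demo_basic_config(tmp_dir):
    first = os.path.join(tmp_dir, "first.log")
    second = os.path.join(tmp_dir, "second.log")

    # force=True (Python 3.8+) is used here only so the demo starts
    # from a root logger without leftover handlers.
    logging.basicConfig(filename=first, level=logging.INFO,
                        format="%(message)s", force=True)
    logging.info("link for artist 1")

    # The root logger already has a handler, so this call changes nothing:
    # second.log is never created and the record below goes to first.log.
    logging.basicConfig(filename=second, level=logging.INFO,
                        format="%(message)s")
    logging.info("link for artist 2")

    # Detach and close the handler so the file is flushed and released.
    root = logging.getLogger()
    for handler in root.handlers[:]:
        root.removeHandler(handler)
        handler.close()

    with open(first, encoding="utf-8") as f:
        lines = f.read().splitlines()
    return os.path.exists(first), os.path.exists(second), lines


if __name__ == "__main__":
    import tempfile
    print(demo_basic_config(tempfile.mkdtemp()))
```

On Python 3.8 and later, passing force=True to every basicConfig() call replaces the previous handler; writing the file with open() avoids the problem altogether.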