How to find all tags and extract values from them?

Python

vladimirsiy_centr, 2020-09-11 17:03:50

I wrote a parser, but it extracts only the first image on the page. Some pages have 2-3 images, each inside a div with the class 'with-overtask'. I need to find all of these tags and extract their values. How can I do this?
Page HTML:

<div class="task-img-container">
    <div class="with-overtask">
        <img src="//gdz.ru/attachments/images/tasks/000/021/233/0002/5b48368e75cbc.jpg?d=0&s=OTA8NPuy_8Sq4EBpY-dK1Q" alt="ГДЗ по алгебре 8 класс  Мерзляк   номер - 930, Решебник" title="">
        <div class="overtask"></div>
    </div>
</div>
<div class="block-download-btn">
    <a href="/download.html" data-img-src="//gdz.ru/attachments/images/tasks/000/021/233/0002/5b48368e75cbc.jpg?d=0&s=OTA8NPuy_8Sq4EBpY-dK1Q" class="download-btn js-download-btn">Скачать решение</a>
</div>

<div id='media-11' class='media media-task-image'></div>

<div class="task-img-container">
    <div class="with-overtask">
        <img src="//gdz.ru/attachments/images/tasks/000/021/233/0002/5b48368e76141.jpg?d=0&s=RrntuLSmcsrB_byl_jkhHA" alt="ГДЗ по алгебре 8 класс  Мерзляк   номер - 930, Решебник" title="">
        <div class="overtask"></div>
    </div>
</div>
<div class="block-download-btn">
    <a href="/download.html" data-img-src="//gdz.ru/attachments/images/tasks/000/021/233/0002/5b48368e76141.jpg?d=0&s=RrntuLSmcsrB_byl_jkhHA" class="download-btn js-download-btn">Скачать решение</a>
</div>

<div id='media-12' class='media media-task-image'></div>

<div class="task-img-container">
    <div class="with-overtask">
        <img src="//gdz.ru/attachments/images/tasks/000/021/233/0002/5b48368e765c5.jpg?d=0&s=7iumA8y7yT08Dy7mCtpcKw" alt="ГДЗ по алгебре 8 класс  Мерзляк   номер - 930, Решебник" title="">
        <div class="overtask"></div>
    </div>
</div>
<div class="block-download-btn">
    <a href="/download.html" data-img-src="//gdz.ru/attachments/images/tasks/000/021/233/0002/5b48368e765c5.jpg?d=0&s=7iumA8y7yT08Dy7mCtpcKw" class="download-btn js-download-btn">Скачать решение</a>
</div>

My parser code:

import requests
from bs4 import BeautifulSoup

intmes = int(message.text)
listnum = range(1, 939)
if intmes in listnum:
    per = str(intmes)
    URL = 'https://gdz.ru/class-8/algebra/merzlyak/' + per + '-nom'
    HEADERS = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36',
        'accept': '*/*'}

    def get_html(url, params=None):
        r = requests.get(url, headers=HEADERS, params=params)
        return r

    imgg = ''

    def get_content(html):
        global imgg
        soup = BeautifulSoup(html, 'html.parser')
        # find() returns only the first matching div
        div = soup.find('div', {'class': 'with-overtask'})
        # if div != None:
        imgg = div.find('img')['src']
        imgg2 = 'https:' + imgg  # src is protocol-relative ("//gdz.ru/...")

        p = requests.get(imgg2)  # download
        with open(r"C:\Users\vayak\PycharmProjects\pythonProject5\img.jpg", "wb") as out:
            out.write(p.content)

1 answer(s)

Gennady S, 2020-09-11
@gscraft

Use find_all() instead of find(): it returns every matching element, not just the first one. See the BeautifulSoup documentation.

div_elements = soup.find_all('div', {'class': 'with-overtask'})
for div in div_elements:
    imgg = div.find('img')['src']  # ... and so on
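
To round this out, here is a minimal sketch of how the loop above might be turned into a complete helper. The function names extract_image_urls and save_images, the base_url default, and the img_1.jpg, img_2.jpg file naming are all hypothetical and not from the original code. The one change in behavior compared with the question's code is that each image goes to its own file, since writing every image to the same img.jpg would leave only the last one on disk.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def extract_image_urls(html, base_url="https://gdz.ru/"):
    """Return the absolute URL of every <img> inside a div.with-overtask."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for div in soup.find_all("div", class_="with-overtask"):
        img = div.find("img")
        # Skip containers that have no <img> or no src attribute.
        if img is None or not img.get("src"):
            continue
        # urljoin resolves the protocol-relative "//gdz.ru/..." against base_url.
        urls.append(urljoin(base_url, img["src"]))
    return urls


def save_images(urls, prefix="img"):
    """Download each URL into its own numbered file (hypothetical naming)."""
    paths = []
    for number, url in enumerate(urls, start=1):
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        path = f"{prefix}_{number}.jpg"
        with open(path, "wb") as out:
            out.write(response.content)
        paths.append(path)
    return paths
```

With the question's helpers, this could be called as save_images(extract_image_urls(get_html(URL).text)). The same elements can also be selected in one step with soup.select('div.with-overtask img'), if a CSS selector reads better to you.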