How to pull out all text from all divs with the same class?

Python

onepunchman404, 2020-05-04 23:56:33

I have a parser that collects product information from a clothing store, and I need to extract the sizes. The sizes are scattered across several divs with the same class. How can I parse all of them? I'm using the BeautifulSoup and requests libraries. I understand that you can do something like this:
item.find_all('a', class_='products-list-item__size-item link')[0].text + item.find_all('a', class_='products-list-item__size-item link')[1].text
but I need a more universal option that determines the number of sizes by itself and takes them all.

def get_content(html):
    soup = BeautifulSoup(html, 'html.parser')
    items = soup.find_all('div', class_='products-list-item')

    link = []
    for item in items:
        link.append({
            'Link': HOST + item.find('a', class_='link').get('href'),
            'Size': item.find_all('a', class_='products-list-item__size-item link')
        })
    print(link)
    return link


Right now it outputs this mess: 'Size': [<a class="products-list-item__size-item link" data-link="/p/ma178ewyyl56/clothes-marksspencer-komplekt/?sku=ma178ewyyl56b100">44</a>, <a class="products-list-item__size-item link" data-link="/p/ma178ewyyl56/clothes-marksspencer-komplekt/?sku=ma178ewyyl56b120">46</a> ......

1 answer(s)

Amigun, 2020-05-05

First, fetch the page with requests (I assume you know how this is done). Next comes bs4: we loop over ALL the tags (that's what you wanted, right?) and pull the text out of each one that has the class we need.
import requests
from bs4 import BeautifulSoup

response = requests.get('your_link_here')
soup = BeautifulSoup(response.content, 'lxml')

sizes = []
for i in soup.recursiveChildGenerator():
    # Text nodes have no tag name, so getattr() skips them; your screenshot shows an <a> tag
    if getattr(i, 'name', None) == 'a':
        # bs4 stores the class attribute as a list of names,
        # and .get() avoids a KeyError when a tag has no class at all
        if 'your_class_here' in i.attrs.get('class', []):
            # Collect every match; a bare `return` outside a function is a syntax error
            sizes.append(i.text)

print(sizes)

In theory this should work, but I haven't tested it on your site.
Give it a try.
For more details, you can read here.
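
A shorter route, sketched here under assumptions rather than tested on the real site: find_all already returns every matching tag, however many there are, so the only missing step in the question's get_content is turning each tag into its text. The HOST value and the SAMPLE_HTML markup below are hypothetical stand-ins shaped like the output quoted in the question.

```python
from bs4 import BeautifulSoup

HOST = 'https://example.com'  # hypothetical; replace with the real store host

# Hypothetical sample markup, shaped like the tags quoted in the question
SAMPLE_HTML = '''
<div class="products-list-item">
  <a class="link" href="/p/item-1/">Item 1</a>
  <div><a class="products-list-item__size-item link" data-link="/p/item-1/?sku=a">44</a></div>
  <div><a class="products-list-item__size-item link" data-link="/p/item-1/?sku=b">46</a></div>
</div>
'''


def get_content(html):
    soup = BeautifulSoup(html, 'html.parser')
    items = soup.find_all('div', class_='products-list-item')

    links = []
    for item in items:
        # find_all returns every match, so no manual indexing is needed
        size_tags = item.find_all('a', class_='products-list-item__size-item')
        links.append({
            'Link': HOST + item.find('a', class_='link').get('href'),
            # Keep only the visible text of each size tag
            'Size': [tag.get_text(strip=True) for tag in size_tags],
        })
    return links


result = get_content(SAMPLE_HTML)
print(result)
# [{'Link': 'https://example.com/p/item-1/', 'Size': ['44', '46']}]
```

If the sizes are needed as one string rather than a list, ', '.join(...) over the same list would do it.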