How to get the href value of an 'a' tag using BeautifulSoup in Python?

Python

try100, 2017-03-19 18:06:14

I need help with the following task: an HTML page contains two links, one to each of two images, and I need to extract both links. I'm trying the following code:

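from bs4 import BeautifulSoup

def parse(html):
    soup = BeautifulSoup(html, 'html.parser')
    title = soup.find('h1')
    # First link: href of the first <a> inside <div class="text">
    image1 = soup.find('div', {'class': 'text'}).find('a').get('href')
    # Second link: this is the whole <a> Tag object, not its href
    image2 = soup.find_all('a', class_='highslide')[1]

    post = []
    post.append({
        'title': title.text,
        'image1': image1,
        'image2': image2,
    })
    print(post)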

The image1 approach gives me the first link, but how do I get the second link the same way?
The image2 approach gives me the whole a tag with all its attributes. How do I get just the href?
Any help would be appreciated!
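To illustrate the second question, here is a minimal sketch. The sample markup is an assumption, since the real page is not shown. An element of the list returned by find_all is a Tag object, and its attributes can be read either with .get('href') or with dictionary-style indexing:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page's structure is not shown in the question.
SAMPLE_HTML = """
<h1>Post title</h1>
<div class="text">
  <a class="highslide" href="/images/first.jpg">first</a>
  <a class="highslide" href="/images/second.jpg">second</a>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
second = soup.find_all("a", class_="highslide")[1]  # a Tag object, not a string

href_via_get = second.get("href")   # returns None if the attribute is missing
href_via_index = second["href"]     # raises KeyError if the attribute is missing
print(href_via_get, href_via_index)
```

.get is usually the safer choice when some anchors might lack an href.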

3 answers

try100, 2017-03-26
@try100

Thank you all for your help! I resolved my issue like this:

from bs4 import BeautifulSoup

def parse(html):
    soup = BeautifulSoup(html, 'html.parser')
    title = soup.find('h1')
    # Both image links are <a class="highslide"> tags; search once, then index
    links = soup.find_all('a', class_='highslide')
    image1 = links[0]
    image2 = links[1]

    post = []
    post.append({
        'title': title.text,
        'image1': image1.get('href'),
        'image2': image2.get('href'),
    })
    print(post)
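If the number of links is not always exactly two, indexing with [0] and [1] can raise IndexError. A possible variation, only a sketch: the function name extract_links and the sample markup are made up for illustration. It collects every matching href into a list:

```python
from bs4 import BeautifulSoup

def extract_links(html):
    """Return the title text and the href of every a.highslide tag."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("h1")
    # href=True skips anchors that have no href attribute at all.
    anchors = soup.find_all("a", class_="highslide", href=True)
    return {
        "title": title.get_text(strip=True) if title else None,
        "images": [a["href"] for a in anchors],
    }

# Hypothetical sample markup for illustration only.
sample = """
<h1> Post title </h1>
<a class="highslide" href="/img/1.jpg">one</a>
<a class="highslide">no link</a>
<a class="highslide" href="/img/2.jpg">two</a>
"""
result = extract_links(sample)
print(result)
```

Returning the dictionary instead of printing it makes the function easier to reuse and test.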

AtomKrieg, 2017-03-20
@AtomKrieg

stackoverflow.com/questions/5815747/beautifulsoup-...

Nilsoner, 2017-03-20
@Nilsoner

You can try it like this:

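from bs4 import BeautifulSoup

# html is assumed to already hold the page source as a string
post = []
soup = BeautifulSoup(html, 'html.parser')
title = soup.find('h1')
image_links = soup.find_all('div', {'class': 'text'})
post.append(title.text)  # the title text, not the literal string 'title'
for div in image_links:
    anchor = div.find('a')  # first <a> in this div, or None if there is none
    if anchor is not None:
        post.append(anchor.get('href'))  # the href value, not the literal 'link'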
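Note that find('a') returns only the first anchor inside each div. If both links sit in the same div, a CSS selector may be simpler. This is a sketch with made-up sample markup, assuming the links live inside div elements with class "text":

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup for illustration only.
sample = """
<div class="text"><a href="/img/1.jpg">one</a><a href="/img/2.jpg">two</a></div>
<div class="text"><p>no link here</p></div>
<div class="other"><a href="/skip.jpg">skip</a></div>
"""
soup = BeautifulSoup(sample, "html.parser")

# "div.text a[href]" matches every <a> with an href inside <div class="text">
links = [a["href"] for a in soup.select("div.text a[href]")]
print(links)
```

Divs without any anchor contribute nothing, so no None check is needed.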