Python
Morrdor, 2021-10-17 10:21:52

How to parse using the BeautifulSoup4 Python library?

<div class="soundTitle sc-clearfix sc-hyphenate sc-type-h2 sc-text-h4 streamContext m-interactive"><div class="soundTitle__titleContainer">
    <div class="soundTitle__playButton">
        <a role="button" href="" class="snippetUXPlayButton sc-button-play playButton sc-button sc-button-xlarge" tabindex="0" title="Play" draggable="true">Play</a>
    </div>

  <div class="soundTitle__usernameTitleContainer">
      <div class="sc-type-light sc-text-secondary sc-text-h4 soundTitle__secondary">
        <a href="/gracedaviesofficial" class="soundTitle__username sc-link-secondary
             sc-link-light">
          <span class="soundTitle__usernameText">
              Grace Davies
          </span>
        </a>
              </div>
        <a class="sc-link-primary soundTitle__title sc-link-dark sc-text-h4" href="/gracedaviesofficial/hello-adele">
            <span class="">Hello - Adele</span>
        </a>
  </div>
  <div class="soundTitle__additionalContainer sc-ml-1.5x">
      <div class="soundTitle__uploadTime sc-mb-0.5x">
        <time class="relativeTime" title="Posted on 26 October 2015" datetime="2015-10-26T14:51:09.000Z"><span class="sc-visuallyhidden">Posted 6 years ago</span><span aria-hidden="true">6 years ago</span></time>
      </div>
      <div class="soundTitle__tagContainer">
          <span class="sc-snippet-badge sc-selection-disabled sc-snippet-badge-medium sc-snippet-badge-grey sc-hidden"></span>
          <span class="sc-snippet-badge sc-selection-disabled sc-snippet-badge-small sc-snippet-badge-grey sc-hidden"></span>
          <a class="sc-tag soundTitle__tag sc-tag-small" href="/tags/hello"><span class="sc-truncate sc-tagContent">hello</span></a>
      </div>
  </div>
</div>
</div>


How do I get the span that contains the song title, and the href of the a tag that this span is nested in?

I tried find_all('a', attrs={'class name': 'class value'})[0].get('href'), but I got an "index out of range" error. That is, the element was not found.

I also don't know how to get the span, since it has no class.

2 answer(s)
soremix, 2021-10-17
@morrdor

from bs4 import BeautifulSoup

html = '''
<div class="soundTitle sc-clearfix sc-hyphenate sc-type-h2 sc-text-h4 streamContext m-interactive"><div class="soundTitle__titleContainer">
    <div class="soundTitle__playButton">
        <a role="button" href="" class="snippetUXPlayButton sc-button-play playButton sc-button sc-button-xlarge" tabindex="0" title="Play" draggable="true">Play</a>
    </div>

  <div class="soundTitle__usernameTitleContainer">
      <div class="sc-type-light sc-text-secondary sc-text-h4 soundTitle__secondary">
        <a href="/gracedaviesofficial" class="soundTitle__username sc-link-secondary
             sc-link-light">
          <span class="soundTitle__usernameText">
              Grace Davies
          </span>
        </a>
              </div>
        <a class="sc-link-primary soundTitle__title sc-link-dark sc-text-h4" href="/gracedaviesofficial/hello-adele">
            <span class="">Hello - Adele</span>
        </a>
  </div>
  <div class="soundTitle__additionalContainer sc-ml-1.5x">
      <div class="soundTitle__uploadTime sc-mb-0.5x">
        <time class="relativeTime" title="Posted on 26 October 2015" datetime="2015-10-26T14:51:09.000Z"><span class="sc-visuallyhidden">Posted 6 years ago</span><span aria-hidden="true">6 years ago</span></time>
      </div>
      <div class="soundTitle__tagContainer">
          <span class="sc-snippet-badge sc-selection-disabled sc-snippet-badge-medium sc-snippet-badge-grey sc-hidden"></span>
          <span class="sc-snippet-badge sc-selection-disabled sc-snippet-badge-small sc-snippet-badge-grey sc-hidden"></span>
          <a class="sc-tag soundTitle__tag sc-tag-small" href="/tags/hello"><span class="sc-truncate sc-tagContent">hello</span></a>
      </div>
  </div>
</div>
</div>
'''

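# Parse the markup with the parser from the standard library
soup = BeautifulSoup(html, 'html.parser')

# Matching on 'class' checks each class of the tag separately,
# so one class name is enough even though the tag has several
a_tags = soup.find_all('a', {'class': 'soundTitle__title'})

for a_tag in a_tags:
    # The span has no class of its own, so reach it through its parent <a>
    print(a_tag['href'], a_tag.find('span').text)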
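
The same lookup can also be written with a CSS selector, which returns None when nothing matches instead of raising an index error. This is only a sketch that assumes the markup from the question; extract_track is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup


def extract_track(html):
    """Return (href, title) for the first track link, or None if it is absent."""
    soup = BeautifulSoup(html, 'html.parser')
    # select_one returns None when the selector matches nothing
    a_tag = soup.select_one('a.soundTitle__title')
    if a_tag is None:
        return None
    span = a_tag.find('span')
    # Fall back to the link's own text if there is no nested span
    source = span if span is not None else a_tag
    return a_tag.get('href'), source.get_text(strip=True)


# Shortened copy of the relevant part of the question's markup
sample = '''
<a class="sc-link-primary soundTitle__title sc-link-dark sc-text-h4" href="/gracedaviesofficial/hello-adele">
    <span class="">Hello - Adele</span>
</a>
'''

print(extract_track(sample))
# ('/gracedaviesofficial/hello-adele', 'Hello - Adele')
```

Returning None for a missing element lets the caller decide what to do, rather than failing on [0].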

alexbprofit, 2021-10-17
@alexbprofit

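The content is generated by JavaScript, so it is most likely missing from the HTML your script downloads. That would explain why find_all returns an empty list.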
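
To check whether the link is present in the HTML your script actually downloaded, rather than added later by JavaScript, test the result of find_all before indexing into it. A rough sketch: fetched_html is a made-up stand-in for whatever your HTTP client returned, and find_title_links is a hypothetical helper.

```python
from bs4 import BeautifulSoup


def find_title_links(fetched_html):
    """Return every <a> that carries the soundTitle__title class."""
    soup = BeautifulSoup(fetched_html, 'html.parser')
    return soup.find_all('a', class_='soundTitle__title')


# Made-up example of a page whose body is filled in later by a script
fetched_html = (
    '<html><body><div id="app"></div>'
    '<script src="app.js"></script></body></html>'
)

links = find_title_links(fetched_html)
if links:
    print(links[0].get('href'))
else:
    # An empty result is exactly what makes find_all(...)[0] raise IndexError
    print('No matching <a> in the fetched HTML; it may be added by JavaScript')
```

If the list is empty for the real page as well, the markup copied from the browser's inspector is not what the server sent.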
