Python
yahabrovec, 2019-04-12 22:53:13

Python bs4: how do I select an element that has no class?

Hello. I need to write a small parser in Python, but I ran into a problem extracting the text I want.
The catch is that the text sits in a div element that has neither an ID nor a class, so it is hard to get at without resorting to hacks.
Does anyone know a more or less universal solution for such cases?
Here's one of the links
I'm talking about the div that contains the lyrics.


3 answers
Dmitry Shitskov, 2019-04-12
@Zarom

Shit-making requires desperate measures.

soup.find("div", string="Usage of azlyrics.com content by any third-party lyrics provider is prohibited by our licensing agreement. Sorry about that.")
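If that licensing notice is an HTML comment inside the lyrics div rather than visible text (which seems to be the case on azlyrics pages, but check the page source), matching it with `string=` on the div may return `None`. A minimal sketch of looking up the comment node and taking its parent instead; `SAMPLE_HTML` and `find_div_by_comment` are made up for illustration and stand in for the real page:

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical sample standing in for the real page markup (an assumption).
SAMPLE_HTML = """
<div class="ringtone">Ringtone banner</div>
<div>
<!-- Usage of azlyrics.com content by any third-party lyrics provider is prohibited by our licensing agreement. Sorry about that. -->
First line<br>
Second line
</div>
"""

MARKER = "Usage of azlyrics.com content"


def find_div_by_comment(html, marker=MARKER):
    """Return the text of the element holding a comment that contains `marker`, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # Match only Comment nodes whose text includes the marker.
    comment = soup.find(string=lambda s: isinstance(s, Comment) and marker in s)
    if comment is None:
        return None
    # The comment's parent is the class-less div; get_text() skips comments.
    return comment.parent.get_text("\n", strip=True)


lyrics = find_div_by_comment(SAMPLE_HTML)
print(lyrics)
```

This ties the lookup to the notice text instead of the div's position, so it should survive layout changes as long as the comment stays inside the target div.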

Andrey_Dolg, 2019-04-12
@Andrey_Dolg

Magic
XPath
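The answer does not spell out an expression, so the following is only one possible reading: an XPath predicate that keeps divs with neither a `class` nor an `id` attribute, evaluated with `lxml`. The sample markup and the helper name `divs_without_class_or_id` are assumptions for illustration.

```python
from lxml import html as lxml_html

# Hypothetical sample standing in for the real page markup (an assumption).
SAMPLE_HTML = """
<html><body>
<div class="header">Site header</div>
<div id="menu">Menu</div>
<div>First line<br>Second line</div>
</body></html>
"""

# Every div that carries neither a class nor an id attribute.
XPATH_NO_CLASS_NO_ID = "//div[not(@class) and not(@id)]"


def divs_without_class_or_id(page_source):
    """Return the text of each matching div, one line per text fragment."""
    tree = lxml_html.fromstring(page_source)
    results = []
    for div in tree.xpath(XPATH_NO_CLASS_NO_ID):
        # itertext() yields text and tails, so text after <br> is kept.
        parts = [t.strip() for t in div.itertext() if t.strip()]
        results.append("\n".join(parts))
    return results


matches = divs_without_class_or_id(SAMPLE_HTML)
print(matches)
```

On a real page the predicate may match several divs, so some extra filtering (for example by parent element or text length) would likely still be needed.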

DoyleArthur, 2019-05-09
@DoyleArthur

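import requests
from bs4 import BeautifulSoup


def get_html(url):
    # Download the page; raise on HTTP errors instead of parsing an error page.
    r = requests.get(url, timeout=10)
    r.raise_for_status()
    return r.text


def get_data(html):
    soup = BeautifulSoup(html, 'lxml')
    divs = soup.find_all('div')
    # On the pages checked, the lyrics sat in the 22nd div (index 21).
    # This position depends on the site's layout staying the same.
    return divs[21].text


def main():
    url = 'https://www.azlyrics.com/lyrics/imaginedragons/roots.html'
    print(get_data(get_html(url)))


if __name__ == '__main__':
    main()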

This is how I managed to parse it. I checked it on other songs and it works there too, since the layout is the same.
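Because a fixed index breaks as soon as a div is added or removed above the lyrics, a position-independent variant may be worth considering. The sketch below selects divs with no `class` and no `id` through a CSS selector and keeps the one with the longest text. The "longest text wins" rule, the sample markup, and the name `longest_bare_div_text` are assumptions, not something the answer above uses.

```python
from bs4 import BeautifulSoup

# Hypothetical sample standing in for the real page markup (an assumption).
SAMPLE_HTML = """
<div class="lyricsh"><h2>Artist Lyrics</h2></div>
<div id="ad">Ad</div>
<div>short</div>
<div>First line<br>Second line<br>Third line</div>
"""


def longest_bare_div_text(html):
    """Return the longest text among divs lacking both class and id, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # CSS equivalent of "div without a class attribute and without an id".
    candidates = soup.select("div:not([class]):not([id])")
    if not candidates:
        return None
    texts = [div.get_text("\n", strip=True) for div in candidates]
    # Heuristic: the lyrics block is assumed to be the longest candidate.
    return max(texts, key=len)


best = longest_bare_div_text(SAMPLE_HTML)
print(best)
```

The heuristic fails if a bare wrapper div encloses the whole page, in which case restricting the search to a known container first would be safer.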
