Python bs4: how do I select an element that has no class?

Python

yahabrovec, 2019-04-12 22:53:13

Hello. I needed to write a small parser in Python, but ran into trouble extracting the text I want.
The catch is that the text sits in a div element that has neither an ID nor a class, so it is hard to get at without resorting to hacks.
Does anyone know a more or less universal solution for such cases?
Here's one of the links
I'm talking about the div that contains the lyrics.

3 answers

Dmitry Shitskov, 2019-04-12
@Zarom

Crappy markup calls for desperate measures.

soup.find("div", string="Usage of azlyrics.com content by any third-party lyrics provider is prohibited by our licensing agreement. Sorry about that.")
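One caveat: when a tag name is combined with `string=`, `find` only matches a tag whose single child is that string, so against a div that also holds the lyrics it may return nothing. A minimal sketch of a related approach, assuming the licensing notice is an HTML comment inside the lyrics div: locate the comment node and take its parent. The sample markup and the `lyrics_by_comment` helper are made up for illustration and are not the real page.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical, simplified stand-in for the page markup (an assumption).
SAMPLE_HTML = """
<div class="ringtone">Ringtone banner</div>
<div>
<!-- Usage of azlyrics.com content by any third-party lyrics provider is prohibited by our licensing agreement. Sorry about that. -->
First line<br>
Second line
</div>
"""


def lyrics_by_comment(html, marker="Usage of azlyrics.com content"):
    """Return the text of the element holding a comment that contains `marker`."""
    soup = BeautifulSoup(html, "html.parser")
    # Search text nodes only, keeping those that are comments with the marker.
    comment = soup.find(string=lambda s: isinstance(s, Comment) and marker in s)
    if comment is None:
        return None
    # The parent of the comment is the unnamed div we are after.
    return comment.parent.get_text("\n", strip=True)


if __name__ == "__main__":
    print(lyrics_by_comment(SAMPLE_HTML))
```

This keys on the content of the page rather than the position of the div, so it should survive small layout changes as long as the comment stays in place.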

Andrey_Dolg, 2019-04-12
@Andrey_Dolg

Magic

XPath
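A rough sketch of what the XPath route could look like with lxml, assuming the wanted div is one that carries neither a `class` nor an `id` attribute. The sample markup and the `divs_without_class_or_id` helper are invented for illustration. On a real page several divs may match, so you would still have to pick the right one.

```python
from lxml import html as lxml_html

# Hypothetical, simplified stand-in for the page markup (an assumption).
SAMPLE_HTML = """
<html><body>
<div class="ringtone">Ringtone banner</div>
<div id="menu">Menu</div>
<div>First line<br>Second line</div>
</body></html>
"""


def divs_without_class_or_id(page_source):
    """Return the text of every div that has neither a class nor an id."""
    tree = lxml_html.fromstring(page_source)
    # not(@class) / not(@id) are true when the attribute is absent.
    nodes = tree.xpath("//div[not(@class) and not(@id)]")
    return [node.text_content().strip() for node in nodes]


if __name__ == "__main__":
    for text in divs_without_class_or_id(SAMPLE_HTML):
        print(text)
```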

DoyleArthur, 2019-05-09
@DoyleArthur

import requests
from bs4 import BeautifulSoup


def get_html(url):
    # Download the page and fail loudly on an HTTP error.
    r = requests.get(url)
    r.raise_for_status()
    return r.text


def get_data(html):
    soup = BeautifulSoup(html, 'lxml')
    divs = soup.find_all('div')
    # In the current layout the lyrics are in the div at index 21.
    return divs[21].text


def main():
    url = 'https://www.azlyrics.com/lyrics/imaginedragons/roots.html'
    print(get_data(get_html(url)))


if __name__ == '__main__':
    main()

This is how I managed to parse it. I checked it on other songs and it works there too, since the layout is the same.
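A hard-coded index like `divs[21]` breaks as soon as a div is added or removed above the lyrics. As a possible alternative, here is a sketch that uses a CSS selector to pick the first div with neither a `class` nor an `id`. It assumes a bs4 version whose `select_one` supports `:not()` (the releases that bundle soupsieve). The sample markup and the `first_unmarked_div_text` helper are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the page markup (an assumption).
SAMPLE_HTML = """
<div class="ringtone">Ringtone banner</div>
<div id="menu">Menu</div>
<div>First line<br>Second line</div>
"""


def first_unmarked_div_text(html):
    """Return the text of the first div without class and id, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # [class] / [id] test for attribute presence; :not() negates each test.
    div = soup.select_one("div:not([class]):not([id])")
    if div is None:
        return None
    # Join the text nodes with newlines so <br> breaks are not lost.
    return div.get_text("\n", strip=True)


if __name__ == "__main__":
    print(first_unmarked_div_text(SAMPLE_HTML))
```

If more than one div matches on the real page, `soup.select(...)` returns all of them and you can filter further, for example by text length.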