How to parse text in a div with BeautifulSoup while ignoring nested tags?
How can I extract only the needed part of a div like this one?
<div class="example">
<p>bla-bla-bla</p>
<div>something not important</div>
<strong>SomeText</strong>
<br>
Needed text
<span style="color:red">Also needed text</span>
Needed text
</div>
One option is to remove the unwanted tags:
from bs4 import BeautifulSoup
html_doc = """
<div class="example">
<p>bla-bla-bla</p>
<div>something not important</div>
<strong>SomeText</strong>
<br>
Needed text
<span style="color:red">Also needed text</span>
Needed text
</div>
"""
soup = BeautifulSoup(html_doc, "html.parser")  # name the parser explicitly to avoid the bs4 warning
tag = soup.find("div", class_="example")
tag.div.decompose()  # remove the nested <div>
tag.p.decompose()  # remove the <p> tag and its text
tag.br.decompose()  # remove the <br> line break
print(tag)
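If the goal is the plain text rather than the cleaned-up markup, another possible approach is to walk the siblings that follow the <br> and collect their text. This is only a sketch: it assumes the needed text always comes after the first <br> inside the div, and the names parts and needed_text are made up for illustration.

```python
from bs4 import BeautifulSoup, NavigableString

html_doc = """
<div class="example">
<p>bla-bla-bla</p>
<div>something not important</div>
<strong>SomeText</strong>
<br>
Needed text
<span style="color:red">Also needed text</span>
Needed text
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")
tag = soup.find("div", class_="example")

# Assumption: everything we want sits after the first <br> in the div.
parts = []
for node in tag.br.next_siblings:
    # A NavigableString is bare text; a Tag contributes its inner text.
    text = node if isinstance(node, NavigableString) else node.get_text()
    text = text.strip()
    if text:  # skip whitespace-only nodes
        parts.append(text)

needed_text = " ".join(parts)
print(needed_text)
```

Unlike decompose(), this leaves the parsed tree untouched, so the same soup can still be used for other lookups afterwards.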