Answer the question
In order to leave comments, you need to log in
How do I use BS4 to extract the text inside a tag that is inside the found tag?
Hi Habr!
<LTTextBoxHorizontal y0="677.457" y1="698.418" x0="47.208" x1="86.76" width="39.552" height="20.962" bbox="[47.208, 677.457, 86.76, 698.418]" index="3">
<LTTextLineHorizontal y0="688.668" y1="698.418" x0="53.168" x1="86.76" width="33.592" height="9.75" bbox="[53.168, 688.668, 86.76, 698.418]" word_margin="0.1">Patient </LTTextLineHorizontal>
<LTTextLineHorizontal y0="677.457" y1="687.207" x0="47.208" x1="86.76" width="39.552" height="9.75" bbox="[47.208, 677.457, 86.76, 687.207]" word_margin="0.1">Jonson </LTTextLineHorizontal>
</LTTextBoxHorizontal>
Answer the question
In order to leave comments, you need to log in
Just like with regular html, read the documentation
soup = BeautifulSoup(xml, 'xml')
box = soup.find('LTTextBoxHorizontal')
for line in box.find_all('LTTextLineHorizontal'):
print(line.text)
html = '''<LTTextBoxHorizontal y0="677.457" y1="698.418" x0="47.208" x1="86.76" width="39.552" height="20.962" bbox="[47.208, 677.457, 86.76, 698.418]" index="3">
<LTTextLineHorizontal y0="688.668" y1="698.418" x0="53.168" x1="86.76" width="33.592" height="9.75" bbox="[53.168, 688.668, 86.76, 698.418]" word_margin="0.1">Patient </LTTextLineHorizontal>
<LTTextLineHorizontal y0="677.457" y1="687.207" x0="47.208" x1="86.76" width="39.552" height="9.75" bbox="[47.208, 677.457, 86.76, 687.207]" word_margin="0.1">Jonson </LTTextLineHorizontal>
</LTTextBoxHorizontal>'''
import bs4
text = bs4.BeautifulSoup( html, 'html.parser' ).get_text()
>> print( text )
Patient
Jonson
>> text
'\nPatient \nJonson \n'
import re
text = re.sub( r'<[^>]+>', '', html)
Didn't find what you were looking for?
Ask your questionAsk a Question
731 491 924 answers to any question