How to parse specific text using lxml?

Vasily Terkin, 2017-10-05 02:27:50

Python

Guys, I have a problem.
I need to parse the text of an article on a site. The article has the following HTML structure:

<div>
  <p>Wanted text</p>
  <p>Wanted text</p>
  <aside>Unwanted element</aside>
  Wanted text
  <p>Wanted text</p>
  <p>Wanted text</p>
</div>

I use Python + lxml.
The problem is that the site's markup is sloppy: some of the text I need is not wrapped in any tag, but it always comes right after the aside tag.
How can I catch all the required text?
I tried removing the unneeded nodes and parsing with xpath("//div//text()"), but I get everything except the text after the aside tag.
Any ideas?
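
A possible explanation for the missing text, assuming the unneeded nodes were removed with `getparent().remove(...)`: in lxml the text that follows an element is stored as that element's `tail`, so removing the element removes the tail too. The sketch below reproduces this on a shortened copy of the structure above and compares it with `drop_tree()` from `lxml.html`, which keeps the tail. The sample strings and the `texts` helper are made up for illustration.

```python
from lxml import html

# Shortened sample mirroring the structure from the question (placeholder text).
SAMPLE = """<div>
  <p>Wanted text 1</p>
  <aside>Unwanted element</aside>
  Wanted tail text
  <p>Wanted text 2</p>
</div>"""


def texts(root):
    """Hypothetical helper: all non-blank text nodes under root, stripped."""
    return [t.strip() for t in root.xpath(".//text()") if t.strip()]


# remove() takes the element out together with its tail text,
# so the untagged text after <aside> disappears as well.
naive = html.fragment_fromstring(SAMPLE)
for aside in naive.xpath(".//aside"):
    aside.getparent().remove(aside)
naive_texts = texts(naive)

# drop_tree() (available on lxml.html elements) removes the element and
# its children but re-attaches the tail text to the surrounding content.
kept = html.fragment_fromstring(SAMPLE)
for aside in kept.xpath(".//aside"):
    aside.drop_tree()
kept_texts = texts(kept)

print(naive_texts)
print(kept_texts)
```

If the nodes were removed some other way, the cause may be different, but checking where the tail text ends up is a reasonable first step.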

2 answer(s)

Roman Fov, 2017-10-05
@VasyanPro94

You can do it like this:

/div/aside/following-sibling::text()[normalize-space(.) != '']
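
A minimal sketch of how this expression might be used from Python. Two things here are assumptions rather than part of the answer: the path is written relative (`.//aside/...`), because a leading `/div` only matches when `div` is the document root, and a second, illustrative expression is added that collects all text outside `aside` in one pass. The sample markup and variable names are placeholders.

```python
from lxml import html

# Shortened sample mirroring the structure from the question (placeholder text).
SAMPLE = """<div>
  <p>Wanted text 1</p>
  <aside>Unwanted element</aside>
  Wanted tail text
  <p>Wanted text 2</p>
</div>"""

root = html.fragment_fromstring(SAMPLE)

# Text nodes that are siblings following <aside>, skipping whitespace-only
# nodes. This picks up the untagged text, but not the text inside <p>.
after_aside = [
    t.strip()
    for t in root.xpath(
        ".//aside/following-sibling::text()[normalize-space(.) != '']"
    )
]

# Illustrative alternative: every non-blank text node in the div that is
# not located inside an <aside>, in document order.
wanted = [
    t.strip()
    for t in root.xpath(
        ".//text()[not(ancestor::aside)][normalize-space(.) != '']"
    )
]

print(after_aside)
print(wanted)
```

The second expression leaves the tree untouched, so no nodes have to be removed before extracting the text.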