Python
Vasily Terkin, 2017-10-05 02:27:50

How to parse specific text using lxml?

I have a problem.
I need to parse the text of an article on a site. The article has the following HTML structure:

<div>
  <p>Needed text</p>
  <p>Needed text</p>
  <aside>Unwanted element</aside>
  Needed text
  <p>Needed text</p>
  <p>Needed text</p>
</div>

I use Python with lxml.
The site's markup is sloppy: some of the text I need is not wrapped in any tag and sits directly after the aside tag.
How can I capture all of the required text?
I tried removing the unnecessary nodes and parsing with xpath("//div//text()"), but that returns everything except the text after the aside tag.
Any ideas?


2 answer(s)
Roman Fov, 2017-10-05
@VasyanPro94

You can do it like this:

/div/aside/following-sibling::text()[normalize-space(.) != '']
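Here is a minimal sketch of how that expression might be used from Python. A few things in it are assumptions and not part of the original answer: the sample texts are numbered so the results are easy to tell apart, the helper names `text_after_aside` and `all_text_without_aside` are made up for illustration, and the path starts with `//div` instead of `/div`, because `lxml.html.fromstring` keeps the fragment inside an `html/body` wrapper. The second helper covers a likely cause of the missing text: in lxml the bare text after `</aside>` is stored as the element's tail, and removing the element with `remove()` discards the tail as well, whereas `drop_tree()` keeps it.

```python
from lxml import html

# Sample markup modelled on the question; the numbering is added for illustration.
SNIPPET = """<div>
  <p>Needed text 1</p>
  <p>Needed text 2</p>
  <aside>Unwanted element</aside>
  Needed text 3
  <p>Needed text 4</p>
  <p>Needed text 5</p>
</div>"""


def text_after_aside(fragment):
    """Return the non-empty bare text nodes that follow <aside> inside the <div>."""
    root = html.fromstring(fragment)
    nodes = root.xpath(
        "//div/aside/following-sibling::text()[normalize-space(.) != '']"
    )
    return [node.strip() for node in nodes]


def all_text_without_aside(fragment):
    """Return all non-empty text in the <div>, skipping the <aside> content."""
    root = html.fromstring(fragment)
    for aside in root.xpath("//div/aside"):
        # drop_tree() removes the element but keeps its tail text,
        # unlike parent.remove(element), which discards the tail too.
        aside.drop_tree()
    nodes = root.xpath("//div//text()[normalize-space(.) != '']")
    return [node.strip() for node in nodes]


if __name__ == "__main__":
    print(text_after_aside(SNIPPET))
    print(all_text_without_aside(SNIPPET))
```

The first helper returns only the untagged text; the second returns the whole article text in document order, which is probably closer to what the question asks for.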
