How do I exclude a certain node from processing in Grab for Python?

Python

Timofey Dergachev, 2014-10-29 23:27:37

I have this code:

from grab import Grab

g = Grab()
g.go('http://habrahabr.ru/post/241889/')
xpath = '//div[contains(@class, "content_left")]//div[contains(@class, "content")]'
print(g.doc.select(xpath).html())

1. How do I exclude a node, for example //div[contains(@class, "polling")]?
2. How do I process two kinds of nodes? With //div | //span, only the first one is processed.

1 answer(s)

Timofey Dergachev, 2014-11-02
@exeto

1. Solution:

from grab import Grab
from grab.tools.lxml_tools import drop_node

url = 'http://habrahabr.ru/post/241889/'
xpath = '//div[contains(@class, "content_left")]//div[contains(@class, "content")]'
drop = '//div[contains(@class, "polling")]'

g = Grab()
g.go(url)
page = g.doc.select(xpath)
# Remove the nodes matching `drop` from the tree before printing
drop_node(page.node(), drop)

for element in page:
    print(element.html())
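
To see the "drop the node, then serialize" idea without hitting the network, here is a rough standard-library sketch. It does not use Grab or lxml: the sample markup, the helper name drop_by_class and the exact class-token match (instead of the substring match that contains(@class, ...) does) are all my own assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the article markup
SAMPLE = (
    '<div class="content">'
    '<p>Article text</p>'
    '<div class="polling">Vote here</div>'
    '<p>More text</p>'
    '</div>'
)


def drop_by_class(root, class_name):
    """Remove every descendant element whose class attribute lists class_name."""
    # ElementTree has no parent pointers, so visit each parent and check its children.
    for parent in list(root.iter()):
        for child in list(parent):
            if class_name in child.get("class", "").split():
                # remove() also discards any text that followed the child (its tail)
                parent.remove(child)


root = ET.fromstring(SAMPLE)
drop_by_class(root, "polling")
cleaned = ET.tostring(root, encoding="unicode")
print(cleaned)
```

The printed markup no longer contains the polling block, which is the same effect the Grab version above is after.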

2. I don’t know why I didn’t notice right away that g.doc.select() returns an iterable object. Here is the solution:

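from grab import Grab

url = 'http://habrahabr.ru/post/241889/'
xpath = '//div | //span'

g = Grab()
g.go(url)

# select() returns an iterable of matches, so loop over all of them
for element in g.doc.select(xpath):
    print(element.html())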
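
For anyone who wants to check offline what iterating over a union like //div | //span should give, below is a small emulation using only the standard library. ElementTree does not support the | operator, so the sketch walks the tree in document order and keeps both tags. The sample markup and the helper name select_any are hypothetical, not part of Grab.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with interleaved div and span elements
SAMPLE = (
    "<body>"
    "<div>first div</div>"
    "<span>a span</span>"
    "<div>second div</div>"
    "</body>"
)


def select_any(root, tags):
    """Return all elements (root included) whose tag is in tags, in document order."""
    return [el for el in root.iter() if el.tag in tags]


root = ET.fromstring(SAMPLE)
matches = select_any(root, {"div", "span"})

# Loop over every match instead of serializing only the first one
for element in matches:
    print(ET.tostring(element, encoding="unicode"))
```

All three elements are printed, in the order they appear in the document, which is what the loop over g.doc.select(xpath) does for the real page.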