Working with a table Element in lxml

Python

Phillip Gruy, 2013-06-18 11:52:13

I needed a small report parser for personal use.
Right around then there was a post on Habr about lxml and how fast it is, so I decided to try it and do the job cleanly.
In the end I got stuck on the table Element.

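import lxml.html

# Parse the report; test.html is assumed to be UTF-8 encoded
with open('test.html', encoding='utf-8') as f:
    PR = lxml.html.document_fromstring(f.read())

# Fetch the top-level tables in <body> one at a time (XPath indices start at 1)
i = 1
while True:
    table = PR.xpath('/html/body/table[' + str(i) + ']')
    if not table:
        break
    i += 1
    print(table)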

The output is a series of one-element lists: [<Element table at 0x********>]
The only thing I found was to call table[0].text_content(), which returns the text of all the table's rows in one lump.
What I would like is a clean way to get the headers of each table plus an array (or array of arrays) holding only the columns I need, so that I can filter the values and display them in a readable form.

Thanks in advance.

2 answer(s)

Alexey Akulovich, 2013-06-18
@AterCattus

Or you can use pyquery, a small wrapper on top of lxml that lets you write queries as CSS selectors, which IMHO is more convenient.

simbajoe, 2013-06-18
@simbajoe

Each table Element has an xpath method of its own, so you can query its rows and cells relative to the table. See how it's done here: stackoverflow.com/questions/1577487/python-lxml-and-xpath-html-table-parsing
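A minimal sketch of that idea, not taken from the linked answer: it assumes the first row of each table contains the headers and that tables are not nested. SAMPLE_HTML, parse_table and the wanted parameter are hypothetical names used only for illustration.

```python
import lxml.html

# Hypothetical sample report standing in for the contents of test.html
SAMPLE_HTML = """
<html><body>
<table>
  <tr><th>Name</th><th>Qty</th><th>Price</th></tr>
  <tr><td>apple</td><td>3</td><td>1.50</td></tr>
  <tr><td>pear</td><td>7</td><td>2.25</td></tr>
</table>
<table>
  <tr><th>City</th><th>Code</th></tr>
  <tr><td>Omsk</td><td>381</td></tr>
</table>
</body></html>
"""

def cell_texts(tr):
    """Stripped text of each direct <th>/<td> child of a row."""
    return [c.text_content().strip() for c in tr.xpath('./th|./td')]

def parse_table(table, wanted=None):
    """Return (headers, rows) for one table Element.

    wanted is an optional list of header names; only those columns are kept.
    """
    # './/tr' also finds rows wrapped in <thead>/<tbody>
    # (and rows of nested tables, which this sketch assumes are absent)
    trs = table.xpath('.//tr')
    if not trs:
        return [], []
    # Assumption: the first row is the header row
    headers = cell_texts(trs[0])
    keep = [i for i, h in enumerate(headers) if wanted is None or h in wanted]
    rows = []
    for tr in trs[1:]:
        cells = cell_texts(tr)
        # Skip indices missing from short rows
        rows.append([cells[i] for i in keep if i < len(cells)])
    return [headers[i] for i in keep], rows

doc = lxml.html.document_fromstring(SAMPLE_HTML)

# One XPath call returns every top-level table, so no index loop is needed
tables = doc.xpath('/html/body/table')
headers, rows = parse_table(tables[0], wanted=['Name', 'Price'])
print(headers)
for row in rows:
    print(row)
```

Calling xpath on the table Element with a path starting with a dot keeps the query relative to that table, which is what makes the per-table headers and columns easy to separate.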