Python
Phillip Gruy, 2013-06-18 11:52:13

Working with table Elements in lxml

I needed a small report parser for my own use.
Right around then, someone on Habr had written about lxml and how fast it is, so I decided to try doing it properly with it.
But I got stuck on the Element table objects.

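import lxml.html

# Read the report as bytes and decode it explicitly as UTF-8
with open('test.html', 'rb') as f:
    doc = lxml.html.document_fromstring(f.read().decode('utf-8'))

i = 1
while True:
    # XPath positions are 1-based: table[1] is the first table in <body>
    table = doc.xpath('/html/body/table[' + str(i) + ']')
    if not table:
        break
    i += 1
    print(table)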

Each iteration prints a list like [<Element table at 0x********>].
The only thing I found was table[0].text_content(), which gives me all of the table's text as one long string.
What I would like is a clean way to get the headers of each table, plus a list (or lists) holding only the columns I need, so that I can filter out the values I want and display them in a readable form.
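A minimal sketch of one way to get this with lxml alone, assuming each report table keeps its headers in the first row and its data in the rows below it (the real test.html was not shown, so that layout is an assumption). SAMPLE, cell_texts and read_tables are hypothetical names used only for illustration; the optional wanted set picks columns by their header text.

```python
import lxml.html

# Hypothetical sample report; the real test.html was not shown.
SAMPLE = """
<html><body>
<table>
  <tr><th>Name</th><th>Status</th><th>Time</th></tr>
  <tr><td>job-1</td><td>ok</td><td>12</td></tr>
  <tr><td>job-2</td><td>failed</td><td>30</td></tr>
</table>
<table>
  <tr><th>Host</th><th>Load</th></tr>
  <tr><td>alpha</td><td>0.5</td></tr>
</table>
</body></html>
"""


def cell_texts(row):
    """Stripped text of every th/td cell in a row, in document order."""
    return [c.text_content().strip() for c in row.xpath('./th | ./td')]


def read_tables(html, wanted=None):
    """Return one (headers, rows) pair per table directly under <body>.

    If `wanted` is given, only columns whose header text is in it are kept.
    """
    doc = lxml.html.document_fromstring(html)
    result = []
    # A single XPath call returns every table, so no table[i] counter is needed.
    for table in doc.xpath('/html/body/table'):
        rows = table.xpath('.//tr')
        if not rows:
            continue
        # Assumption: the first row holds the column headers.
        headers = cell_texts(rows[0])
        keep = [i for i, h in enumerate(headers) if wanted is None or h in wanted]
        data = []
        for row in rows[1:]:
            cells = cell_texts(row)
            data.append([cells[i] for i in keep if i < len(cells)])
        result.append(([headers[i] for i in keep], data))
    return result


for headers, rows in read_tables(SAMPLE, wanted={'Name', 'Status', 'Host'}):
    print(headers)
    for row in rows:
        print('   ', row)
```

Selecting columns by header name rather than by position keeps the filter readable and still works if the report reorders its columns.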

Thanks in advance.


2 answers
Alexey Akulovich, 2013-06-18
@AterCattus

Alternatively, you can use pyquery, a small wrapper on top of lxml that lets you query the document with CSS selectors, which is IMHO a more convenient style.

simbajoe, 2013-06-18
@simbajoe

The table Element has its own xpath() method, so you can run further queries relative to it. See how it is done here: stackoverflow.com/questions/1577487/python-lxml-and-xpath-html-table-parsing
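A short sketch of the relative-query idea with lxml, using a hypothetical two-table SAMPLE in place of test.html. It also shows a common gotcha: a path that starts with // searches the whole document even when it is called on a table element, while .// stays inside that table.

```python
import lxml.html

# Hypothetical two-table report standing in for test.html.
SAMPLE = """
<html><body>
<table><tr><th>Name</th><th>Status</th></tr>
       <tr><td>job-1</td><td>ok</td></tr></table>
<table><tr><th>Host</th><th>Load</th></tr>
       <tr><td>alpha</td><td>0.5</td></tr></table>
</body></html>
"""

doc = lxml.html.document_fromstring(SAMPLE)
first = doc.xpath('/html/body/table')[0]

# A path starting with '//' searches the WHOLE document,
# even though it is evaluated on a single table element...
all_cells = first.xpath('//td/text()')
# ...while './/' restricts the search to descendants of this table.
own_cells = first.xpath('.//td/text()')

# Second column (XPath positions are 1-based) of every data row in this table;
# the header row has only <th> cells, so it contributes nothing here.
status_column = first.xpath('.//tr/td[2]/text()')

print(all_cells, own_cells, status_column)
```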
