Python
IgaIst, 2013-08-13 20:15:25

lxml parsing library

A problem with parsing the "link" tag in the lxml library.

The actual code:

import lxml.html
xml = '<link>trololo</link>'
# Parse the string with lxml's HTML parser
doc = lxml.html.document_fromstring(xml)
out = doc.cssselect('link')[0]
print(out.text)


The code runs without errors, but the output is:
None

If the "link" tag is replaced with any other tag, the problem disappears.

I'm at a loss! Has anyone run into something similar?
Or can someone recommend a similar (simple, small, lightweight) library?


2 answers
IgaIst, 2013-08-14
@IgaIst

syschel pointed out the real problem: I was parsing XML with an HTML module.
Solution:

from lxml import etree
# Parse the string as XML, so "link" is treated as an ordinary element
doc = etree.XML('<link>trololo</link>')
out = doc.xpath('/link')[0].text
print(out)
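
As for the "simple, small, light" library asked about in the question: if the input really is XML, the standard library's xml.etree.ElementTree may be enough. This is a minimal sketch and not part of the original solution; the variable name root is only illustrative.

```python
import xml.etree.ElementTree as ET

# Parse the same string as XML with the standard library.
# An XML parser gives no special meaning to the tag name "link".
root = ET.fromstring('<link>trololo</link>')

print(root.tag)   # the root element is the <link> element itself
print(root.text)  # the text inside the element
```

ElementTree supports only a limited subset of XPath, so lxml remains the better fit when full XPath or CSS selectors are needed.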

Ramires, 2013-08-13
@Ramires

The same thing happens if you replace link with br or img.
I think the reason is that the link, br, and img tags are unpaired (void) in the HTML standard, but here they are written as paired tags.
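
The difference can be checked by feeding the same string to both parsers. This is a rough sketch, assuming the cause is the one suggested above; the names markup, html_link and xml_link are only illustrative. It uses xpath instead of cssselect so that the separate cssselect package is not needed.

```python
import lxml.html
from lxml import etree

markup = '<link>trololo</link>'

# HTML parser: the element is found, but it carries no text.
html_doc = lxml.html.document_fromstring(markup)
html_link = html_doc.xpath('//link')[0]
print(html_link.text)                # None, as in the question

# Serialize the tree to see how the HTML parser restructured the input.
print(lxml.html.tostring(html_doc))

# XML parser: the text stays inside the element.
xml_link = etree.XML(markup)
print(xml_link.text)                 # trololo
```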
