lxml parsing library

Python

IgaIst, 2013-08-13 20:15:25

I have a problem parsing the "link" tag with the lxml library.

The actual code:

import lxml.html
xml = '<link>trololo</link>'
doc = lxml.html.document_fromstring(xml)
out = doc.cssselect('link')[0]
print(out.text)

The code runs without errors, but the output is:
None

If the "link" tag is replaced with any other tag, the problem disappears.

So I'm at a loss! Has anyone run into something similar?
Or can someone recommend a similar (simple, small, lightweight) library?

2 answer(s)

IgaIst, 2013-08-14
@IgaIst

syschel suggested a very good idea: I was parsing XML with an HTML module :)
Solution:

from lxml import etree
doc = etree.XML('<link>trololo</link>')
out = doc.xpath('/link')[0].text
print(out)
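
To make the difference easier to see, here is a rough sketch that feeds the same markup to both parsers. The helper names link_text_html and link_text_xml are made up for illustration, and XPath is used instead of cssselect only to avoid the extra cssselect dependency:

```python
import lxml.html
from lxml import etree

MARKUP = '<link>trololo</link>'

def link_text_html(markup):
    # HTML parser: <link> is handled according to HTML rules
    doc = lxml.html.document_fromstring(markup)
    return doc.xpath('//link')[0].text

def link_text_xml(markup):
    # XML parser: <link> is an ordinary element with text content
    root = etree.XML(markup)
    return root.xpath('/link')[0].text

print(link_text_html(MARKUP))  # None, as reported in the question
print(link_text_xml(MARKUP))   # trololo
```

So if the input really is XML (an RSS feed, for example), lxml.etree is probably the right tool, and lxml.html should be kept for actual HTML pages.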

Ramires, 2013-08-13
@Ramires

The same thing happens if you replace link with br or img.
I think the point is that link, br, and img are void (unpaired) tags according to the HTML standard, but here they are written as an opening and closing pair.
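
One way to check this guess, assuming your lxml version ships the lxml.html.defs module with its empty_tags set, is to look the tag names up there. This is only a hint: it is lxml's own list of void tags, while the parsing itself is done by the underlying libxml2 library. The tag div is included just for comparison:

```python
from lxml.html import defs

# defs.empty_tags is lxml's set of HTML tags that have no closing tag
for tag in ('link', 'br', 'img', 'div'):
    kind = 'void (no closing tag)' if tag in defs.empty_tags else 'paired'
    print(tag, '->', kind)
```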