Python
Denis9999, 2015-12-21 07:01:10

Why doesn't lxml.html.document_fromstring output what it should in Python 3?

In every example I've found, the code below prints the body text, 'Привет мир' ("Hello World"), but mine prints:
<Element html at 0x2ab9540>
Can you please tell me what the problem is?

import lxml.html

data = """<html>
<head>
</head>
<body>Привет мир</body>
</html>"""
html = lxml.html.document_fromstring(data)
print(html)


3 answers
sim3x, 2015-12-21

>>> import lxml.html
>>> html = lxml.html.fromstring('''\
...    <html><body onload="" color="white">
...      <p>Hi  !</p>
...    </body></html>
... ''')

>>> # print(html) only shows the element's repr, <Element html at 0x...>;
>>> # tostring() serializes it back to markup (a str when encoding="unicode")
>>> print(lxml.html.tostring(html, encoding="unicode"))
<html><body onload="" color="white"><p>Hi !</p></body></html>
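A minimal Python 3 sketch of the same idea, assuming lxml is installed (the markup string is a made-up placeholder, not from the question). In Python 3, `lxml.html.tostring()` returns `bytes` unless you pass `encoding="unicode"`, which is why a plain `print()` of its result shows a `b'...'` prefix:

```python
import lxml.html

# Placeholder markup, not taken from the question.
root = lxml.html.fromstring("<html><body><p>Hi !</p></body></html>")

# With no encoding argument, tostring() returns bytes in Python 3.
raw = lxml.html.tostring(root)
print(type(raw).__name__)  # bytes

# encoding="unicode" makes tostring() return a str instead.
markup = lxml.html.tostring(root, encoding="unicode")
print(markup)
```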

belanchuk, 2015-12-21

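1. Use a unicode string (a u"..." literal in Python 2; in Python 3 every str already is one).
2. Get the text through the elements instead of printing the root element: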

from lxml import html

data = u"""<html>
<head>
</head>
<body>Привет мир</body>
</html>"""
# Use a new name so the lxml.html module isn't shadowed
doc = html.document_fromstring(data)
print(doc.body.text)
# Output: Привет мир
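When the body contains nested tags, `.text` and `text_content()` behave differently. A small sketch, assuming lxml is installed; the `<p>`/`<b>` markup is invented for illustration:

```python
from lxml import html

# Invented markup: the text is split across a nested <b> element.
data = """<html>
<head></head>
<body><p>Привет <b>мир</b></p></body>
</html>"""

doc = html.document_fromstring(data)
p = doc.body.find("p")

# .text is only the text before the element's first child.
print(repr(p.text))      # 'Привет '

# text_content() joins the text of the element and all its descendants.
print(p.text_content())  # Привет мир
```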

abcd0x00, 2015-12-23

"In all examples, the code below outputs 'Hello World'"

That can't be. You are getting exactly what you should: printing an element shows its repr, not its text.
Either you copied the example incorrectly, or the example itself is wrong.
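To illustrate with the question's own data: `print()` on an element shows the element's repr, and the text has to be requested explicitly. A minimal sketch, assuming lxml is installed:

```python
import lxml.html

data = """<html>
<head>
</head>
<body>Привет мир</body>
</html>"""

root = lxml.html.document_fromstring(data)

# print() on an element shows its repr: the tag name and a memory address,
# e.g. <Element html at 0x...>. That is what the question's code printed.
print(root)

# The body's text has to be requested explicitly.
print(root.body.text)  # Привет мир

# Or serialize the whole tree back to markup as a str.
print(lxml.html.tostring(root, encoding="unicode"))
```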
