Python
gowa66, 2016-06-14 15:54:39

Encoding Chinese characters when parsing?

I am writing a parser for a Chinese online store.

from urllib.request import urlopen
from urllib.parse import urljoin
from lxml.html import fromstring

URL = 'http://list.suning.com/0-258003-0.html'
ITEM_PATH = '.clearfix .product .border-out .border-in .wrap .res-info .sell-point'

def parse_items():
    f = urlopen(URL)
    list_html = f.read().decode('utf-8')
    list_doc = fromstring(list_html)
    for elem in list_doc.cssselect(ITEM_PATH):
        a = elem.cssselect('a')[0]
        href = a.get('href')
        title = a.text
        em = elem.cssselect('em')[0]
        title2 = em.text
        print(href, title, title2)

def main():
    parse_items()

if __name__ == '__main__':
    main()

I get this error:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

Can someone explain what is going wrong with the encoding?


1 answer(s)
Fixid, 2016-06-14
@gowa66

Switch to Python 3, where all strings are Unicode by default. If you stay on Python 2, you can't rely on str for this kind of text; alternative ways of working with strings are described on the Internet.
Try it without the decode call.
Also, show the type of the object that causes the error.
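
A possible way to follow up on the last point, offered as a sketch rather than a confirmed diagnosis: the imports in the question (urllib.request) suggest the script already runs on Python 3, in which case one plausible cause is that print() writes to a stdout whose encoding is ASCII. The helpers below (can_encode, describe, utf8_writer) and the sample title are hypothetical names introduced for illustration; they are not from the original post.

```python
import io
import sys


def can_encode(text, encoding):
    """Return True if text can be represented in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def describe(value):
    """Return the type name and an ASCII-only repr of value."""
    return type(value).__name__, ascii(value)


def utf8_writer(binary_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding='utf-8')


if __name__ == '__main__':
    # Hypothetical sample standing in for a scraped title (assumption).
    title = '\u82f9\u679c\u624b\u673a'

    # Show what print() will try to encode to, and whether that can work.
    encoding = sys.stdout.encoding or 'ascii'
    print('stdout encoding:', encoding)
    print('printable as-is:', can_encode(title, encoding))

    # Type and escaped repr are safe to print even on an ASCII terminal.
    print(*describe(title))

    # Write the text as UTF-8 bytes regardless of the terminal's encoding.
    sys.stdout.flush()
    out = utf8_writer(sys.stdout.buffer)
    print(title, file=out)
    out.flush()
    out.detach()  # keep sys.stdout.buffer open after the wrapper is dropped
```

If the first line reports an ASCII-like encoding, that would be consistent with the error in the question. Setting the environment variable PYTHONIOENCODING=utf-8 before starting the script is another option that avoids changing the code at all.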
