Python
nazandr, 2016-11-02 02:39:59

How to decode unicode characters from a list?

First, a CSV file is generated:

for i in range(n):
  writer.writerow({'title': f.entries[i].title.encode('utf-8'), 'link': f.entries[i].link.encode('utf-8')})

Then we read it back and build a list of its characters, but the list contains raw encoded bytes instead of the actual characters:
vocabulary = open('/Users/andrey/Projects/News-parser/vocabulary.csv').read().lower()

chars = sorted(list(set(vocabulary)))

['\n', '\r', ' ', '"', '#', '&', '(', ')', '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', '=', '?', '[', ']', '_', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 
'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '|', '\x80', '\x81', '\x82', '\x83', '\x84', '\x85', '\x86', '\x87', '\x88', '\x89', '\x8b', '\x8c', '\x8d', '\x8e', '\x8f', 
'\x90', '\x91', '\x92', '\x93', '\x94', '\x97', '\x98', '\x9a', '\x9b', '\x9c', '\x9d', '\x9e', '\x9f', '\xa0', '\xa1', '\xa2', '\xa3', '\xa4', '\xa5', '\xa6', 
'\xa7', '\xa8', '\xab', '\xad', '\xaf', '\xb0', '\xb1', '\xb2', '\xb3', '\xb4', '\xb5', '\xb6', '\xb7', '\xb8', '\xb9', '\xba', '\xbb', '\xbc', '\xbd', '\xbe', '\xbf', '\xc2', '\xd0', '\xd1', '\xe2']

1 answer(s)
Maxim Vasiliev, 2016-11-02
@qmax

For example, specify the file encoding when opening the file:

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
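As a rough sketch of that suggestion: the signature above is the Python 3 built-in open, while the '\x80'-style entries in the question suggest Python 2, where io.open accepts the same encoding argument. The helper name read_chars, the sample row, and the temporary file are made up for illustration and are not taken from the original project.

```python
import io
import os
import tempfile


def read_chars(path, encoding="utf-8"):
    """Return the sorted set of lowercased characters found in a text file."""
    # io.open takes encoding= on both Python 2 and Python 3
    # (on Python 3 it is the same function as the built-in open).
    # newline="" keeps '\r' and '\n' exactly as they are stored in the file.
    with io.open(path, mode="r", encoding=encoding, newline="") as fh:
        text = fh.read().lower()
    return sorted(set(text))


# Hypothetical sample standing in for vocabulary.csv.
sample = u"title,link\r\nНовости,http://example.com/1\r\n"

fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
try:
    # Write the sample as UTF-8 bytes, as the generating script does.
    with io.open(path, mode="w", encoding="utf-8", newline="") as fh:
        fh.write(sample)
    chars = read_chars(path)
finally:
    os.remove(path)

print(chars)
```

Because the file is decoded while it is read, set() sees whole characters such as 'н' rather than the individual UTF-8 bytes ('\xd0', '\xbd') that make them up, and lower() works on the Cyrillic letters as well.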
