Python throws a UnicodeEncodeError

Python

stenhot, 2013-11-29 21:11:42

I've been learning Python for only two days, and I was given the task of parsing Catalog.xml.
I wrote a parser for it myself, but it fails with an error when I run it. Could someone knowledgeable explain what causes the error?
Code of parser.py:

import xml.dom.minidom


dom1 = xml.dom.minidom.parse('catalogue.xml')

product_id = dom1.getElementsByTagName('product_id')
code = dom1.getElementsByTagName('code')
name = dom1.getElementsByTagName('name')
product_size = dom1.getElementsByTagName('product_size')
matherial = dom1.getElementsByTagName('matherial')
small_image = dom1.getElementsByTagName('small_image')
big_image = dom1.getElementsByTagName('big_image')
content = dom1.getElementsByTagName('content')
state = dom1.getElementsByTagName('state')
currency = dom1.getElementsByTagName('currency')
status = dom1.getElementsByTagName('status')
brand = dom1.getElementsByTagName('brand')
weight = dom1.getElementsByTagName('weight')


product_id_t = open('product_id.txt', 'w')
for e in product_id:
    for t in e.childNodes:
        product_id_t.write(t.data + '\n')
product_id_t.close() 
        
code_t = open('code.txt', 'w')
for e in code:
    for t in e.childNodes:
        code_t.write(t.data + '\n')
code_t.close() 
        
name_t = open('name.txt', 'w')
for e in name:
    for t in e.childNodes:
         name_t.write(str(t.data + '\n'))
name_t.close() 
        
product_size_t = open('product_size.txt', 'w')
for e in product_size:
    for t in e.childNodes:
        product_size_t.write(t.data + '\n')
product_size_t.close() 
        
matherial_t = open('matherial.txt', 'w')
for e in matherial:
    for t in e.childNodes:
        matherial_t.write(t.data + '\n')
matherial_t.close()

small_image_t = open('small_image.txt', 'w')
for e in small_image:
    for t in e.childNodes:
        small_image_t.write(t.data + '\n')
small_image_t.close()

big_image_t = open('big_image.txt', 'w')
for e in big_image:
    for t in e.childNodes:
        big_image_t.write(t.data + '\n')
big_image_t.close()

content_t = open('content.txt', 'w')
for e in content:
    for t in e.childNodes:
        content_t.write(t.data + '\n')
content_t.close()

state_t = open('state.txt', 'w')
for e in state:
    for t in e.childNodes:
        state_t.write(t.data + '\n')
state_t.close()

currency_t = open('currency.txt', 'w')
for e in currency:
    for t in e.childNodes:
        currency_t.write(t.data + '\n')
currency_t.close()

status_t = open('status.txt', 'w')
for e in status:
    for t in e.childNodes:
        status_t.write(t.data + '\n')
status_t.close()

brand_t = open('brand.txt', 'w')
for e in brand:
    for t in e.childNodes:
        brand_t.write(t.data + '\n')
brand_t.close()

weight_t = open('weight.txt', 'w')
for e in weight:
    for t in e.childNodes:
        weight_t.write(t.data + '\n')
weight_t.close()

Error:
Traceback (most recent call last):
  File "D:\incoming\WorkSpace\Eclipse\Parser\sten-parser.py", line 37, in
    name_t.write(str(t.data + '\n'))
  File "D:\Instal\Program\Python3\lib\encodings\cp1251.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u044e' in position 6: character maps to

1 answer(s)

KlonKaktusa, 2013-11-29
@KlonKaktusa

You can simply override the built-in open function so that every file is opened as UTF-8 : )

import codecs
def open(path, mode):
  return codecs.open(path, mode, 'utf-8')
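
As an alternative to shadowing the built-in, Python 3's open accepts an encoding argument directly, so the same effect can probably be had without codecs. A minimal sketch, assuming the output should be UTF-8; the helper name write_lines is made up for illustration and is not part of the original parser:

```python
def write_lines(path, lines):
    """Write each string in `lines` to `path` as UTF-8, one per line."""
    # encoding="utf-8" replaces the platform default codec
    # (on Windows this is often a single-byte code page).
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
```

With this, each open('name.txt', 'w') call in the question would become open('name.txt', 'w', encoding='utf-8'), and the rest of the loop could stay as it is.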

BeautifulSoup is usually used for parsing:
www.crummy.com/software/BeautifulSoup/bs4/doc
Files are usually opened like this:

with open("filename", "w") as f:
  f.write(something)

docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects
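
Putting the two suggestions together, the thirteen near-identical blocks in the question could be collapsed into one loop that opens each file with a with statement and an explicit encoding. This is only a sketch under assumptions: export_tags and its out_dir parameter are hypothetical names, UTF-8 is assumed to be the desired output encoding, and non-text child nodes are skipped (the original code assumes every child has .data).

```python
import os
import xml.dom.minidom


def export_tags(dom, tag_names, out_dir="."):
    """For each tag name, write the text of all matching elements
    to <out_dir>/<tag>.txt, one text node per line, encoded as UTF-8."""
    for tag in tag_names:
        path = os.path.join(out_dir, tag + ".txt")
        # The with statement closes the file even if a write fails.
        with open(path, "w", encoding="utf-8") as out:
            for element in dom.getElementsByTagName(tag):
                for node in element.childNodes:
                    # Only text and CDATA nodes carry a .data string.
                    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
                        out.write(node.data + "\n")
```

For the file in the question, the call would look like export_tags(xml.dom.minidom.parse('catalogue.xml'), ['product_id', 'code', 'name']), with the remaining tag names added to the list.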