Python
ikudza, 2014-05-13 18:58:19

Parsing with lxml and saving data with pandas

Inspired by an article on Habr, I'm trying to write a parser. The code is below:

import lxml.html as html
from pandas import DataFrame

main_domain = 'http://market.yandex.ru'
brand_list = html.parse('%s/brands-list.xml' % (main_domain))

e = brand_list.getroot().find_class('body')
for i in e:
    t = i.getchildren().pop()
    link_table = DataFrame({'EV':j[0].text , 'LINK':j[2]} for j in t.iterlinks())

link_table.to_csv('brands1.csv',';',index=False,encoding="UTF-8")

I get the following error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xd0 in position 4: unexpected end of data
What am I doing wrong?

2 answers
Arseny Kravchenko, 2014-05-22
@Arseny_Info

for i in e:
    t = i.getchildren().pop()
    link_table = DataFrame({'EV':j[0].text.encode('utf-8') , 'LINK':j[2]} for j in t.iterlinks())
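
For readers wondering what the error message itself means, here is a minimal sketch. It is not a diagnosis of the code in the question. The sample string "Бренд" is made up for illustration, and the sketch uses Python 3, where the codec is spelled 'utf-8' in the message rather than 'utf8' as in the Python 2 traceback above. The byte 0xd0 is the first byte of a two-byte UTF-8 sequence (many Cyrillic letters start with it), so "unexpected end of data" appears when a byte string ends right after that lead byte.

```python
# Illustrative only: "Бренд" is a made-up sample, not data from the page.
text = "Бренд"
raw = text.encode("utf-8")  # each Cyrillic letter becomes two bytes

# Decoding the complete byte string restores the original text.
assert raw.decode("utf-8") == text

# Cutting the bytes in the middle of a character leaves a lone lead byte
# (0xd0) at the end, which the decoder cannot complete.
truncated = raw[:5]
try:
    truncated.decode("utf-8")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
# 'utf-8' codec can't decode byte 0xd0 in position 4: unexpected end of data
```

If this is what happens in a given program, the usual remedy is to keep text as Unicode until the moment it is written out, and to encode or decode only complete strings.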

Freesty1er, 2014-05-24
@Freesty1er

What article inspired you?
