I read Cyrillic from a CSV file, write it to a file, and get xd0x94xd0xbexd0xb1xd0xb0xd0xb2xd0xb8xd1x82xd1x8. How do I read and write Cyrillic correctly?
Hello everyone!
Task:
The input is a CSV file; its header row supplies the keys.
Each subsequent line supplies the values.
The output is a list of dictionaries.
The list then has to be written to a file, and wherever the CSV contains Cyrillic, the output shows xd0x94xd0xbexd0xb1xd0xb0xd0xb2xd0xb8xd1x82xd1x8 instead.
Code (without Cyrillic everything works as it should):
# -*- coding: utf-8 -*-
import csv, codecs
import re

def readCSV(filename):
    csvfile = open(filename, 'r')  # open file for read
    spamreader = csv.reader(csvfile)
    dictFromCSV = dict()
    i = 0
    j = 0
    listFroDictWithValueFromCSV = list()
    for row in spamreader:
        if ';;;' in row[0]:  # skip rows whose first field contains ';;;'
            pass
        elif i == 0:  # first row: build the dictionary keys from the header
            keysForDict = str(row).split(';')
            keysForDict = str(keysForDict)
            keysForDict = re.sub(r'[^\w\s]+|[\d]+', r'', keysForDict).strip()
            keysForDict = str(keysForDict).split(" ")
            lenght = len(keysForDict)
            i = i + 1
            #print keysForDict
        else:  # every other row: split it into values and pair them with the keys
            row = str(row).split(";")
            row = str(row)
            row = re.sub(r'[^\w\s]+', r'', row).strip()
            #print row
            #print ("######################")
            row = str(row).split(" ")
            #print row
            for i in range(0,lenght):  #counts row in csv-file
                dictFromCSV[keysForDict[j]] = str(row[i])
                j = j + 1
                i = i + 1
            listFroDictWithValueFromCSV.append(dictFromCSV.copy())  # append a copy of the dict to the result list
            j = 0
    #print keysForDict
    return listFroDictWithValueFromCSV
Or you can do it quite elegantly (Py3):
import pandas as pd
fname = r'C:\folder\myfile.csv'
data = pd.read_csv(fname,sep='\t',encoding='cp1251')
print(data)
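As for why the question's code prints xd0x94...: it most likely comes from calling str(row) on a list. In Python 2 that gives the repr of each element, where non-ASCII bytes are written as \xd0\x94 and so on, and the re.sub that follows strips the backslashes. Below is a minimal standard-library sketch for Python 3 that avoids this by letting csv.DictReader do the splitting. The function names, the cp1251 encoding, the ';' delimiter, and the JSON output format are assumptions for illustration; the question does not specify them.

```python
import csv
import json


def read_csv_as_dicts(path, encoding="cp1251", delimiter=";"):
    """Return one dict per data row, keyed by the header row."""
    # newline="" is what the csv module expects for files it reads;
    # the explicit encoding tells Python how to decode bytes into text.
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.DictReader(f, delimiter=delimiter)
        # Skip rows in which every field is empty (e.g. a line of ";;;").
        return [dict(row) for row in reader if any(row.values())]


def write_rows(rows, path):
    """Write the list of dicts as UTF-8 JSON."""
    # ensure_ascii=False keeps Cyrillic as readable text
    # instead of \uXXXX escape sequences.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rows, f, ensure_ascii=False, indent=2)
```

If the file turns out to be UTF-8 rather than cp1251, passing encoding="utf-8-sig" should work for files both with and without a byte order mark.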
I read a CSV with pandas and get: Мега
df = pd.read_csv(path, delimiter=";", low_memory=False, encoding='CP866')
I have already tried all of these encodings:
# cp1251
# IBM866
# windows-1251
# utf-8
# ANSI
# ISO-8859-1
# cp866, DOS-720
# CP866
# CP437
# KOI8-U
# KOI8-R
# KOI-7
Nothing helps, any advice?
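Rather than changing the encoding argument by hand, it can help to look at the raw bytes under several candidate encodings at once. Here is a rough sketch, assuming Python 3. The names preview_decodings and preview_file and the candidate list are hypothetical, and the code only shows the candidates side by side; it does not pick the right one.

```python
def preview_decodings(raw, candidates=("utf-8-sig", "cp1251", "cp866", "koi8-r"), limit=60):
    """Decode the same bytes with each candidate encoding.

    Returns {encoding_name: decoded_preview} for every candidate that
    decodes without error.
    """
    previews = {}
    for name in candidates:
        try:
            previews[name] = raw.decode(name)[:limit]
        except (UnicodeDecodeError, LookupError):
            # UnicodeDecodeError: the bytes are not valid in this encoding.
            # LookupError: Python does not know a codec by this name.
            continue
    return previews


def preview_file(path, **kwargs):
    """Preview the first line of a file, read in binary mode."""
    with open(path, "rb") as f:
        raw = f.readline()
    return preview_decodings(raw, **kwargs)
```

Whichever preview shows readable Cyrillic is the encoding to pass to pd.read_csv. If none of them does, the file may have been saved with a damaged encoding in the first place, and no encoding argument will repair it on read.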