Python
Alexander Krasnov, 2015-05-26 15:07:11

I read Cyrillic from a CSV file, write it to a file, and get \xd0\x94\xd0\xbe\xd0\xb1\xd0\xb0\xd0\xb2\xd0\xb8\xd1\x82\xd1\x8 instead. How do I read and write Cyrillic correctly?

Hello everyone!
I read Cyrillic text from a CSV file, write it to a file, and get \xd0\x94\xd0\xbe\xd0\xb1\xd0\xb0\xd0\xb2\xd0\xb8\xd1\x82\xd1\x8 instead. How do I read and write Cyrillic correctly?
Task:
The input is a CSV file; its header row supplies the keys.
Each subsequent row supplies the values.
The output is a list of dictionaries.
The list must then be written to a file, and wherever the CSV contains Cyrillic, the output contains \xd0\x94\xd0\xbe\xd0\xb1\xd0\xb0\xd0\xb2\xd0\xb8\xd1\x82\xd1\x8 instead.
Code (without Cyrillic, everything works as it should):

# -*- coding: utf-8 -*-
import csv, codecs
import re
def readCSV(filename):
    csvfile = open(filename, 'r')  #open file for read
    spamreader = csv.reader(csvfile) 
    dictFromCSV = dict()
    i = 0
    j = 0
    listFroDictWithValueFromCSV = list()
    for row in spamreader: 
        if ';;;' in row[0]: #?????? ?? ?????? ????? - ??? ????????? ??????. ?????? ?????? ???????? ?????? ???????????. ?? 10 ?? 14 ????.
            pass
        elif i == 0: #?????????? ????? ??????? - ??? ????? ????? ?????? ??????? ??????????? ??????? ?????? ??? ??????
            keysForDict = str(row).split(';')
            keysForDict = str(keysForDict)
            keysForDict = re.sub(r'[^\w\s]+|[\d]+', r'', keysForDict).strip()
            keysForDict = str(keysForDict).split(" ")
            lenght = len(keysForDict)
            i = i + 1 
            #print keysForDict
        else: #????? ???????? ??????? ???????????? ???????? ? ?????????? ??????? ?????(????) ???????? ? ??????? ?? ???? ?????.
            row = str(row).split(";")  
            row = str(row)
            row = re.sub(r'[^\w\s]+', r'', row).strip()
            #print row
            #print ("######################")
            row = str(row).split(" ")
            #print row
            for i in range(0,lenght): #counts row in csv-file
                dictFromCSV[keysForDict[j]] = str(row[i])
                j = j + 1
            i = i + 1     
            listFroDictWithValueFromCSV.append(dictFromCSV.copy()) #?????????? ??? ??????? ? ???? ??????
            j = 0
    #print keysForDict
    return listFroDictWithValueFromCSV;

I would also welcome any criticism or advice on how to make the code more efficient and readable.
P.S. The list is written to a file in another function.
P.P.S. An English-language Windows installation garbled the code comments that were written in Russian.

2 answers
lPolar, 2015-05-26
@Pompeius_Magnus

Or you can do it quite elegantly (Py3):

import pandas as pd
fname = r'C:\folder\myfile.csv'
data = pd.read_csv(fname,sep='\t',encoding='cp1251')
print(data)
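
For readers who would rather stay within the standard library, the task from the question (header row as keys, one dictionary per data row) can be sketched in Python 3 with csv.DictReader. The \x escapes in the question most likely come from calling str() on a list in Python 2, which prints the repr of each byte string; decoding on read and encoding on write avoids that. The names read_csv_as_dicts and write_rows, the ';' delimiter, the cp1251 input encoding, and the sample data are all assumptions for illustration, so substitute whatever the real file uses.

```python
import csv
import json
import os
import tempfile


def read_csv_as_dicts(path, encoding="cp1251", delimiter=";"):
    """Return a list of dicts: header row gives the keys, each data row the values."""
    # newline="" is what the csv module documentation recommends for file objects.
    with open(path, "r", encoding=encoding, newline="") as f:
        return [dict(row) for row in csv.DictReader(f, delimiter=delimiter)]


def write_rows(rows, path):
    """Write the rows as UTF-8 JSON, keeping Cyrillic readable (no \\u escapes)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rows, f, ensure_ascii=False, indent=2)


# Hypothetical demo: build a small cp1251 file, read it back, write it out.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.csv")
    dst = os.path.join(tmp, "output.json")
    with open(src, "w", encoding="cp1251", newline="") as f:
        f.write("name;action\r\nфайл;Добавить\r\n")
    rows = read_csv_as_dicts(src)
    write_rows(rows, dst)
    with open(dst, encoding="utf-8") as f:
        written = f.read()

print(rows)
```

The key design choice is to state the encoding explicitly at both ends rather than rely on the platform default, which differs between Windows and Linux.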

Romarioagors, 2019-10-15
@Romarioagors

I read a CSV with pandas and get this: Мега
df = pd.read_csv(path, delimiter=";", low_memory=False, encoding='CP866')
I have already tried all of these encodings:
# cp1251
# 'IBM866'
# windows - 1251
# utf -8'
# encoding='ANSI'
# 'ISO-8859-1'
# cp866 ,DOS-720
# CP866
# CP437
# KOI8-U
# KOI8-R
# KOI-7
Nothing helps, any advice?
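
One way to narrow this down is to stop guessing through pandas and look at the raw bytes instead: decode a short sample with each candidate codec and see which result is readable. The sketch below does that. The helper name try_encodings is hypothetical, and the sample bytes are taken from the escape sequence in the original question (without its truncated tail) purely as a stand-in for the first bytes of the real file. If no codec gives readable text, the file may have been mis-decoded and re-saved earlier, in which case no encoding argument alone will repair it.

```python
def try_encodings(raw, candidates):
    """Decode raw bytes with each candidate codec; None marks a failed decode."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except (UnicodeDecodeError, LookupError):
            # UnicodeDecodeError: the bytes are invalid for this codec.
            # LookupError: Python does not know a codec by this name.
            results[name] = None
    return results


# Stand-in sample; for a real file, read the first bytes with open(path, "rb").
sample = b"\xd0\x94\xd0\xbe\xd0\xb1\xd0\xb0\xd0\xb2\xd0\xb8\xd1\x82"
results = try_encodings(sample, ["utf-8", "cp1251", "cp866", "koi8-r"])

for name, text in results.items():
    print(name, repr(text))
```

For this sample only the utf-8 line comes out as readable Russian; the single-byte codecs decode without error but produce nonsense, which is why a successful read in pandas does not prove the encoding was right.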
