Python
Kai, 2022-02-02 17:45:17

How to ignore non-UTF-8 characters when reading a CSV file?

Good afternoon!

I have a CSV file with a column of text that has been updated every day for several months now. I usually read it like this:

import pandas as pd

with open('groups.csv', 'r', encoding='utf-8') as f:
    df = pd.read_csv(f, sep=';', index_col=False)


The file is overwritten daily, and up to this point everything was fine. Neither the file nor the script was touched, but today pd.read_csv started failing with an error.

It complains about bytes it cannot decode (rows containing such characters have been in the file for a long time):
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd1 in position 161660: invalid continuation byte
In Excel they look like this:
[screenshot: 61fa9563df93b867861616.png]

In Notepad they show up as "\xD0", "\xD1".

Does anyone know how such characters can be cleaned out of the strings (not the rows containing them, just the characters themselves), or how to correctly read the CSV in this case? The file itself is UTF-8 encoded, and I read it as UTF-8.
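One way to do exactly that is Python's built-in decode error handlers; a minimal sketch on simulated data (the column names and the stray 0xD1 byte are made up for illustration):

```python
import io

import pandas as pd

# Simulated file content: valid UTF-8 text plus a stray 0xD1 byte,
# like the one in the traceback above.
raw = 'name;text\n1;абв'.encode('utf-8') + b'\xd1' + '\n'.encode('utf-8')

# errors='ignore' silently drops bytes that are not valid UTF-8;
# errors='replace' would insert U+FFFD markers instead.
text = raw.decode('utf-8', errors='ignore')
df = pd.read_csv(io.StringIO(text), sep=';', index_col=False)
print(df['text'].tolist())  # → ['абв']
```

Newer pandas (1.3+) also accepts `encoding_errors='ignore'` directly in `pd.read_csv`, which avoids the manual decode step.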

Please don't just point me to an article; I've already gone through what's out there.


2 answer(s)
tigervvin, 2022-02-03
@Tr3ShKirill

Read the documentation for str.replace: this method replaces characters however you need. For example, take the string '98 356' — we cannot convert it to int, since there is a space in the string:

a = '98 356'
a = a.replace(' ', '')
print(int(a))
# Output: 98356

That is, we pass this method the character we want to replace and the character we are replacing it with:
replace('the character we want to replace', 'the character we are changing to')
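In pandas the same idea can be applied to a whole text column at once via the .str accessor; a quick sketch with a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({'amount': ['98 356', '1 024']})
# Series.str.replace with regex=False performs a plain substring
# replacement on every value in the column.
df['amount'] = df['amount'].str.replace(' ', '', regex=False).astype(int)
print(df['amount'].tolist())  # → [98356, 1024]
```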

Kai, 2022-02-02
@Tr3ShKirill

So far I've just cleaned out such characters through BI, and that solved it. But they may appear again in the future, and I'd still like to avoid them.
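If cleaning in code is preferable to BI, one sketch (assuming the unwanted characters are exactly the bytes that fail UTF-8 decoding) is a decode round-trip with errors='ignore':

```python
def strip_undecodable(raw: bytes) -> str:
    # Dropping only the undecodable bytes keeps the rest of the
    # string intact, unlike discarding the whole row.
    return raw.decode('utf-8', errors='ignore')

print(strip_undecodable(b'ab\xd1cd'))  # → abcd
```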
