linux
DVoropaev, 2020-02-24 19:57:05

How to get past a UnicodeDecodeError when reading a file in Python?

I have a 1.2 GB text file and a Python script that reads it line by line.

import sys

logFile = open(sys.argv[1], 'r')
count = 0
for log in logFile:
  print(count)  # number of the line being processed
  count += 1
  ...

But when reading the file on line 36934, the following error occurs:
  File "./parcer.py", line 75, in <module>
    for log in logFile:
  File "/usr/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc8 in position 867: invalid continuation byte

How can I fix this?
The file is so large that every text editor I try to open it with freezes.
I'm on Linux.


2 answer(s)
Dimonchik, 2020-02-24
@DVoropaev

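import sys

# Read raw bytes so that iterating over the file never decodes anything itself.
with open(sys.argv[1], 'rb') as f:
    for n, L in enumerate(f):
        try:
            # Strict decoding, so undecodable lines reach the except branch
            # (with 'ignore' the bad bytes would be dropped silently).
            print(n, L.decode('utf8'))
        except UnicodeDecodeError as e:
            print(n, 'decode error', e)
            # Save the raw line for later inspection.
            with open('holy_shit.csv', 'ab') as w:
                w.write(L)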

and this is the editor, one of the 2-3 that CAN
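
If the goal is to keep processing the text rather than set bad lines aside, a variation on the snippet above is to decode each line yourself and fall back to a second encoding when UTF-8 fails. This is only a sketch: the helper name read_lines is made up for illustration, and cp1251 as the fallback is a guess (byte 0xc8 is a Cyrillic capital letter in that code page), not something the traceback proves.

```python
def read_lines(path, encoding="utf-8", fallback="cp1251"):
    """Yield (line_number, text) pairs, decoding every line on its own.

    `fallback` is an assumed legacy encoding; change it to whatever
    actually produced the non-UTF-8 lines in your file.
    """
    with open(path, "rb") as f:
        for n, raw in enumerate(f):
            try:
                text = raw.decode(encoding)
            except UnicodeDecodeError:
                # errors="replace" keeps the loop alive even if the guess
                # is wrong: undecodable bytes become U+FFFD.
                text = raw.decode(fallback, errors="replace")
            yield n, text.rstrip("\r\n")
```

It could be used as `for n, line in read_lines(sys.argv[1]): ...`. If losing a few characters is acceptable, the original loop also works unchanged with `open(sys.argv[1], 'r', encoding='utf-8', errors='replace')`.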

Sergey Karbivnichy, 2020-02-24
@hottabxp

Midnight Commander will help you; it opens the unopenable.
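
If no such viewer is at hand, the offending bytes can also be looked at from Python without loading the whole file. A minimal sketch, assuming the file is mostly UTF-8; the helper name first_bad_line and the context parameter are invented for this example, and line numbers are 0-based as with enumerate.

```python
def first_bad_line(path, encoding="utf-8", context=20):
    """Return (line_number, offset_in_line, nearby_bytes) for the first
    line that does not decode, or None if every line decodes cleanly."""
    with open(path, "rb") as f:
        for n, raw in enumerate(f):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError as e:
                # e.start is the index of the first undecodable byte
                # within this line.
                lo = max(e.start - context, 0)
                return n, e.start, raw[lo:e.start + context]
    return None
```

Printing the returned bytes usually makes it clear whether the line is in a legacy encoding or is simply binary garbage.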
