Rate a piece of code
Forgive me for asking, but I have not yet found any reasonably active Russian-language resources on Python. I have just started learning it, and I came across virtustilus's article "We bring Russian texts on Mac OS X into one encoding with a Python script". The code there seemed overcomplicated to me, so I tried to rewrite it my own way, and I would like to hear what mistakes I made. The code is short and simple:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import os
import sys
import chardet

def converter(path, file):
    if file.endswith(".txt"):
        P = os.path.join(path, file)
        with open(P, "r") as F:
            text = F.read()
        enc = chardet.detect(text).get("encoding")
        if enc and enc.lower() != u"utf-8":
            try:
                text = text.decode(enc)
                text = text.encode("utf-8")
                with open(P, "w") as f:
                    f.write(text)
                print P+u" сконвертирован."  # "<path> converted."
            except:
                # "Error in file name: the name contains Russian characters or spaces."
                print "Ошибка в имени файла: название содержит русские символы или пробелы."
            print u"-------------------------------"

path = raw_input(u"Input path or file:")
if os.path.isdir(path) == True:
    for (path, dirs, files) in os.walk(path):
        for file in files:
            converter(path, file)
elif os.path.isfile(path) == True:
    converter(os.path.dirname(path), os.path.basename(path))
sys.exit(0)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
0. Specify the Python version.
1. Find out which exception is actually raised by using except Exception as exception and dig in that direction. On my machine the script simply fails to find the file typed at the prompt, skips both os.path.isdir and os.path.isfile, and exits (you have no check for a mistyped file name).
2. Strangely, it works for me on Windows, so I can't say more.
3. To handle several extensions you need to check each one, for example when the extensions are listed separated by commas:
any([file_path.endswith(extension) for extension in '.rar,.txt'.split(',')])
4. Use u"%s сконвертирован." % file_path instead of file_path + u" сконвертирован."
5. Use if __name__ == '__main__':
6. Instead of print u"-------------------------------", you can write print u"-" * 20
7. Add 'b' to the open calls: open(file_path, 'rb') and open(file_path, 'wb')
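To make the points above concrete, here is a minimal sketch of how they might fit together. It is written for Python 3, not the Python 2 of the question, and it is not the asker's or the article's code. The names EXTENSIONS, wanted, convert_file and main are hypothetical, as is the cp1251 default; the chardet detection step is left out so the sketch has no third-party dependency, and the source encoding is passed in as a parameter.

```python
import os

# Hypothetical list of extensions to convert. str.endswith accepts a tuple,
# so all of them are checked in a single call.
EXTENSIONS = (".txt", ".srt")


def wanted(file_name, extensions=EXTENSIONS):
    """Return True if file_name ends with one of the extensions."""
    return file_name.lower().endswith(extensions)


def convert_file(file_path, source_encoding):
    """Re-encode one file to UTF-8 in place; return True on success."""
    # 'rb' and 'wb' keep the data as raw bytes, with no newline translation.
    with open(file_path, "rb") as source:
        raw = source.read()
    try:
        text = raw.decode(source_encoding)
    except (UnicodeError, LookupError) as exception:
        # Show the real cause instead of a guessed message.
        print("%s: %r" % (file_path, exception))
        return False
    with open(file_path, "wb") as target:
        target.write(text.encode("utf-8"))
    print("%s converted." % file_path)
    return True


def main(path, source_encoding="cp1251"):
    """Convert a single file or every wanted file under a directory."""
    if os.path.isfile(path):
        return [convert_file(path, source_encoding)]
    if not os.path.isdir(path):
        # Report a mistyped path instead of exiting silently.
        print("%s is neither a file nor a directory." % path)
        return []
    results = []
    for folder, _dirs, files in os.walk(path):
        for file_name in files:
            if wanted(file_name):
                full_path = os.path.join(folder, file_name)
                results.append(convert_file(full_path, source_encoding))
    return results
```

In a real script the call to main would sit under if __name__ == '__main__': so that importing the module does not start a conversion.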
Comments on tbicr's answer:
>>0. Specify the python version.
If this is about the Python 2 / Python 3 split, how could this code run on Python 3 at all? There, among other things, strings need no u prefix (Python 3.0 to 3.2 reject it outright), because every string is Unicode by default.
>>1. Find out what kind of exception is thrown except Exception as exception
The exceptions raised during string conversion share a common parent in this case: UnicodeError.
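A small sketch of what that means in practice, in Python 3 syntax. The helper name to_utf8 is hypothetical; the point is only that a single except UnicodeError clause covers both decoding and encoding failures.

```python
# Both conversion errors derive from UnicodeError.
assert issubclass(UnicodeDecodeError, UnicodeError)
assert issubclass(UnicodeEncodeError, UnicodeError)


def to_utf8(raw, source_encoding):
    """Return raw re-encoded as UTF-8, or None if the conversion fails."""
    try:
        return raw.decode(source_encoding).encode("utf-8")
    except UnicodeError as error:
        # Catches UnicodeDecodeError and UnicodeEncodeError alike.
        print("conversion failed: %s" % error)
        return None
```

Catching UnicodeError rather than using a bare except also lets unrelated problems, such as a missing file, surface with their own message.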
Comments for the topic starter:
1) You write u"utf-8" with the u prefix, but "Ошибка в имени файла..." ("Error in file name...") without it. It should be the other way round :)
2) The less nesting, the better (within reason). Instead of
if file.endswith(".txt"):
    P = os.path.join(path, file)
    ...
write
if not file.endswith(".txt"):
    continue
P = os.path.join(path, file)
...
Flat is better than nested.
for (path, dirs, files) in os.walk(path):
    for file in files:
        converter(path, file)
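Applied to this loop, the guard-clause advice might look like the sketch below (Python 3 syntax). The generator name iter_text_files is hypothetical. Note that continue only works inside a loop; if the check stays inside converter, the guard would have to end with return instead.

```python
import os


def iter_text_files(root, extension=".txt"):
    """Yield the full path of every file under root with the given extension."""
    for folder, _dirs, files in os.walk(root):
        for file_name in files:
            if not file_name.endswith(extension):
                continue  # guard clause: skip early instead of nesting
            yield os.path.join(folder, file_name)
```

The caller can then write for file_path in iter_text_files(path): and keep the conversion itself at a single level of indentation.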