Evgeny Elizarov, 2012-10-01 12:25:13
Python

Rate a piece of code

Forgive me for making such a request, but I have not yet found any reasonably active Russian-language resources on Python. I have just started learning it, and I came across the article by virtustilus, "We bring Russian texts on Mac OS X into one encoding with a Python script". The code there seemed too convoluted to me, so I tried to rewrite it my own way. I would like to hear about the mistakes I made. The code is simple and small:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import os
import sys
import chardet

def converter(path, file):
  if file.endswith(".txt"):
    P = os.path.join(path, file)
    with open(P, "r") as F:
      text = F.read()
      enc = chardet.detect(text).get("encoding")
      if enc and enc.lower() != u"utf-8":
        try:
          text = text.decode(enc)
          text = text.encode("utf-8")
          with open(P, "w") as f:
            f.write(text)
            print P+u" сконвертирован."
        except:
          print "Ошибка в имени файла: название содержит русские символы или пробелы."
        print u"-------------------------------"
        
path = raw_input(u"Input path or file:")
if os.path.isdir(path) == True:
  for (path, dirs, files) in os.walk(path):
    for file in files:
      converter(path, file)
elif os.path.isfile(path) == True:
  converter(os.path.dirname(path), os.path.basename(path))
        
sys.exit(0)

Along the way, I have a few questions:
1. What should I do if the file name contains Russian letters or spaces? For now such files just end up in the exception handler and nothing is done with them, but maybe there is a way to handle them? Google didn't help much with this question.
2. How do I correctly put Russian characters into the prompt string passed to raw_input? For now it complains:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)

3. How can I specify multiple file extensions with .endswith()?

Thanks in advance for your help. If the questions are really childish, please forgive me: I'm still not very familiar with the Python documentation, and I've only known the language for a couple of weeks. :-[


2 answers
Pavel Tyslyatsky, 2012-10-01
@tbicr

0. Specify the python version.
1. Find out which exception is being raised by using except Exception as exception, and go from there. In my case, though, the file entered at the prompt simply isn't found, so the script skips both the os.path.isdir and os.path.isfile branches and exits (you have no check for an incorrectly entered file name).
2. Strange, it works for me on Windows, so I can't help here.
3. You need to go through all the extensions and check each one. For example, if the extensions are listed separated by commas:

any([file_path.endswith(extension) for extension in '.rar,.txt'.split(',')])

4. Read PEP 8, give variables meaningful names, and use u"%s сконвертирован." % file_path instead of file_path + u" сконвертирован.".
5. Use if __name__ == '__main__':
6. It’s better to pass file_path to the converter function right away.
7. Perhaps you don’t need raw_input at all and can pass parameters as command-line arguments instead (sys.argv or argparse will help here).
8. Instead of print u"-------------------------------", you can write print u"-" * 20.
9. It’s better to add the 'b' flag to the mode passed to open: open(file_path, 'rb') and open(file_path, 'wb').
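
Several of the points above can be put together in a short sketch. It is written for Python 3 (an assumption; the thread's code is Python 2), and the names EXTENSIONS, has_wanted_extension, convert_to_utf8 and main are hypothetical. To keep the example self-contained, the source encoding is passed in explicitly, where the original script would ask chardet.detect for it. For point 3, note that str.endswith also accepts a tuple of suffixes, so no explicit loop is needed.

```python
# Illustrative list of extensions; str.endswith accepts a tuple of suffixes.
EXTENSIONS = (".txt", ".srt")


def has_wanted_extension(file_path):
    """Return True if file_path ends with one of EXTENSIONS."""
    return file_path.lower().endswith(EXTENSIONS)


def convert_to_utf8(file_path, source_encoding):
    """Re-encode one file to UTF-8 in place.

    source_encoding is supplied by the caller (an assumption of this
    sketch); the thread's script obtains it from chardet.detect instead.
    """
    # Binary mode ('rb'/'wb') keeps the bytes untouched on every platform.
    with open(file_path, "rb") as source:
        raw = source.read()
    text = raw.decode(source_encoding)
    with open(file_path, "wb") as target:
        target.write(text.encode("utf-8"))


def main(args):
    """Convert files given as: SOURCE_ENCODING FILE [FILE ...]."""
    if len(args) < 2:
        print("usage: SOURCE_ENCODING FILE [FILE ...]")
        return 1
    source_encoding = args[0]
    for file_path in args[1:]:
        # Skip early instead of nesting the real work inside an if.
        if not has_wanted_extension(file_path):
            continue
        convert_to_utf8(file_path, source_encoding)
        print("%s converted." % file_path)
        print("-" * 20)
    return 0
```

From the command line this could be wired up as sys.exit(main(sys.argv[1:])) under the if __name__ == '__main__': guard from point 5.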

Alexey Akulovich, 2012-10-01
@AterCattus

Comments on tbicr's answer:
>>0. Specify the python version.
If this is about the Python 2 / Python 3 split, how could this code even run on Python 3? There (at least up to Python 3.2) you cannot even put a u-prefix on strings, because all strings are Unicode by default.
>>1. Find out which exception is raised with except Exception as exception
The exceptions raised during string conversion share a common parent, in this case UnicodeError.
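
A minimal sketch of that point, written for Python 3 (an assumption; the thread uses Python 2, where the same class hierarchy applies). The helper name try_decode is hypothetical. It catches UnicodeError instead of using a bare except, so unrelated failures such as a missing file are not mislabeled as encoding problems.

```python
def try_decode(raw, encoding):
    """Decode raw bytes, returning (text, error) instead of raising.

    Only UnicodeError is caught: it is the common parent of
    UnicodeDecodeError and UnicodeEncodeError.
    """
    try:
        return raw.decode(encoding), None
    except UnicodeError as exception:
        # Keep the exception so the caller can report what went wrong.
        return None, exception


# Both conversion errors derive from UnicodeError.
assert issubclass(UnicodeDecodeError, UnicodeError)
assert issubclass(UnicodeEncodeError, UnicodeError)

# 0xff can never start a valid UTF-8 sequence, so decoding fails.
text, error = try_decode(b"\xff\xfe\xfa", "utf-8")
print(type(error).__name__)  # prints UnicodeDecodeError
```

One plausible source of the error in the question's script (an assumption, not confirmed in the thread) is the line print P+u" сконвертирован.": in Python 2, P is a byte string, and concatenating it with a unicode string makes Python decode it as ASCII, which fails for non-ASCII file names.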
Comments for the topic starter:
1) You write u"utf-8" with the u-prefix, but "Ошибка в имени файла..." without it. It would be better the other way around :)
2) The less nesting, the better (within reasonable limits). Instead of

    if file.endswith(".txt"):
        P = os.path.join(path, file)
        ...

it's better to write:
    if not file.endswith(".txt"):
        return  # return, not continue: this check sits inside converter(), not in a loop
    P = os.path.join(path, file)
    ...

Flat is better than nested.

3) if os.path.isdir(path) == True:
Don't do that. Just write:
if os.path.isdir(path):
4) Instead of
    for (path, dirs, files) in os.walk(path):
        for file in files:
            converter(path, file)

you can use glob.
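
A minimal sketch comparing the two approaches, written for Python 3 (an assumption; the thread uses Python 2). The helper names find_with_walk and find_with_glob are hypothetical. The recursive "**" pattern needs Python 3.5 or later; a plain pattern such as *.txt matches only one directory level, whereas os.walk always descends into subdirectories.

```python
import glob
import os


def find_with_walk(root, extension=".txt"):
    """Yield matching files under root, descending into subdirectories."""
    for dir_path, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(extension):
                yield os.path.join(dir_path, name)


def find_with_glob(root, extension=".txt"):
    """Return matching files under root using a recursive glob pattern."""
    pattern = os.path.join(root, "**", "*" + extension)
    # recursive=True makes "**" match zero or more nested directories.
    return glob.glob(pattern, recursive=True)
```

The two are not fully equivalent: glob patterns skip names that start with a dot, and a root containing characters such as [ or * would need glob.escape first.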
