Why does the find/xargs combination only process one file?
Good afternoon!
After reading fb2 books by one translator for a while, I collected a set of regular expressions that fix recurring problems in the text. Applying them one at a time, and then again for every remaining file, takes too long, so I wrote a simple Python script (right after learning the basics, in fact).
Here is what I came up with:
- csetfix.sh
enconv -x UTF-8 "$1"
python bfix.py "$1"
- bfix.py
# -*- coding: utf-8 -*-
import re
import sys
if len(sys.argv) < 2:
    sys.exit(1)
reg = [
    {'find':r'windows-1251', 'replace':r'utf-8'},
    {'find':r'-нить', 'replace':r'-нибудь'},
    {'find':r'кол-в', 'replace':r'количеств'},
    {'find':r'-же', 'replace':r' же'},
    {'find':r'так же', 'replace':r'также'},
    {'find':r'Однако,', 'replace':r'Однако'},
    {'find':r'Высочество', 'replace':r'Величество'},
    {'find':r'Глава №?(\d+)\.? ', 'replace':r'Глава \1: '},
    {'find':r'(?<=<body>)\s+<title>[\s\S]+?</title>', 'replace':r''},
    {'find':r'– -{30,43},', 'replace':r' ================================ '},
    {'find':r'<p>\s+', 'replace':r'<p>'},
    {'find':r'\s+</p>', 'replace':r'</p>'},
    {'find':r'<emphasis>\s+', 'replace':r'<emphasis>'},
    {'find':r'\s+</emphasis>', 'replace':r'</emphasis>'},
    {'find':r'</emphasis>\s*<emphasis>', 'replace':r' '},
    {'find':r'<strong>\s+', 'replace':r'<strong>'},
    {'find':r'\s+</strong>', 'replace':r'</strong>'},
    {'find':r'</strong>\s*<strong>', 'replace':r' '},
    {'find':r'<strong></strong>', 'replace':r''},
    {'find':r'<emphasis></emphasis>', 'replace':r''},
    {'find':r'<p></p>', 'replace':r''},
    {'find':r'( –)(?= (?:(?:потому )?что(?:бы)?|если|то|а|да|и|или|однако|но) )', 'replace':r','},
    {'find':r'\.</strong>', 'replace':r'</strong>'},
    {'find':r':</strong>([^\s<])', 'replace':r':</strong> \1'},
    {'find':r'(= </p>\s+<p><strong>)– ', 'replace':r'\1'},
    {'find':r'\s+(\.|,|!|\?)', 'replace':r'\1'},
    {'find':r'\n+', 'replace':r'\n'}
]
# Read the whole file into one string
with open(sys.argv[1], "r") as file:
    result = file.read()
# Apply every rule in order
for rule in reg:
    result = re.sub(rule['find'], rule['replace'], result)
# Write the result back, one line at a time
result = re.split("\n+", result)
with open(sys.argv[1], "w") as file2:
    for line in result:
        file2.write(line+'\n')
I run it like this:
find -type f -name "*.fb2" | sort | xargs python bfix.py
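A likely explanation, offered as a sketch rather than a definitive answer: by default xargs appends as many of the file names it reads as fit onto a single command line, so the script is started once as `python bfix.py a.fb2 b.fb2 ...`, while the script only ever reads `sys.argv[1]`. One way around it is to loop over every argument. In the example below, `RULES` is a hypothetical two-rule stand-in for the full `reg` list, the names `fix_text`, `fix_file` and `main` are illustrative, and the files are assumed to already be UTF-8.

```python
import re
import sys

# Hypothetical two-rule subset standing in for the full `reg` list.
RULES = [
    {'find': r'<p>\s+', 'replace': r'<p>'},
    {'find': r'\s+</p>', 'replace': r'</p>'},
]


def fix_text(text, rules=RULES):
    """Apply every find/replace rule to the text, in order."""
    for rule in rules:
        text = re.sub(rule['find'], rule['replace'], text)
    return text


def fix_file(path, rules=RULES):
    """Rewrite one file in place (assumption: the file is already UTF-8)."""
    with open(path, 'r', encoding='utf-8') as src:
        text = src.read()
    with open(path, 'w', encoding='utf-8') as dst:
        dst.write(fix_text(text, rules))


def main(paths):
    # xargs puts many file names on one command line,
    # so process every path instead of only the first one.
    for path in paths:
        fix_file(path)


if __name__ == '__main__':
    main(sys.argv[1:])
```

With a loop like this the `find ... | xargs python bfix.py` command can stay as it is. Alternatively, `xargs -n 1 python bfix.py` starts the unmodified script once per file. In both cases plain xargs splits file names that contain spaces; with GNU tools, `find ... -print0 | sort -z | xargs -0 ...` avoids that.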