How to use the re.split method?
Let's say I want to extract the most frequent words from a text. I split the text on a set of characters using re.split:
    words = re.split("[ \n.,?!:;']", corpus)
I also want to add the " character to that set, but if I add it, the quotes of the string literal close prematurely. How do I add it?
Another small question: where should I look to make the split happen on every character except uppercase and lowercase letters? I remember the syntax being something like re.split("[^[az][AZ]"). What is this construction called?
First, you can escape the double quote character with a backslash, just as the newline is written as \n: "[ \n.,?!:;'\"]". Second, you can do it more simply and faster:
from collections import Counter
import string

# Map every ASCII punctuation character to None so str.translate deletes it
punctuation_map = {ord(char): None for char in string.punctuation}
# Short Russian function words to exclude from the count
prepositions = ['и', 'в', 'без', 'до', 'из', 'к', 'на', 'по', 'о', 'от', 'перед', 'при', 'через', 'с', 'у', 'за', 'над', 'об', 'под', 'про', 'для']

with open('WarAndPeace.txt', encoding='utf-8') as fh:
    text = fh.read()

clean_data = text.translate(punctuation_map)
# Lowercase before filtering so capitalized prepositions are excluded too
words = Counter(word for word in clean_data.lower().split() if word not in prepositions)
print(words.most_common(1))
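If you would rather stay with re.split, here is a minimal sketch of both ideas from the question: the escaped double quote inside the character class, and a split on everything that is not a letter. The construction the question half-remembers is most likely [^a-zA-Z], a negated character class (the ^ right after the opening [ inverts the set). The sample corpus string and the names words_a, words_b and counts are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical sample text, standing in for the real corpus
corpus = 'He said: "Hello, world!" Then she said: "hello again."'

# Option 1: list the separators explicitly; the double quote is escaped as \"
words_a = re.split("[ \n.,?!:;'\"]", corpus)

# Option 2: negated character class, split on any run of non-letter characters
words_b = re.split(r"[^a-zA-Z]+", corpus)

# re.split leaves empty strings where separators are adjacent or at the ends
words_a = [w for w in words_a if w]
words_b = [w for w in words_b if w]

# Count case-insensitively
counts = Counter(w.lower() for w in words_b)
print(counts.most_common(2))
```

Note that [^a-zA-Z] treats only ASCII letters as word characters. For Russian text the class would presumably need the Cyrillic ranges as well, for example [^a-zA-Zа-яА-ЯёЁ]+.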