How to find the most frequent words in a book?
I have a book in English and need to find out which words occur in it most frequently.
The result should look something like this:
1. "The" - 5943 times
2. "Is" - 4311 times, etc.
1) Copy the text into Word
2) Replace all spaces with line breaks, remove commas, periods and other punctuation, and convert all words to lowercase
3) Copy the resulting column into Excel
4) Count how often each value occurs in the column using Excel (a Yandex search turns up plenty of examples)
5) Copy the words and their frequencies to another sheet with Paste Special ("Values"), then sort by frequency in descending order
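Steps 2 and 3 can also be scripted instead of done by hand in Word. Below is a minimal sketch, assuming the book is a plain-text file and that a "word" is a run of ASCII letters with optional inner apostrophes (that definition is my assumption, not part of the answer). The helper name `words_of` and the script name `words.pl` are hypothetical.

```perl
#!/usr/bin/perl
# Sketch of steps 2 and 3: turn running text into one lowercase word per
# line, ready to paste into an Excel column.
# Assumption (not from the original answer): a word is a run of ASCII
# letters with optional inner apostrophes ("isn't" stays one word);
# digits and punctuation are dropped, and a curly apostrophe splits a word.
use strict;
use warnings;

# Return the lowercase words of $text, with punctuation removed.
sub words_of {
    my ($text) = @_;
    return lc($text) =~ /([a-z]+(?:'[a-z]+)*)/g;
}

# Main part: runs only when file names are given,
# e.g.  perl words.pl book.txt > column.txt
if (@ARGV) {
    while (my $line = <>) {
        print "$_\n" for words_of($line);
    }
}
```

The resulting file has one word per line, so pasting it into Excel gives the single column that step 4 expects.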
Write a program that builds a dictionary: if a word is already in it, add 1 to its count; if not, add it to the dictionary with a count of 1. You can probably write it, debugging included, in about an hour, and the program will parse any book in a few seconds.
A simple version looks something like this:
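#!/usr/bin/perl
use strict;
use warnings;

my %word;
while (<>) {
    # split on runs of non-word characters; grep drops the empty leading
    # field that split returns when a line starts with a space or quote
    $word{lc $_}++ foreach grep { length } split /\W+/;
}
# print words from most to least frequent
print "$_ : $word{$_}\n"
    foreach sort { $word{$b} <=> $word{$a} } keys %word;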
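The question asks for a ranked list such as `1. "The" - 5943 times`, while the script above prints every word as `word : count`. Here is a minimal sketch of a variant that prints only the top N in the requested format and spells out the "+1 if present, otherwise add" logic from the answer. The sub names `count_words` and `top_words`, the default N of 20, the alphabetical tie-break and the word definition (ASCII letters with optional inner apostrophes) are all my own assumptions.

```perl
#!/usr/bin/perl
# Sketch: print the N most frequent words in the format from the question,
# e.g.  1. "the" - 5943 times
# Assumptions (not from the original answer): a word is a run of ASCII
# letters with optional inner apostrophes, N is hard-coded to 20, and words
# with equal counts are listed alphabetically.
use strict;
use warnings;

# Add the words of one chunk of text to the %$counts dictionary the way the
# answer describes: +1 if the word is already there, otherwise add it with 1.
sub count_words {
    my ($counts, $text) = @_;
    for my $w (lc($text) =~ /([a-z]+(?:'[a-z]+)*)/g) {
        if (exists $counts->{$w}) {
            $counts->{$w}++;
        }
        else {
            $counts->{$w} = 1;
        }
    }
    return $counts;
}

# Return 'rank. "word" - count times' lines for the $n most frequent words.
sub top_words {
    my ($counts, $n) = @_;
    # highest count first; alphabetical order breaks ties
    my @sorted = sort { $counts->{$b} <=> $counts->{$a} || $a cmp $b }
                 keys %$counts;
    splice @sorted, $n if @sorted > $n;    # keep only the first $n words
    my @lines;
    my $rank = 0;
    for my $w (@sorted) {
        $rank++;
        push @lines, sprintf(qq{%d. "%s" - %d times}, $rank, $w, $counts->{$w});
    }
    return @lines;
}

# Main part: runs only when file names are given, e.g.  perl top.pl book.txt
if (@ARGV) {
    my $n = 20;    # hypothetical default; adjust as needed
    my %counts;
    while (my $line = <>) {
        count_words(\%counts, $line);
    }
    print "$_\n" for top_words(\%counts, $n);
}
```

Because Perl treats a missing hash entry as 0 when incrementing, the `if`/`else` could be collapsed to `$counts->{$w}++`, which is what the short script does; the explicit form only mirrors the description in the answer.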