Books
TimLee, 2014-03-15 05:27:32

How to find the most frequent words in a book?

I have a book in English and need to find out which words occur in it most often.
The result I want looks something like this:
1. "The" - 5943 times
2. "Is" - 4311 times, etc.

4 answer(s)
Andrew, 2014-03-15
@TimLee

1) Paste the text into Word.
2) Replace all spaces with line breaks, delete all commas, periods and other punctuation marks, and convert all words to lowercase.
3) Copy the resulting column into Excel.
4) Count how often each value occurs in the column using Excel; there are plenty of examples on Yandex.
5) Copy the words and their frequencies to another sheet using Paste Special ("Values"), then sort by frequency in descending order.
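
For anyone who would rather script these steps than click through Word and Excel, here is a rough Python sketch of the same pipeline: normalize the text, count, sort in descending order. The function names and the rule for what counts as a word (ASCII letters, optionally with inner apostrophes) are my own assumptions, not part of the steps above.

```python
import re
from collections import Counter


def normalize(text):
    """Lowercase the text and return its words with punctuation dropped."""
    # Assumption: a word is a run of ASCII letters, optionally with inner
    # apostrophes ("that's"). Digits and non-English letters are ignored.
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def frequency_table(text):
    """Return (word, count) pairs sorted by descending count."""
    return Counter(normalize(text)).most_common()


sample = "The cat and the dog. The end, and that's all!"
for rank, (word, count) in enumerate(frequency_table(sample), start=1):
    print(f'{rank}. "{word}" - {count} times')
```

Here `normalize` plays the role of step 2 and `Counter.most_common` covers steps 4 and 5; words with equal counts stay in the order they first appear.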

TimLee, 2014-03-16
@TimLee

In the end, second place went to "and".

Puma Thailand, 2014-03-15
@opium

Write a program that keeps a dictionary: if a word is already in it, add 1 to its count; if not, add the word to the dictionary. You can probably write it in an hour, debugging included, and the program will then parse any book in a few seconds.
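
A minimal sketch of that idea in Python, in case it helps. The punctuation trimming is deliberately crude, and the names `count_words` and `top_words` are made up for illustration.

```python
def count_words(lines):
    """Count words with a plain dictionary, one line of text at a time."""
    counts = {}
    for line in lines:
        for raw in line.lower().split():
            # Crude cleanup (an assumption): trim common punctuation
            # from both ends of each whitespace-separated token.
            word = raw.strip(".,;:!?\"'()[]-")
            if not word:
                continue
            if word in counts:
                counts[word] += 1  # already in the dictionary: add 1
            else:
                counts[word] = 1   # first occurrence: add it
    return counts


def top_words(counts, n=10):
    """Return the n most frequent (word, count) pairs."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]


lines = ["The book is on the table.", "Is the book old?"]
for rank, (word, count) in enumerate(top_words(count_words(lines)), start=1):
    print(f'{rank}. "{word}" - {count} times')
```

For a real book you could pass an open file object instead of the list, since iterating over a file yields its lines; the file name would be whatever your book is saved as.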

raskumandrin, 2014-03-15
@raskumandrin

A simple version would be something like this:

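#!/usr/bin/perl
use strict;
use warnings;

my %word;    # word => number of occurrences
while (<>) {
    # Lowercase each token; skip the empty string that split yields
    # when a line starts with a non-word character
    $word{lc $_}++ foreach grep { length } split /\W+/;
}
# Print words from most to least frequent
print "$_ : $word{$_}\n"
    foreach sort { $word{$b} <=> $word{$a} } keys %word;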

And if you'd rather not write anything yourself, send me the file by email :)
