How do I get the N most frequent words in a text file?
Comrades, here is the task. There is a text file. I need to get the N most frequent words, in descending order of frequency. The comparison is case-insensitive. A stop dictionary (a list of words to ignore) is also needed, and it should be stored in a file. Word separators are spaces, tabs, newlines, and punctuation.
Anything can be used (STL)! Comrades, any advice? :D
There is a naive (brute-force) but slow option: sort the words and count the repetitions, building a kind of pseudo-tree. A harder but somewhat faster option is to use trees. There are also considerably faster and much more complex options.
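One possible reading of the sort-and-count option, as a rough sketch only. The function name count_by_sorting is hypothetical, and the words are assumed to be already extracted and lowercased:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: sorts the words, then counts each run of equal neighbours.
// The result is ordered alphabetically, not by frequency.
std::vector<std::pair<std::string, int> > count_by_sorting(std::vector<std::string> words)
{
    std::sort(words.begin(), words.end());
    std::vector<std::pair<std::string, int> > counts;
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (!counts.empty() && counts.back().first == words[i])
            ++counts.back().second;                          // same word as the previous one
        else
            counts.push_back(std::make_pair(words[i], 1));   // a new run starts here
    }
    return counts;
}
```

The counts would still need a second sort by frequency to get the top N.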
What should we suggest? What exactly is the question?
Or do you want us to write the code for you?
The algorithm is simple:
Read the words from the file stream and collect them in a map like this:
// needs <algorithm>, <cctype>, <fstream>, <map>, <string>; assumes "using namespace std"
ifstream fs("filename.txt");
map<string, int> freq; // word frequencies
string word;
while (read_next_word(fs, word)) // reads the next word, skipping spaces, tabs, punctuation, etc. (the separator-skipping logic goes here)
{
    transform(word.begin(), word.end(), word.begin(), ::tolower); // convert to lowercase
    freq[word]++; // increment the counter for this word
}
Now the map holds the frequency of every word. Copy it into a vector and sort by frequency, most frequent first:
vector<pair<string, int> > vocabulary(freq.begin(), freq.end());
sort(vocabulary.rbegin(), vocabulary.rend(), less_second); // reverse iterators give descending order; with C++11 a lambda would be simpler
The words in the vocabulary container are now sorted by descending frequency, so the first N entries are the answer.
where less_second is
bool less_second(const pair<string, int>& a, const pair<string, int>& b)
{
    return a.second < b.second; // compare by count
}
That is really all the code (except for the separator-skipping logic, which is simple in my opinion).