An algorithmic question from a future C# .NET junior: where should I start my research?
Greetings, friends.
I'm learning C# from Andrew Troelsen's book. I have now reached the point where I fully understand what the author writes, so I feel confident enough to start writing my first C# program.
I came up with the following research task:
Analyze 100 English books grouped into 10 different topics.
Task
Determine the most frequently used words:
The sequence of actions is as follows:
1) Split the text into lexical units (in your case, the meaningful units are words). It is convenient to produce an IEnumerable<string> as output, i.e. a lazy iterator over the words in the text.
2) Normalize each word: convert it to lowercase and, optionally, reduce it to its base form (for example, for nouns: nominative case, singular, etc.).
3) Add the word to a Dictionary<string, int>, where the key is the word itself and the value is its counter:
```csharp
// TryGetValue sets count to 0 when the word is not in the dictionary yet
dictionary.TryGetValue(word, out int count);
dictionary[word] = count + 1;
```
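The three steps above can be sketched in C# roughly as follows. This is a minimal sketch under assumptions of my own: a "word" is a run of letters (an apostrophe inside a word, as in don't, is kept), and normalization is only lowercasing; reducing words to their base form would need a stemmer or lemmatizer, which is not shown. The class name WordFrequency and the Top helper are illustrative, not part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class WordFrequency
{
    // Step 1: lazily split text into words (assumption: letters, plus
    // apostrophes that appear after at least one letter).
    public static IEnumerable<string> Tokenize(string text)
    {
        var current = new StringBuilder();
        foreach (char c in text)
        {
            if (char.IsLetter(c) || (c == '\'' && current.Length > 0))
            {
                current.Append(c);
            }
            else if (current.Length > 0)
            {
                yield return current.ToString();
                current.Clear();
            }
        }
        if (current.Length > 0)
            yield return current.ToString();
    }

    // Step 2: normalize; here only lowercasing plus dropping a trailing apostrophe.
    public static string Normalize(string word) => word.ToLowerInvariant().TrimEnd('\'');

    // Step 3: count occurrences with the TryGetValue pattern from the answer.
    public static Dictionary<string, int> Count(IEnumerable<string> words)
    {
        var counts = new Dictionary<string, int>();
        foreach (string word in words)
        {
            counts.TryGetValue(word, out int count); // 0 if the word is new
            counts[word] = count + 1;
        }
        return counts;
    }

    // The n most frequent words; ties are broken alphabetically.
    public static List<KeyValuePair<string, int>> Top(Dictionary<string, int> counts, int n) =>
        counts.OrderByDescending(p => p.Value)
              .ThenBy(p => p.Key, StringComparer.Ordinal)
              .Take(n)
              .ToList();
}
```

Usage: `WordFrequency.Count(WordFrequency.Tokenize(text).Select(WordFrequency.Normalize))`. Because Tokenize uses `yield return`, a whole book is never held in memory as a list of words; only the dictionary grows. For the text "The cat saw the dog. The END!" the word "the" gets a count of 3.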
Why single words? Do you consider follow and followership to be the same word? If so, you could take a ready-made fuzzy-matching solution, for example. Or, if you want to practice algorithms, you could implement the Levenshtein distance yourself, or something similarly simple.
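For reference, here is a minimal sketch of the classic dynamic-programming Levenshtein distance (the minimum number of single-character insertions, deletions and substitutions that turn one string into another). The class name EditDistance is illustrative; the two-row layout is just a common memory optimization over the full matrix.

```csharp
using System;

public static class EditDistance
{
    public static int Levenshtein(string a, string b)
    {
        // prev[j] = distance between the first i-1 chars of a and the first j chars of b
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j; // "" -> b[0..j) needs j insertions

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i; // a[0..i) -> "" needs i deletions
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(
                    Math.Min(prev[j] + 1,      // deletion
                             curr[j - 1] + 1), // insertion
                    prev[j - 1] + cost);       // substitution (or match)
            }
            (prev, curr) = (curr, prev); // the finished row becomes the "previous" row
        }
        return prev[b.Length];
    }
}
```

For example, `Levenshtein("kitten", "sitting")` is 3, and `Levenshtein("follow", "followership")` is 6, because six letters have to be appended.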
But in my opinion, it would be more logical to start with the exact words as they appear (in book X, the word follow is used 123,234 times)
And so on for each word. Only then, based on this data, pose a new problem, for example finding the most commonly used root.
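One way to keep the raw per-book counts and ask a derived question afterwards is sketched below. Assumptions: the words of each book are already tokenized and normalized, books are keyed by a title string, and the rootOf function (a stemmer, a lookup table, etc.) is supplied by the caller; this sketch does not implement stemming. The names CorpusStats, CountPerBook and MostCommonRoot are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CorpusStats
{
    // book title -> (word -> number of occurrences in that book)
    public static Dictionary<string, Dictionary<string, int>> CountPerBook(
        IDictionary<string, IEnumerable<string>> wordsByBook)
    {
        var result = new Dictionary<string, Dictionary<string, int>>();
        foreach (var book in wordsByBook)
        {
            var counts = new Dictionary<string, int>();
            foreach (string word in book.Value)
            {
                counts.TryGetValue(word, out int c);
                counts[word] = c + 1;
            }
            result[book.Key] = counts;
        }
        return result;
    }

    // Derived question on top of the raw counts: which root has the highest
    // total count across all books? rootOf is caller-supplied (hypothetical).
    public static KeyValuePair<string, int> MostCommonRoot(
        Dictionary<string, Dictionary<string, int>> perBook, Func<string, string> rootOf)
    {
        return perBook.Values
            .SelectMany(counts => counts)               // all (word, count) pairs of all books
            .GroupBy(p => rootOf(p.Key), p => p.Value)  // group counts by root
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Sum()))
            .OrderByDescending(p => p.Value)
            .ThenBy(p => p.Key, StringComparer.Ordinal)
            .First();
    }
}
```

Keeping the per-book dictionaries untouched means any later question (per topic, per root, per book) can be answered from the same data without re-reading the texts.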
Respect to the author. I had thought of implementing something similar myself; I think such a program would be a great help to people whose English is not very good. For example, you want to read a book, but your vocabulary is still small: you learn the most frequently used words that you don't know yet (for easier learning you can, for example, add them to the lingualeo.com dictionary), and then read on =) Another question for everyone: is there a ready-made solution to this problem? Thanks in advance!
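The study-list idea from this comment amounts to filtering a frequency dictionary. A minimal sketch, assuming word counts like the ones built in the answer above and a hypothetical set of words the reader already knows (StudyList and MostFrequentUnknown are illustrative names):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StudyList
{
    // The n most frequent words that are not in the reader's known vocabulary.
    // "known" is a hypothetical, caller-supplied set of already-learned words.
    public static List<string> MostFrequentUnknown(
        IReadOnlyDictionary<string, int> counts, ISet<string> known, int n)
    {
        return counts
            .Where(p => !known.Contains(p.Key))        // skip words the reader knows
            .OrderByDescending(p => p.Value)           // most frequent first
            .ThenBy(p => p.Key, StringComparer.Ordinal)
            .Take(n)
            .Select(p => p.Key)
            .ToList();
    }
}
```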
Since you're looking specifically for words, this is not exactly a substring search within a string. Once a word stops matching, you don't need to compare the rest of it. And some characters, such as spaces and punctuation marks, do not take part in the comparison at all.
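A minimal sketch of that idea: scan the text word by word, treat every non-letter as a separator that is never compared, and stop comparing a word at its first mismatched character (the rest of the word is only skipped). Assumptions: matching is case-insensitive and a word is a run of letters, so hyphens and apostrophes act as separators here; WholeWordSearch and CountWord are illustrative names.

```csharp
public static class WholeWordSearch
{
    // Counts occurrences of target as a whole word, case-insensitively.
    public static int CountWord(string text, string target)
    {
        int count = 0;
        int i = 0;
        while (i < text.Length)
        {
            // Skip separators (spaces, punctuation): they are never compared.
            while (i < text.Length && !char.IsLetter(text[i])) i++;

            bool match = true; // still matching the target so far?
            int k = 0;         // position inside the current word
            while (i < text.Length && char.IsLetter(text[i]))
            {
                if (match && (k >= target.Length ||
                    char.ToLowerInvariant(text[i]) != char.ToLowerInvariant(target[k])))
                {
                    match = false; // mismatch: stop comparing, just skip to the word's end
                }
                k++;
                i++;
            }
            // A whole-word match needs the full target and nothing more.
            if (k > 0 && match && k == target.Length) count++;
        }
        return count;
    }
}
```

With this approach `CountWord("Follow, follow! followership; FOLLOW.", "follow")` counts 3: followership is rejected because it is longer than the target.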
But in general, you can search online for substring-search algorithms; there are many of them, for example the Knuth-Morris-Pratt algorithm or the Boyer-Moore algorithm.
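For comparison, here is a minimal sketch of Knuth-Morris-Pratt substring search. It precomputes a prefix function for the pattern, so the text is scanned once without moving backwards. Kmp and FindAll are illustrative names; the search is case-sensitive and reports overlapping matches.

```csharp
using System.Collections.Generic;

public static class Kmp
{
    // pi[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    private static int[] PrefixFunction(string pattern)
    {
        var pi = new int[pattern.Length];
        for (int i = 1; i < pattern.Length; i++)
        {
            int k = pi[i - 1];
            while (k > 0 && pattern[i] != pattern[k]) k = pi[k - 1]; // fall back
            if (pattern[i] == pattern[k]) k++;
            pi[i] = k;
        }
        return pi;
    }

    // Start indices of every (possibly overlapping) occurrence of pattern in text.
    public static List<int> FindAll(string text, string pattern)
    {
        var result = new List<int>();
        if (pattern.Length == 0) return result;
        int[] pi = PrefixFunction(pattern);
        int k = 0; // number of pattern characters matched so far
        for (int i = 0; i < text.Length; i++)
        {
            while (k > 0 && text[i] != pattern[k]) k = pi[k - 1];
            if (text[i] == pattern[k]) k++;
            if (k == pattern.Length)
            {
                result.Add(i - k + 1);
                k = pi[k - 1]; // keep going, allowing overlapping matches
            }
        }
        return result;
    }
}
```

Note the difference from a word search: `FindAll("the follow followership", "follow")` returns [4, 11], because a substring search also finds follow inside followership.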