How can I count repeated letter sequences (segments) in a set of words?
Good evening everyone.
I have a non-trivial task: from an array of words I need to select those that share the same beginning, i.e. find the words most likely to be semantically similar. For example, "автострада" (freeway) and "автомобиль" (car) both start with "авто" and should both appear in the final results. This can be done with several nested loops (loops can brute-force almost any algorithm), but such a solution is hardly elegant or fast...
How would you approach this? How can it be implemented at all (maybe there are known algorithms)? I would be grateful for any advice, thanks.
P.S. Libraries (such as phpMorphy) are acceptable but not preferred.
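For comparison, here is a minimal Python sketch of the nested-loop approach mentioned in the question. It is not the asker's code. The names `common_prefix_len` and `similar_pairs`, along with the 4-letter `min_prefix` threshold, are hypothetical choices for illustration. Every pair of words is compared, and the length of their common beginning serves as the similarity score. Because it performs O(n²) comparisons, it gets slow on large arrays.

```python
from itertools import combinations


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading letters shared by a and b (case-insensitive)."""
    n = 0
    for x, y in zip(a.lower(), b.lower()):
        if x != y:
            break
        n += 1
    return n


def similar_pairs(words, min_prefix=4):
    """Brute force: compare every pair of words (n * (n - 1) / 2 comparisons).

    Keeps pairs whose shared beginning is at least `min_prefix` letters
    (a hypothetical threshold) and sorts them longest prefix first.
    """
    pairs = []
    for a, b in combinations(words, 2):
        k = common_prefix_len(a, b)
        if k >= min_prefix:
            pairs.append((a, b, k))
    pairs.sort(key=lambda p: p[2], reverse=True)  # stable: ties keep input order
    return pairs


if __name__ == "__main__":
    words = ["автострада", "автомобиль", "автомойка", "авиация"]
    for a, b, k in similar_pairs(words):
        print(f"{a} / {b}: common prefix of {k} letters")
```

With these four words, the pair "автомобиль" / "автомойка" (6 shared letters) comes first, followed by the two pairs that share only "авто".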
If the array of words is large, I suggest building a directed tree (a prefix tree, or trie) whose nodes are letters: the top level holds the first letters of the words, the second level holds the second letters, and so on to the end of every word. The number of similar words can then be read from the number of nodes (branches), and the degree of similarity from the level of the node where the words diverge. Example:
Words: автострада (freeway), автомобиль (car), автомойка (car wash), авиация (aviation)
Tree:
А - В - Т - О - С - Т - Р - А - Д - А
    |       |
    И       М - О - Й - К - А
    |           |
    А           Б
    |           |
    Ц           И
    |           |
    И           Л
    |           |
    Я           Ь
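The tree above can be sketched in Python as follows. This assumes lowercase words and a plain dictionary-based trie. `TrieNode`, `build_trie`, `words_with_prefix`, `branch_points` and the `min_depth` cut-off are hypothetical names chosen for illustration, not part of any library. Each node stores how many words pass through it, so its count is the number of words sharing that prefix. The node's level (the prefix length) is the degree of similarity.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)  # letter -> TrieNode
    count: int = 0  # number of words passing through this node
    ends: int = 0   # number of words ending exactly at this node


def build_trie(words):
    """Insert each word letter by letter: level 1 = first letters, level 2 = second, ..."""
    root = TrieNode()
    for word in words:
        root.count += 1
        node = root
        for letter in word.lower():
            node = node.children.setdefault(letter, TrieNode())
            node.count += 1
        node.ends += 1
    return root


def words_with_prefix(root, prefix):
    """How many inserted words start with `prefix` (0 if none)."""
    node = root
    for letter in prefix.lower():
        node = node.children.get(letter)
        if node is None:
            return 0
    return node.count


def branch_points(root, min_depth=2):
    """(prefix, word_count) for every node where 2+ words share a beginning and then diverge.

    `min_depth` is a hypothetical cut-off that hides very short shared beginnings.
    """
    found = []
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        # Words diverge here if the path forks, a word ends here, or duplicates end here.
        diverges = len(node.children) != 1 or node.ends > 0
        if len(prefix) >= min_depth and node.count >= 2 and diverges:
            found.append((prefix, node.count))
        for letter, child in node.children.items():
            stack.append((child, prefix + letter))
    # Deepest shared prefix first: the most similar words come first.
    found.sort(key=lambda item: len(item[0]), reverse=True)
    return found


if __name__ == "__main__":
    words = ["автострада", "автомобиль", "автомойка", "авиация"]
    trie = build_trie(words)
    print(words_with_prefix(trie, "авто"))  # 3
    for prefix, n in branch_points(trie):
        print(f"level {len(prefix)}: {n} words start with '{prefix}'")
```

Building the trie takes time proportional to the total number of letters, and one traversal then finds every group. This avoids comparing every pair of words.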