Is there a publicly available database of the most used nouns and popular product names?
Good people, if anyone knows, please tell me. We need a database of the nouns and product names most commonly used on classified-ad boards, in Russian and English. We want to use it to improve the auto-suggestions in our ad search.
Is this available somewhere in the public domain?
You should look into what linguists call a corpus, and the results of processing one by machine.
As an experiment, you can start with www.artint.ru/projects/frqlist.php, which offers word lists tagged with parts of speech and ordered by frequency, for example www.artint.ru/projects/frqlist/lemma.num.zip
The word list available from this page contains approximately 35,000 words with a frequency greater than 1 ipm (instances per million words). There is also a shorter list of the 5,000 most frequent Russian words. The lists use the Windows-1251 Cyrillic encoding and are packed with WinZip (Linux or Mac users can unpack them with StuffIt).
The lists follow the format of the lemmatized lists from the British National Corpus (BNC) created by Adam Kilgarriff, namely:
ordinal, frequency (ipm), lemma, part of speech (BNC classification).
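To pull just the nouns out of such a list, something like the rough Python sketch below might do. It assumes the archive is already unpacked and that each line holds the four fields separated by whitespace. The exact tag used for nouns is also an assumption ("noun" here), so check the file and pass the right value. The names Entry, parse_frequency_list, top_nouns and load_frequency_file are made up for this example.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Entry(NamedTuple):
    rank: int      # ordinal position in the list
    ipm: float     # frequency, instances per million words
    lemma: str
    pos: str       # part-of-speech tag as written in the file


def parse_frequency_list(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield entries from lines shaped like 'rank ipm lemma pos'.

    Assumption: fields are whitespace-separated. Lines that do not
    fit that shape are skipped rather than raising an error.
    """
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue
        try:
            rank, ipm = int(parts[0]), float(parts[1])
        except ValueError:
            continue
        yield Entry(rank, ipm, parts[2], parts[3])


def top_nouns(entries: Iterable[Entry], noun_tag: str = "noun",
              limit: int = 1000) -> List[str]:
    """Return up to `limit` noun lemmas, most frequent first.

    `noun_tag` is an assumed value; verify it against the real file.
    """
    nouns = [e for e in entries if e.pos == noun_tag]
    nouns.sort(key=lambda e: e.ipm, reverse=True)
    return [e.lemma for e in nouns[:limit]]


def load_frequency_file(path: str, encoding: str = "cp1251") -> List[Entry]:
    """Read an unpacked list; cp1251 is Python's name for Windows-1251."""
    with open(path, encoding=encoding) as fh:
        return list(parse_frequency_list(fh))
```

With the unpacked file on disk, top_nouns(load_frequency_file(path)) would give a list of candidate nouns for suggestions. Product names will not be in a general-language list like this, so they would have to come from somewhere else.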
If the ads have tags, those can be used as auto-suggestions. In my view that is the simplest option.
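A minimal sketch of the tag idea in Python, assuming each ad's tags can be read as a list of strings. It counts how often each tag occurs and suggests the most common tags that start with what the user has typed. The function names build_tag_counts and suggest are hypothetical, and a real search box would probably need a proper index rather than a linear scan.

```python
from collections import Counter
from typing import Iterable, List


def build_tag_counts(ads_tags: Iterable[Iterable[str]]) -> Counter:
    """Count tag occurrences across ads, ignoring case and outer spaces."""
    counts: Counter = Counter()
    for tags in ads_tags:
        counts.update(tag.strip().lower() for tag in tags if tag.strip())
    return counts


def suggest(prefix: str, counts: Counter, limit: int = 5) -> List[str]:
    """Return up to `limit` tags starting with `prefix`, most used first."""
    prefix = prefix.strip().lower()
    if not prefix:
        return []
    matches = [(tag, n) for tag, n in counts.items() if tag.startswith(prefix)]
    # Sort by descending count, then alphabetically for a stable order.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in matches[:limit]]
```

Ranking by how often a tag is used means the suggestions follow what people actually post, which a general frequency list cannot tell you.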