Data request
yurka, 2013-07-14 00:22:16

Is there a publicly available database of the most used nouns and popular product names?

If anyone knows, please tell me. We need a database of the most frequently used nouns and popular product names, in Russian and English, as they appear on classified-ad boards. We want to use it to improve auto-suggestions in our ad search.
Is anything like this publicly available?


3 answers
@ntkt, 2013-07-14

Look into what linguists call a corpus, and into the results of processing one automatically.
As an experiment, you can start with www.artint.ru/projects/frqlist.php , which offers word lists annotated with parts of speech and ordered by frequency, for example www.artint.ru/projects/frqlist/lemma.num.zip

The word list available from this page contains approximately 35,000 words with a frequency greater than 1 ipm (instances per million words). There is also a shorter list of the 5,000 most frequent Russian words. The lists use the Windows-1251 Cyrillic encoding and are packed with WinZip (Linux or Mac users can use StuffIt to unpack them).
The structure of the lists follows the format of the lemmatized lists from the British National Corpus (BNC) created by Adam Kilgarriff, namely:
rank, frequency (ipm), lemma, part of speech (BNC classification).
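
To pull just the nouns out of such a list, something like the sketch below might do. It assumes that each line holds the four fields above (ordinal/rank, frequency, lemma, part of speech) separated by whitespace, and that nouns are tagged `noun`; check both against the real file. The function names and the sample values are made up for illustration.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    rank: int
    ipm: float   # frequency in instances per million words
    lemma: str
    pos: str     # part-of-speech tag


def parse_frequency_list(lines):
    """Parse lines of the form: rank, frequency (ipm), lemma, part of speech.

    Assumption: the four fields are separated by whitespace.
    """
    entries = []
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        rank, ipm, lemma, pos = parts
        try:
            entries.append(Entry(int(rank), float(ipm), lemma, pos))
        except ValueError:
            continue  # skip lines whose numeric fields do not parse
    return entries


def load_frequency_list(path):
    """Read an unpacked list from disk; the lists are in Windows-1251."""
    with open(path, encoding="cp1251") as f:
        return parse_frequency_list(f)


def top_nouns(entries, noun_tag="noun", limit=10):
    """Return the most frequent lemmas tagged as nouns.

    Assumption: `noun_tag` matches the noun tag used in the file.
    """
    nouns = [e for e in entries if e.pos == noun_tag]
    nouns.sort(key=lambda e: e.ipm, reverse=True)
    return [e.lemma for e in nouns[:limit]]


# Made-up sample lines, for illustration only (not taken from the real list).
sample = [
    "1 100.0 и conj",
    "2 50.5 год noun",
    "3 40.0 быть verb",
    "4 30.25 человек noun",
    "",
]
sample_nouns = top_nouns(parse_frequency_list(sample))
```

With the real file unpacked, `top_nouns(load_frequency_list("lemma.num"), limit=1000)` would give a starting vocabulary, assuming the archive unpacks to a file of that name.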

olgab, 2013-07-14
@olgab

If the ads have tags, those can be used as auto-suggestions. That is the easiest option.
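
A rough sketch of the tag idea, assuming each ad is a dict with an optional `tags` list (that structure and the function names are hypothetical, not taken from any real ad database):

```python
from collections import Counter


def build_tag_counts(ads):
    """Count how often each tag occurs across all ads (case-insensitive)."""
    counts = Counter()
    for ad in ads:
        for tag in ad.get("tags", []):
            tag = tag.strip().lower()
            if tag:
                counts[tag] += 1
    return counts


def suggest_tags(counts, prefix, limit=5):
    """Return tags starting with `prefix`, most frequent first."""
    prefix = prefix.strip().lower()
    matches = [(tag, n) for tag, n in counts.items() if tag.startswith(prefix)]
    # Sort by descending count, then alphabetically so ties are stable.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in matches[:limit]]


# Made-up ads, for illustration only.
ads = [
    {"tags": ["iPhone", "phone"]},
    {"tags": ["iphone", "apple"]},
    {"tags": ["ipad"]},
    {},  # an ad without tags
]
tag_counts = build_tag_counts(ads)
tag_suggestions = suggest_tags(tag_counts, "ip")
```

Ranking by how often a tag occurs means the suggestions follow what people actually post, without any external word list.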

Puma Thailand, 2013-07-14
@opium

Well, just index your ads and use the words from them.
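
As a minimal sketch of that, assuming the ad texts are available as plain strings: split each text into words, count them, and suggest the most frequent words that match what the user has typed. The names and the `min_length` cutoff are illustrative choices, and the prefix lookup is a simple linear scan rather than a real search index.

```python
import re
from collections import Counter

# Runs of letters only (no digits or underscores); works for Cyrillic and Latin.
WORD_RE = re.compile(r"[^\W\d_]+")


def build_word_index(ad_texts, min_length=3):
    """Count word occurrences across all ad texts, ignoring very short words."""
    counts = Counter()
    for text in ad_texts:
        for word in WORD_RE.findall(text.lower()):
            if len(word) >= min_length:
                counts[word] += 1
    return counts


def suggest(counts, prefix, limit=5):
    """Return indexed words starting with `prefix`, most frequent first."""
    prefix = prefix.strip().lower()
    if not prefix:
        return []
    matches = [(w, n) for w, n in counts.items() if w.startswith(prefix)]
    # Sort by descending count, then alphabetically so ties are stable.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [w for w, _ in matches[:limit]]


# Made-up ad texts, for illustration only.
ad_texts = [
    "Продам велосипед, почти новый",
    "Велосипед горный, торг",
    "Selling a bicycle and a bike rack",
]
word_index = build_word_index(ad_texts)
russian_suggestions = suggest(word_index, "вел")
english_suggestions = suggest(word_index, "bi")
```

This counts every word, not only nouns; to keep just nouns, the index could be filtered against a frequency list with part-of-speech tags.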
