Injustive, 2021-08-09 19:44:28
Computational linguistics

How to categorize a set of words?

I have a set of words (more than 5k) and a few categories (15-20). How can I automatically determine the category for each word? For example, fish and cat belong to Animals, while laptop and telephone belong to Devices. I have already dug through a lot of material online and settled on fastText or word2vec. Do I understand correctly that I need to download pre-trained English word vectors, then take each word and iterate over the categories, choosing the category whose similarity to the word is highest?

1 answer(s)
imageman, 2021-08-11
@Injustive

As I understand it, you have your own categories, which differ from the ones the authors of the pre-trained models used?
I think (I may be wrong) you should pick a set of keywords for each category. From those keywords, build one vector per category with the pre-trained model by taking the arithmetic mean of the keyword vectors. Some categories may have to be split into subcategories so that their keyword vectors are more homogeneous.
For the word you want to categorize, get its vector as well, then compute the distance (Euclidean, cosine, etc.) to each category vector. The category with the smallest distance is the one to assign.
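Here is a rough sketch of that idea, in case it helps. The tiny 3-dimensional vectors, the keyword lists, and all function names are made up for illustration; in practice the vectors would come from a pre-trained fastText or word2vec model and have a few hundred dimensions. It uses cosine similarity, so the best category is the one with the highest score rather than the smallest distance.

```python
import math

# Hypothetical toy vectors standing in for pre-trained embeddings.
TOY_VECTORS = {
    "fish": [0.9, 0.1, 0.0],
    "cat": [0.8, 0.2, 0.1],
    "dog": [0.85, 0.15, 0.05],
    "laptop": [0.1, 0.9, 0.2],
    "telephone": [0.0, 0.8, 0.3],
    "tablet": [0.1, 0.85, 0.25],
}

# Hypothetical hand-picked keywords for each category.
CATEGORY_KEYWORDS = {
    "Animals": ["cat", "dog"],
    "Devices": ["laptop", "tablet"],
}


def mean_vector(words, vectors):
    """Arithmetic mean of the vectors of the given words."""
    rows = [vectors[w] for w in words]
    return [sum(col) / len(rows) for col in zip(*rows)]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_centroids(category_keywords, vectors):
    """One mean vector per category, built from its keywords."""
    return {
        category: mean_vector(keywords, vectors)
        for category, keywords in category_keywords.items()
    }


def categorize(word, centroids, vectors):
    """Return the category whose mean vector is most similar to the word."""
    vec = vectors[word]
    return max(centroids, key=lambda c: cosine_similarity(vec, centroids[c]))


centroids = build_centroids(CATEGORY_KEYWORDS, TOY_VECTORS)
for w in ["fish", "telephone"]:
    print(w, "->", categorize(w, centroids, TOY_VECTORS))
```

With real embeddings it is probably worth also looking at the score itself: if the best similarity is low, or two categories score almost the same, the word may need manual review or the category may need more keywords.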
fastText can both classify text and convert words into vectors.
The tutorial at https://gosha20777.github.io/tutorial/2018/04/12/f... seems quite reasonable. As I see it, you would need to train the classifier yourself (though as I understand it, you want to use something pre-trained?). For more background, read https://sysblok.ru/nlp/kak-rabotaet-fasttext-i-gde...
And if nothing works, contact the author of https://habr.com/ru/post/489474/
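
If you do go the route of training the classifier yourself, fastText's supervised mode reads a plain text file with one example per line, where each label carries the `__label__` prefix. Below is a small sketch that prepares such a file from words you have already labeled by hand. The helper names, the example data, and the file name are my own assumptions, and the training step is only mentioned in a comment because it needs the `fasttext` package installed.

```python
def to_fasttext_lines(labeled_words):
    """Format (word, category) pairs as fastText supervised training lines."""
    lines = []
    for word, category in labeled_words:
        # Labels must not contain spaces, so replace them with underscores.
        label = category.replace(" ", "_")
        lines.append(f"__label__{label} {word}")
    return lines


def write_training_file(labeled_words, path):
    """Write the formatted lines to a UTF-8 text file, one example per line."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(to_fasttext_lines(labeled_words)) + "\n")


# Hypothetical hand-labeled examples.
examples = [
    ("fish", "Animals"),
    ("cat", "Animals"),
    ("laptop", "Devices"),
    ("telephone", "Devices"),
]

for line in to_fasttext_lines(examples):
    print(line)

# Next step (not run here, assumes the fasttext package is installed):
#   write_training_file(examples, "train.txt")   # hypothetical file name
#   model = fasttext.train_supervised(input="train.txt")
#   model.predict("fish")
```

Keep in mind that a classifier trained on single words with only a handful of examples per category may not generalize well, so the keyword-averaging approach could still be the simpler starting point.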
