Python
mavar, 2018-10-15 17:52:32

How to get a vector from a Cyrillic word?

I need an example of how to obtain vectors of the same size for Russian words.
For example, I have a list of city names:

  • Piter (Saint Petersburg)
  • Nizhny Novgorod
  • Ufa
  • Vladivostok
  • etc.

I need to get a normalized vector for each word, which I will then feed to the input of a neural network.
Please provide a Python code example if possible.

3 answers
Danil, 2018-10-15
@mavar

Word2vec can help here. I played around with vector representations of words using the collection of Harry Potter books.

ivodopyanov, 2018-10-16
@ivodopyanov

For named entities such as cities or full names specifically, it is better to keep a dictionary of these entities and, whenever one is found in the text, replace it with a token (%city%, %first name%, %last name%) and then work with that token. For the model's logic it almost certainly does not matter which city was named; what matters is whether one was named at all.
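The token replacement could look roughly like the sketch below. The city list, the `replace_cities` helper and the exact `%city%` spelling are assumptions made for illustration. Only exact dictionary forms are matched, so inflected Russian forms (for example "Уфы") would need lemmatization first.

```python
import re

# Hypothetical dictionary of known city names (illustrative only).
CITIES = ["Нижний Новгород", "Владивосток", "Питер", "Уфа"]

# Put longer names first so multi-word cities are tried before shorter ones.
_alternatives = "|".join(
    re.escape(city) for city in sorted(CITIES, key=len, reverse=True)
)
# \b keeps a short name like "Уфа" from matching inside a longer word.
_CITY_PATTERN = re.compile(rf"\b(?:{_alternatives})\b", flags=re.IGNORECASE)


def replace_cities(text: str, token: str = "%city%") -> str:
    """Replace every known city name in `text` with a placeholder token."""
    return _CITY_PATTERN.sub(token, text)


masked = replace_cities("Маршрут: Уфа, затем Нижний Новгород")
print(masked)
```

After this step the model sees a single token wherever a city was mentioned, so the vocabulary does not grow with every new city name.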
The simplest way to get a vector representation of a word is to build a dictionary of the words used in the dataset and then replace each word with its id in that dictionary or with a one-hot representation.
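A minimal sketch of the dictionary and one-hot approach, using only the standard library. The helper names `build_vocab` and `one_hot` are made up for this example, and the word list is the one from the question.

```python
def build_vocab(words):
    """Map each distinct word to an integer id, in order of first appearance."""
    vocab = {}
    for word in words:
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab


def one_hot(word, vocab):
    """Return a vector of len(vocab) zeros with a single 1.0 at the word's id."""
    vector = [0.0] * len(vocab)
    vector[vocab[word]] = 1.0
    return vector


cities = ["Питер", "Нижний Новгород", "Уфа", "Владивосток"]
vocab = build_vocab(cities)
ufa_vector = one_hot("Уфа", vocab)
print(vocab["Уфа"], ufa_vector)
```

Every vector has the same length (the vocabulary size) and unit norm, but the length grows with the vocabulary and unseen words raise a `KeyError` here.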
Smarter options are word embeddings, where the id of a word maps to a vector that is either obtained in advance or trained together with the model. There are also options that encode pairs or triplets of letters in a word.
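One possible way to turn letter triplets into a fixed-size, normalized vector is the hashing trick: count each triplet in a bucket chosen by a stable hash, then L2-normalize. This is only a sketch of that idea, not necessarily what the answer had in mind. The dimension of 64, the `<` and `>` boundary markers, and the function names are assumptions.

```python
import math
import zlib


def char_ngrams(word, n=3):
    """Split a word into overlapping character n-grams, with boundary markers."""
    padded = f"<{word.lower()}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def ngram_vector(word, dim=64, n=3):
    """Hash each n-gram into one of `dim` buckets and L2-normalize the counts."""
    vector = [0.0] * dim
    for gram in char_ngrams(word, n):
        # crc32 is stable across runs, unlike the built-in hash() for strings.
        index = zlib.crc32(gram.encode("utf-8")) % dim
        vector[index] += 1.0
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else vector


for city in ["Питер", "Нижний Новгород", "Уфа", "Владивосток"]:
    print(city, len(ngram_vector(city)))
```

The output size does not depend on word length or vocabulary size, and unseen words still get a vector, at the cost of occasional bucket collisions.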

xmoonlight, 2018-10-16
@xmoonlight

Build an alphabet, then represent each letter by its relative position in that alphabet, with the last letter mapped to 1.0.
26 + 33 = 59 (English + Russian letters in the combined alphabet)
1/59 ≈ 0.016949 (first letter of the alphabet)
2/59 ≈ 0.033898 (second letter of the alphabet)
59/59 = 1.0 (last letter of the alphabet)
Then write these values into an array in the order in which the letters appear in the word.
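The scheme above might be sketched as follows. Since the question asks for vectors of the same size, this sketch also pads or truncates to a fixed `max_len` and maps characters outside the alphabet (such as spaces) to 0.0. Both of those choices, and the value 20, are assumptions that the answer itself does not specify.

```python
import string

# 26 English letters followed by 33 Russian letters: 59 in total.
ALPHABET = string.ascii_lowercase + "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"

# Relative position of each letter: first letter -> 1/59, last letter -> 1.0.
POSITION = {ch: (i + 1) / len(ALPHABET) for i, ch in enumerate(ALPHABET)}


def encode_word(word, max_len=20):
    """Encode letters by alphabet position, padded with 0.0 up to max_len."""
    values = [POSITION.get(ch, 0.0) for ch in word.lower()[:max_len]]
    return values + [0.0] * (max_len - len(values))


for city in ["Питер", "Нижний Новгород", "Уфа", "Владивосток"]:
    print(city, encode_word(city)[:5])
```

All values already lie in the range from 0.0 to 1.0, so no further scaling is needed before feeding them to a network, although neighboring letters get similar values without being similar in meaning.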