Machine learning
vitalykhy, 2018-04-29 17:35:17

fastText and string "similarity": how do I find similar strings?

Facebook has a nice library called fastText. I have only just started learning it and I need help.
Right now I don't quite understand how to find similar news items from a single line of text (say, the news headline, as Yandex does).
If I understand correctly, the comparison is done on vectors. That is, using a model (it is also not entirely clear to me how to set up training of such a model with fastText), we get a vector for the sentence (the news headline).
Next, we take a new news item and build a vector for it. For the two vectors we compute the cosine and get the final similarity score. But we have millions of news items in the database, so it turns out that we have to build such a vector for every one of them, and if the value falls within a certain percentage, is that where we assign the new item?
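The comparison step described above can be sketched in a few lines. This is only an illustration of the formula cos(u, v) = (u · v) / (|u| |v|): the vectors are made-up 3-dimensional toy values rather than real fastText output, and the names `cosine`, `most_similar`, and `stored_vectors` are hypothetical. It scans every stored vector, which is the brute-force approach the question worries about.

```python
import math

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|); 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def most_similar(query_vec, stored, top_k=3):
    """Brute-force scan: score every stored vector, return the top_k (id, score) pairs."""
    scored = [(news_id, cosine(query_vec, vec)) for news_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy vectors, made up for illustration (assumption: one vector per stored headline).
stored_vectors = {
    "news-1": [1.0, 0.0, 0.0],
    "news-2": [0.6, 0.8, 0.0],
    "news-3": [0.0, 0.0, 1.0],
}
new_title_vec = [0.8, 0.6, 0.0]
ranking = most_similar(new_title_vec, stored_vectors)
```

Here `ranking` lists the stored items from most to least similar to the new headline. The vectors of old news items only need to be computed once and stored; only the new headline has to be embedded at query time.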
In general, I need help understanding:
1) How to train such a model with fastText. From the official documentation I did not understand how the training data is put together using labels. In my case I would end up with a great many labels, since there can be a great many news items. How do I add new news items to the model? And is that even necessary?
2) How to do the comparison. A description of the algorithm is enough here. If you can back it up with a formula or a made-up example, I will be very grateful.
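On point 1, one possible route that avoids labels altogether is fastText's unsupervised mode, which trains on plain text with one headline per line; the `__label__` prefix is only needed for the supervised classifier. The sketch below is an assumption about how this could be set up, not a recommended recipe: the helper names, the file name, and the parameter values are hypothetical, and `train_and_embed` needs the third-party `fasttext` package.

```python
import os
import tempfile

def write_corpus(titles, path):
    """Write one normalized headline per line; plain text, no __label__ prefixes."""
    with open(path, "w", encoding="utf-8") as f:
        for title in titles:
            f.write(" ".join(title.lower().split()) + "\n")

def train_and_embed(corpus_path, new_title):
    """Train unsupervised vectors and embed one headline (needs `pip install fasttext`)."""
    import fasttext  # imported lazily so the corpus step runs without the library
    # minCount=1 only because this toy corpus is tiny (assumption, not a tuned value).
    model = fasttext.train_unsupervised(corpus_path, model="skipgram", minCount=1)
    return model.get_sentence_vector(new_title.lower())

# Made-up headlines for illustration.
titles = ["Central bank  raises key rate", "Local team wins the cup final"]
corpus_path = os.path.join(tempfile.mkdtemp(), "titles.txt")
write_corpus(titles, corpus_path)
```

With a model trained this way, a new headline does not have to be added to the model before it can be embedded: `get_sentence_vector` builds its vector from the word and subword vectors the model already has. Retraining from time to time would still make sense if the vocabulary of the news drifts.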
I would also be grateful if you could tell me what to read. There is no need to go deep into the weeds: digging into the "magic" implemented inside the library is something for self-study later. I have a basic understanding of the fundamentals, and the whole point of the library is to hide the complex calculations from the user. In other words, I would like something like: here is an article that explains how to get such-and-such results from such-and-such inputs.
I am really looking forward to your pointers (but not "just Google it").


1 answer(s)
xmoonlight, 2018-04-29
@xmoonlight

(Image: NetworkTopology-FullyConnected.png, a fully connected network topology)
We run the full mesh of the required strings (every string paired with every other) through the Stumper API - Compare, writing the result for each pair to the database, and there are no problems.
Then we build the desired graph locally, from any node to any other.
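The full-mesh idea can be sketched as follows. The answer does not show the external Compare call, so `compare` below is a hypothetical stand-in built on the standard library's `difflib.SequenceMatcher`; the table name, the threshold, and the headlines are likewise assumptions made for illustration.

```python
import sqlite3
from difflib import SequenceMatcher
from itertools import combinations

def compare(a, b):
    """Hypothetical stand-in for the external Compare call: a 0..1 string similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def build_full_mesh(conn, titles):
    """Score every unordered pair of headlines once and store the result."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pair_score "
        "(a INTEGER, b INTEGER, score REAL, PRIMARY KEY (a, b))"
    )
    rows = [
        (i, j, compare(titles[i], titles[j]))
        for i, j in combinations(range(len(titles)), 2)
    ]
    conn.executemany("INSERT OR REPLACE INTO pair_score VALUES (?, ?, ?)", rows)
    conn.commit()

def neighbours(conn, node, threshold=0.5):
    """Read one node's graph edges back from the stored pairs, best match first."""
    cur = conn.execute(
        "SELECT CASE WHEN a = ? THEN b ELSE a END, score FROM pair_score "
        "WHERE (a = ? OR b = ?) AND score >= ? ORDER BY score DESC",
        (node, node, node, threshold),
    )
    return cur.fetchall()

# Made-up headlines for illustration.
titles = [
    "Central bank raises key rate",
    "Central bank raises the key rate again",
    "Local team wins the cup final",
]
conn = sqlite3.connect(":memory:")
build_full_mesh(conn, titles)
pair_count = conn.execute("SELECT COUNT(*) FROM pair_score").fetchone()[0]
```

A full mesh over n strings has n(n-1)/2 pairs, so the table grows quadratically: three headlines give three rows, while a million headlines would give roughly 5 × 10^11.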
