Algorithms
Narrator69, 2016-03-28 21:19:48

How do I implement a search for sentences that are similar in meaning?

I need to assign a product to one category or another, based on its name.
There is an existing database of goods that already belong to particular categories. An example of what it contains:
Goods for children:
Rubber ball
Steel scooter
Hoop
Goods for adults:
Fishing rod (spinning rod)
Steel bucket, 10 l
Perforator
torus spin. fisherman's steel Ball rubber.
In general, the situation is quite delicate and full-text search will probably not cope. Then I remembered an article on Habr in which a chat bot uses a neural network to look up an answer in a ready-made database. I have doubts and don't know what would fit best here. What would you advise? Maybe there are other solutions?

2 answer(s)
Dimonchik, 2016-03-28
@dimonchik2013

A neural network is cool, but it needs to be trained on something)) Once you have something to train it on, though, it is easier to work with)
Try shingles, letter by letter (character-level n-grams):
“Sd. pr. com. one in. cold."
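
As a rough illustration of the shingle idea, here is a minimal sketch that splits each product name into overlapping character trigrams and picks the category of the known name with the highest Jaccard overlap. The function names, the trigram size, and the choice of Jaccard similarity are assumptions made for the example, not something specified in this thread; the catalog reuses the sample names from the question.

```python
def shingles(text, n=3):
    """Return the set of character n-grams (shingles) of a normalized string."""
    normalized = " ".join(text.lower().split())
    if len(normalized) < n:
        return {normalized} if normalized else set()
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def classify(name, catalog, n=3):
    """Return (category, known_name, score) for the best-matching known name."""
    query = shingles(name, n)
    best = (None, None, 0.0)
    for category, names in catalog.items():
        for known in names:
            score = jaccard(query, shingles(known, n))
            if score > best[2]:
                best = (category, known, score)
    return best


# Sample catalog taken from the question (hypothetical structure).
catalog = {
    "Goods for children": ["Rubber ball", "Steel scooter", "Hoop"],
    "Goods for adults": ["Fishing rod (spinning rod)", "Steel bucket, 10 l", "Perforator"],
}

print(classify("Ball rubber", catalog))
```

Because shingles ignore word order, "Ball rubber" still overlaps heavily with "Rubber ball". Heavily abbreviated names share fewer shingles, so in practice a minimum score threshold would probably be needed before trusting a match.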

Viktor, 2016-03-29
@Levhav

I used https://pypi.python.org/pypi/redisbayes/0.1.3 to determine why an ad should be banned.
The Bayesian spam classifier algorithm is fairly simple to implement and lets you classify texts after training on a labeled sample.
I used the redisbayes library together with pymorphy. I took the text, split it into an array of words, removed prepositions and other words that occur in any text (but, if, so that, and the like), and passed the rest to redisbayes.
The classifier can be taught not only to tell spam from non-spam. It can also learn, say, to separate an ad for selling an apartment from an ad for selling a car. However, I did not manage to separate an ad for selling an apartment from an ad for buying or renting one. It is probably possible, but I got a high error rate, because the topics are close and the words in them are roughly the same.
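
To make the pipeline above concrete, here is a minimal pure-Python sketch of a naive Bayes text classifier with stop-word filtering. It is a stand-in written for illustration, not the redisbayes API: the class name, the stop-word list, and the training sentences are all made up, it keeps counts in memory instead of Redis, and it skips the pymorphy normalization step.

```python
import math
from collections import Counter, defaultdict

# Illustrative stop-word list (an assumption, not the one used in the answer).
STOP_WORDS = {"but", "if", "so", "that", "the", "a", "an", "for", "of", "in", "and", "to"}


def tokenize(text):
    """Lowercase, replace punctuation with spaces, split, and drop stop words."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return [word for word in cleaned.split() if word not in STOP_WORDS]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of texts
        self.vocabulary = set()

    def train(self, category, text):
        tokens = tokenize(text)
        self.word_counts[category].update(tokens)
        self.doc_counts[category] += 1
        self.vocabulary.update(tokens)

    def scores(self, text):
        """Return a log-probability score for each known category."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        result = {}
        for category, counts in self.word_counts.items():
            total_words = sum(counts.values())
            log_prob = math.log(self.doc_counts[category] / total_docs)
            for token in tokens:
                log_prob += math.log((counts[token] + 1) / (total_words + vocab_size))
            result[category] = log_prob
        return result

    def classify(self, text):
        scores = self.scores(text)
        return max(scores, key=scores.get) if scores else None


nb = NaiveBayes()
nb.train("apartment", "Selling a two-room apartment in the city center")
nb.train("apartment", "Apartment for sale, new building, balcony")
nb.train("car", "Selling a used car, low mileage")
nb.train("car", "Car for sale, diesel engine, one owner")
print(nb.classify("apartment with balcony for sale"))
```

With categories whose vocabularies barely overlap, a handful of examples is enough for this toy data. When two categories share most of their words, as with selling versus renting an apartment, the per-word probabilities are nearly equal and the error rate rises, which matches the experience described above.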
