Answer the question
In order to leave comments, you need to log in
Can it be solved with a neural network?
The next task is to have a database of real estate ads.
it is necessary to structure the data from declarations.
those. price, floor, total area.
All ads are written in an arbitrary format.
Is there any way to solve it with a neural network?
if yes, what algorithm to look at?
Thank you.
Answer the question
In order to leave comments, you need to log in
it's called Data Mining, I think it will help in searching for options in Google.
In principle, any task can be reduced to a neural network, but in your case, you need to train it, besides, you need data for training, i.e. several hundreds (thousands) of ads in which the price, area, etc. are marked.
We reduce the problem to the classification of a group of numbers, i.e. we mean that all groups of numbers in the text will have one of the following classes: price, area, floor.
First you need a good tokenizer : write a script that splits the ad text into words, while trying to keep the numbers in the same group. For example: “I am selling a 3-room apartment for 5000 rubles.” => ("I sell", "3", "-", "x", "room", "apartment", "for", "5000", "rub", ".").
Then you create a simple interface for text annotation. Those. we take, say, 1% of all ads and mark by hand where the price is, where the area is. The result will be something like this: {"I sell" => '', "3" => ROOM , "-" => '', "x" => '', "room" => '', "apartment" => '', "for" => '', "5000" => PRICE , "rub" => '', "." => ''}. This will be the training data.
Now let's create a neural network. Let's take a multilayer perceptron with 3 layers: the input layer is our data, the hidden layer and the output layer are the class signals. The output level will have the number of nodes according to the number of classes, i.e. 3 (price, area, number of rooms). You can add a 4th node for no class. The input layer contains data feature nodes. As characteristics, you can take N-words before the number and N-words after. The number N can be varied (say, from 1 to 10 i.e. 10 different networks). As a result, there will be a huge input layer with the number of NxM nodes, where M is our dictionary, i.e. all the words found in the ads. The input data will be: (0, 0, 0,… 1, 0, 0, ....) - for each of the N groups, i.e. 1 will indicate that the word is present in the window. The output will be (1, 0, 0) for price, (0, 1, 0) for area, (0, 0, 1) for number of rooms.
We train on 1% of the labeled data until the training converges. You can vary the number N and see which option gives the best result. Then we try to mark up the remaining data with the resulting network.
Not sure if this is the best option :)
Your task is not entirely clear, but I can advise the book “Programming the Collective Mind” by Toby Segaran. The book discusses similar algorithms in python. I think you will find something useful for yourself.
It is the neural network that is unlikely to help here, this is the task of Text Mining.
Specifically, in your case, I would suggest solving it by listing the patterns found in the text, and then I would apply boosting, as here: habrahabr.ru/company/mailru/blog/112142/
Purely theoretically, any task solved by a computer can be solved on neurons, here it is rather a matter of expediency. In this case, the use of NS is not optimal.
Use the Yandex Tomita parser . He might be good for that. You just have to write the rules, of course.
Didn't find what you were looking for?
Ask your questionAsk a Question
731 491 924 answers to any question