Programming
Roger Martino, 2017-10-18 22:04:51

Where to look for information on smart search?

Hello!
I have a task, and I'm asking for help finding a starting point.
The gist is this:
The program receives a number of search parameters as input (let's say for washing machines).
For example:
Length: From 50 to 150
Width: From 40 to 70
RPM: 1500
etc.
(The point is that some of these parameters are static [the system already knows how to work with them] and some are free-form [it would be nice to extract their semantics from the text].)
As output, I need a list of links from the Internet matching these parameters. (The server runs continuously, parsing washing machine sites and writing the options it finds to the database.)
What is the actual problem?
1) First, it's not entirely clear to me in what form to store the information.
I see it this way: we store in the database all the parameters we managed to pull out of the sites.
When a request arrives, I filter by the static parameters, plus, if I can extract parameters from the text using a neural network, I narrow the selection down by those criteria as well.
2) This, in fact, raises the question of extracting the semantics of a sentence or of individual words. Are there ready-made libraries that will make my life easier and provide something already written, so that I just train a neural network and put it into operation?
3) What language should I choose for the server? The server constantly parses sites and writes everything to the database. What would be fastest to develop with for such purposes: Python? Java? Go? What other options should be considered? (Preferably with a large number of ready-made libraries [parsing + database + search and machine learning algorithms].)
4) Maybe someone knows of books, articles, or any other sources where I can read up on this topic?
How do you even google for something like this? :)


2 answers
Roman Mirilaczvili, 2017-10-20
@rojermartino

1) First, it's not entirely clear to me in what form to store the information.
At the moment, I see it this way:
Each search filter set will correspond to a set of URLs:
search_set_id => {URL1, URL2, ..., URLn}
To avoid bloating the DB, it's better to create a urls table:
id | url
1 | http://gugu.ru?p=1
2 | http://gugu.ru?p=2
3 | http://kuku.ru
4 | http://mumu.ru
Thus, each search_set_id will correspond to a set of ids from the urls table, via a url_results table:
url_results
url_id | search_set_id
1 | 1
2 | 1
3 | 1
2 | 2
3 | 2
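A minimal sketch of this layout, using SQLite purely for illustration (the answer does not prescribe a DBMS, and the exact column types and names are assumptions):

import sqlite3

conn = sqlite3.connect("search.db")
cur = conn.cursor()

# The two tables described above: one row per unique URL, plus a mapping
# from search_set_id to the URLs that match that filter set.
cur.executescript("""
CREATE TABLE IF NOT EXISTS urls (
    id  INTEGER PRIMARY KEY,
    url TEXT UNIQUE NOT NULL
);
CREATE TABLE IF NOT EXISTS url_results (
    url_id        INTEGER NOT NULL REFERENCES urls(id),
    search_set_id INTEGER NOT NULL,
    PRIMARY KEY (url_id, search_set_id)
);
""")

def urls_for_search_set(search_set_id):
    # All URLs stored for a given filter set.
    cur.execute(
        "SELECT u.url FROM urls u "
        "JOIN url_results r ON r.url_id = u.id "
        "WHERE r.search_set_id = ?",
        (search_set_id,),
    )
    return [row[0] for row in cur.fetchall()]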
The set of characteristics for a search_set_id can be stored either as a set of ids referencing key-value pairs (the EAV pattern) or as a single JSON document (or hstore in the PostgreSQL DBMS).
Having received a search_set_id, you can then find the corresponding set of URLs.
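For the single-document variant, a hedged sketch (again on SQLite for brevity; in PostgreSQL the params column would more likely be jsonb or hstore, and the table and field names here are assumptions):

import json
import sqlite3

conn = sqlite3.connect("search.db")
cur = conn.cursor()

# One row per filter set; all characteristics kept in a single JSON document.
cur.execute("""
CREATE TABLE IF NOT EXISTS search_sets (
    id     INTEGER PRIMARY KEY,
    params TEXT NOT NULL
)
""")

params = {"length": {"min": 50, "max": 150},
          "width": {"min": 40, "max": 70},
          "rpm": 1500}
cur.execute("INSERT INTO search_sets (params) VALUES (?)", (json.dumps(params),))
conn.commit()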
Computational linguistics is not an easy science. Dig into the aot.ru site and the Yandex ShAD materials, and also read about their Tomita parser, etc. Don't expect a miracle; it's better to consult a linguist.
Python is good because it's easy to find all sorts of libraries for it, and it's popular for parsing. It's better to take what you know best and what it's easiest to find specialists for.
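To illustrate, a tiny parsing step with the widely used requests and beautifulsoup4 libraries; the URL, the CSS selector and the assumed page layout (spec rows like <tr><td>RPM</td><td>1500</td></tr>) are placeholders, not the asker's real target sites:

import requests
from bs4 import BeautifulSoup

def extract_washer_specs(url):
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    specs = {}
    # Collect name/value pairs from the (assumed) characteristics table.
    for row in soup.select("table.specs tr"):
        cells = [c.get_text(strip=True) for c in row.find_all("td")]
        if len(cells) == 2:
            specs[cells[0]] = cells[1]
    return specs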
4) Maybe someone knows of books, articles, or any other sources where I can read up on this topic?
How do you even google for something like this? :)
Before googling, it's useful to formulate your task clearly and not set goals that are too general. It's better to forget about neural networks until you understand the problem better.

Dimonchik, 2017-10-21
@dimonchik2013

Sphinxsearch faceted search (also available in Elasticsearch).
Python.
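As a rough sketch of what a faceted query could look like with the Elasticsearch option (using a recent version of the official elasticsearch Python client; the host, index name and field names are assumptions, not something given in the question):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Filter by the static parameters from the question and request one facet
# (an aggregation over an assumed "brand" field).
resp = es.search(
    index="washing_machines",
    query={
        "bool": {
            "filter": [
                {"range": {"length": {"gte": 50, "lte": 150}}},
                {"range": {"width": {"gte": 40, "lte": 70}}},
                {"term": {"rpm": 1500}},
            ]
        }
    },
    aggs={"brands": {"terms": {"field": "brand"}}},
)

for hit in resp["hits"]["hits"]:
    print(hit["_source"]["url"])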
