Parsing
krll-k, 2017-05-29 20:00:58

What technologies should I use to write a search results parser and analyzer?

There is a search engine that needs to be checked regularly: where your site ranks in the top results, who your competitors are, and so on.
Stage one. The Codeception acceptance-testing framework is quite suitable as a basis for the parser, and there is also a node.js implementation (CodeceptJS). You can write tests that check whether your site appears on the Google and Yandex results pages at all.
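
For illustration, a minimal CodeceptJS sketch of such a check; it assumes CodeceptJS 3 with a Puppeteer or Playwright helper configured in codecept.conf.js, and the query and the example.com domain are placeholders:

```js
// serp_check_test.js - a sketch, not a drop-in test: assumes CodeceptJS 3
// with a Puppeteer or Playwright helper; query and domain are placeholders.
Feature('SERP presence');

Scenario('site is on the first page of Google', async ({ I }) => {
  I.amOnPage('https://www.google.com/search?q=your+target+query');
  I.see('example.com'); // fails if the domain is absent from the page
});

Scenario('site is on the first page of Yandex', async ({ I }) => {
  I.amOnPage('https://yandex.ru/search/?text=your+target+query');
  I.see('example.com');
});
```
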
Stage two. The output has to be stored and processed somehow, so a database is required. Since node.js is the chosen platform, it should most likely be a NoSQL database. RethinkDB would be a good candidate, because MongoDB suffers from incurable sores.
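
A minimal sketch of writing parsed positions into RethinkDB with the official `rethinkdb` driver; the `serp` database, the `positions` table, and the document shape are assumptions:

```js
// store_results.js - a minimal sketch with the official `rethinkdb` driver;
// the database/table names and the document shape are assumptions.
const r = require('rethinkdb');

async function savePositions(positions /* [{ query, url, rank }] */) {
  const conn = await r.connect({ host: 'localhost', port: 28015 });
  // One document per (query, url) pair, timestamped for later analysis.
  await r.db('serp').table('positions')
    .insert(positions.map(p => ({ ...p, checkedAt: new Date() })))
    .run(conn);
  await conn.close();
}
```
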
Stage three. You can't parse the data in a single thread; you need several workers running at once, and a queue server such as RabbitMQ or Gearman will help here.
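
A minimal worker sketch on top of `amqplib` (a standard RabbitMQ client for node.js); the `serp-tasks` queue name and the message shape are assumptions:

```js
// worker.js - a minimal RabbitMQ worker sketch using `amqplib`;
// the queue name and message shape are assumptions.
const amqp = require('amqplib');

async function main() {
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertQueue('serp-tasks', { durable: true });
  await ch.prefetch(1); // one task per worker at a time

  ch.consume('serp-tasks', async (msg) => {
    if (!msg) return;
    const task = JSON.parse(msg.content.toString()); // e.g. { query: '...' }
    // ...fetch and parse the results page for task.query here...
    ch.ack(msg); // acknowledge only after the work is done
  });
}

main().catch(console.error);
```
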
And finally, stage four. All of this has to be deployed, launched, and polished somehow. Docker? I think so. Besides, RethinkDB and RabbitMQ already have ready-made images; it only remains to write images for the workers.
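
A docker-compose sketch of such a setup; RethinkDB and RabbitMQ use their official images, while the worker service and its Dockerfile are assumptions:

```yaml
# docker-compose.yml - a sketch; rethinkdb and rabbitmq use the official
# images, the worker service and its Dockerfile are assumptions.
version: "3"
services:
  rethinkdb:
    image: rethinkdb
    ports:
      - "28015:28015"
  rabbitmq:
    image: rabbitmq:3-management
    ports:
      - "5672:5672"
      - "15672:15672"   # management UI
  worker:
    build: ./worker      # hypothetical Dockerfile for the parser workers
    depends_on:
      - rethinkdb
      - rabbitmq
```
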
Stage five. Deep analysis of the collected data. Neural networks.


1 answer
Dimonchik, 2017-05-29
@dimonchik2013

Proxies: many, many of them. If you are talking about Google, this is the main obstacle.
As for databases, there are usually two: NoSQL for the unstructured data (fresh from the oven) and MySQL/Postgres for historical storage; nowadays you could also do something with ClickHouse.
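
To illustrate the proxy point, a rotation sketch using `axios`; the proxy list is a placeholder, and in practice a Google parser needs a large pool plus throttling and captcha handling:

```js
// fetch_serp.js - a proxy-rotation sketch using `axios`; the proxy list
// is a placeholder, and real scraping needs a much larger pool.
const axios = require('axios');

const proxies = [
  { host: '10.0.0.1', port: 3128 },
  { host: '10.0.0.2', port: 3128 },
];

async function fetchSerp(query) {
  // Spread requests across the pool so no single IP gets banned quickly.
  const proxy = proxies[Math.floor(Math.random() * proxies.length)];
  const res = await axios.get('https://www.google.com/search', {
    params: { q: query },
    proxy,
    headers: { 'User-Agent': 'Mozilla/5.0' }, // the default UA is blocked fast
  });
  return res.data; // raw HTML, to be parsed and stored
}
```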
