How do I scale code horizontally the right way?
Here is the situation. Users can create projects. In each project they configure keywords and the sources they want data collected from. The challenge is to continuously collect large volumes of external data for these projects in real time, with multiple threads, of course. The collected data is then filtered by keyword using regular expressions, negative keywords, and so on, and the matching items are delivered to the users' projects with the found keywords highlighted. Each project can have hundreds of keywords, negative keywords, and sources to parse. All of this has to happen as fast as possible.
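To make the filtering step concrete, here is a minimal sketch of keyword matching with negative keywords and highlighting. It is written in Python only as a language-neutral illustration; the function names and the `<mark>` highlight tag are assumptions, not part of the original setup.

```python
import re


def compile_keywords(words):
    """Build one case-insensitive pattern that matches any of the words."""
    if not words:
        return None
    # Longest first, so "foo bar" wins over "foo" at the same position.
    ordered = sorted(words, key=len, reverse=True)
    # \b assumes every keyword starts and ends with a letter or digit.
    body = "|".join(re.escape(w) for w in ordered)
    return re.compile(r"\b(?:" + body + r")\b", re.IGNORECASE)


def filter_and_highlight(text, keywords, negative_keywords):
    """Return the text with keywords highlighted, or None to drop it."""
    negative = compile_keywords(negative_keywords)
    if negative is not None and negative.search(text):
        return None  # any negative keyword vetoes the whole item
    positive = compile_keywords(keywords)
    if positive is None or not positive.search(text):
        return None  # no keyword found, nothing to deliver
    # Wrap each match in a (hypothetical) highlight tag.
    return positive.sub(lambda m: "<mark>" + m.group(0) + "</mark>", text)


print(filter_and_highlight("Cheap PHP hosting deals", ["php", "hosting"], ["free"]))
```

With hundreds of keywords per project, it would make sense to compile the patterns once per project and reuse them, rather than rebuilding them for every incoming item as this sketch does.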
The plan is to use PHP 7 and the Yii2 framework.
Problems I foresee:
1. High load from code execution
2. High write load on the database
3. Load balancing
So the question is: which architecture, technologies, and algorithms would be most effective here?
The first things that come to mind:
1) Master and slave databases (replication): the slave is kept continuously synchronized with the master. Most read queries go to the slave (wherever possible, that is, outside transactions in which we read and immediately write), while all writes go to the master. This way each database can be tuned specifically for read or write priority.
2) A queue mechanism: parsing, processing, and so on are performed not in response to a user request but by daemons (or cron jobs). The mechanism works as follows:
1. The user creates a task, and the task is added to the queue;
2. A daemon that runs constantly or is started periodically takes a task from the queue and executes it. This step can be split into several stages: for example, one daemon downloads the data and puts it into a second queue, from which another daemon extracts the required data, and so on.
3) Consider PostgreSQL rather than MySQL: it is a more capable database and will probably handle some tasks better than MySQL.
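For point 1), the routing rule can be sketched roughly as follows. This is only an illustration of the decision logic, in Python for brevity; the class name `ReplicatedRouter`, its methods, and the node labels are all hypothetical, and a real router would return database connections rather than strings.

```python
import random


class ReplicatedRouter:
    """Pick a node per statement: master for writes, a slave for safe reads."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        self.in_transaction = False

    def begin(self):
        self.in_transaction = True

    def commit(self):
        self.in_transaction = False

    def node_for(self, sql):
        statement = sql.lstrip().upper()
        # Locking reads (SELECT ... FOR UPDATE) are treated as writes.
        is_read = statement.startswith("SELECT") and "FOR UPDATE" not in statement
        # Inside a transaction everything stays on the master, so a read
        # never misses a row the same transaction has just written.
        if self.in_transaction or not is_read or not self.slaves:
            return self.master
        return random.choice(self.slaves)


router = ReplicatedRouter("master", ["slave1", "slave2"])
print(router.node_for("SELECT * FROM project"))         # one of the slaves
print(router.node_for("INSERT INTO item VALUES (1)"))   # master
```

In practice you would probably not write this yourself: Yii2's `yii\db\Connection` can be configured with `slaves` and handles the read/write split for you.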
That covers what goes directly in the code. Beyond that, of course, you can add all sorts of load balancers and put databases on separate servers (main database, statistics database, raw-data database).
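The two-stage queue from point 2) could look roughly like this. It is a single-process Python sketch with in-memory queues and threads standing in for daemons; `fake_download` is a hypothetical placeholder for a real HTTP fetch, and the "extraction" is just a length count.

```python
import queue
import threading

STOP = object()  # sentinel telling a worker to shut down


def fake_download(url):
    # Hypothetical stand-in for a real HTTP request.
    return "raw content of " + url


def download_worker(tasks, raw_pages):
    """First daemon: takes tasks, downloads, feeds the second queue."""
    while True:
        url = tasks.get()
        if url is STOP:
            raw_pages.put(STOP)  # pass the shutdown signal downstream
            break
        raw_pages.put((url, fake_download(url)))


def extract_worker(raw_pages, results):
    """Second daemon: takes raw pages and extracts the needed data."""
    while True:
        item = raw_pages.get()
        if item is STOP:
            break
        url, body = item
        results.append((url, len(body)))  # placeholder for real extraction


def run_pipeline(urls):
    tasks, raw_pages, results = queue.Queue(), queue.Queue(), []
    workers = [
        threading.Thread(target=download_worker, args=(tasks, raw_pages)),
        threading.Thread(target=extract_worker, args=(raw_pages, results)),
    ]
    for worker in workers:
        worker.start()
    for url in urls:  # the user's tasks are added to the first queue
        tasks.put(url)
    tasks.put(STOP)
    for worker in workers:
        worker.join()
    return results


print(run_pipeline(["http://example.com/a", "http://example.com/b"]))
```

To scale this horizontally, the in-memory queues would be replaced by a broker shared between servers (a message queue or a Redis list, for example), so that extra worker machines can be added for whichever stage becomes the bottleneck.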