What are the main steps in moving a web project to high load?
Hello. As a web developer (PHP, Ruby) and a Unix administrator, I'm wondering what steps are typically taken to adapt a web project to high load.
Mainly, I'd like to know the most typical solutions, so that I can google the topic and figure out how to implement them in practice.
For example, the solutions I know by name (in PHP terms) are: installing nginx as a frontend, or even as the only server together with php-fpm; using php-apc; memcached; moving bottlenecks to NoSQL. But these are all measures within a single server, and I'm more interested in spreading the load across several servers.
Specifically, what prompted this question was the desire to find out how data is mirrored across servers and how a single database is shared, for example when DNS Round Robin is used.
Probably a good way to find this out is to study the architecture of existing high-load projects. Here's a link: www.insight-it.ru/highload/
I'll add that you almost always need to provide for scalability in the application itself (that is, by rewriting code). You can't just install nginx, swap MySQL for Mongo, and get a high-load project (in fact, swapping MySQL for Mongo may bring you even more problems).
You have described the issue very superficially.
Nginx + php-fpm is highly recommended. But the biggest headache is database replication/sharding. I recommend reading not only the official database sites (where everything is always "fast, reliable, automatic"), but also serverfault and news.ycombinator.com. MySQL handles replication poorly, and MongoDB has its issues with it too. PostgreSQL gets good reviews (there is even master-master replication), though I haven't tried it in practice myself. I have tried Couchbase: it clusters beautifully, even a child could handle it, and it can replicate across datacenters. But it is NoSQL, and the database must be chosen to fit the task. If an RDBMS fits better, then PostgreSQL is the better choice, imho :)
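To make the sharding half of that concrete, here is a minimal sketch (in Ruby, one of the asker's languages) of hash-based sharding: a user id is mapped deterministically to one of several databases. The connection strings are hypothetical placeholders, not from any real setup.

```ruby
require 'zlib'

# Hypothetical shard list; in real life these would be separate DB servers.
SHARDS = [
  'postgres://db1/app',
  'postgres://db2/app',
  'postgres://db3/app'
].freeze

# CRC32 gives a cheap, stable hash; modulo picks the shard.
def shard_for(user_id)
  SHARDS[Zlib.crc32(user_id.to_s) % SHARDS.size]
end
```

Every query for a given user then goes to that user's shard. The weakness of plain modulo sharding is that changing the shard count remaps almost every key, which is one reason consistent hashing is often used instead.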
You will also need cache clustering; options include Amazon ElastiCache, Couchbase, and Riak. Redis Cluster should arrive in a few months :)
Don't pile a bunch of unrelated services onto one VPS. For example, services that run long, heavy tasks from cron should be moved to a separate VPS.
I also recommend building an API for internal interaction between applications, so that they don't communicate by changing values in each other's databases or tables.
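The clustered caches mentioned above typically distribute keys with consistent hashing. A toy ring sketch (node names are made up; real clients such as memcached libraries do this internally):

```ruby
require 'digest'

# Toy consistent-hash ring. Each node is placed at many points on the
# ring ("virtual nodes") so keys spread evenly; adding or removing a
# node only remaps a fraction of the keys, unlike plain modulo hashing.
class Ring
  def initialize(nodes, replicas: 100)
    @points = {}
    nodes.each do |node|
      replicas.times { |i| @points[hash_of("#{node}:#{i}")] = node }
    end
    @sorted = @points.keys.sort
  end

  def node_for(key)
    h = hash_of(key)
    # First point clockwise from the key's hash, wrapping to the start.
    point = @sorted.find { |p| p >= h } || @sorted.first
    @points[point]
  end

  private

  def hash_of(s)
    Digest::MD5.hexdigest(s)[0, 8].to_i(16)
  end
end
```

Usage: `Ring.new(%w[cache1 cache2 cache3]).node_for('user:42')` always returns the same node for the same key.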
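An in-process illustration of that boundary (class and method names are invented; in a real system the billing service would sit behind HTTP/JSON or a message queue rather than a Ruby object):

```ruby
# The orders code calls billing's API instead of writing to billing's
# tables directly, so billing stays free to change its storage.
class BillingService
  def initialize
    @balances = Hash.new(0)
  end

  # The only supported way for other services to change balances.
  def charge(user_id, amount)
    raise ArgumentError, 'amount must be positive' unless amount > 0
    @balances[user_id] += amount
    { user_id: user_id, balance: @balances[user_id] }
  end
end

class OrdersService
  def initialize(billing)
    @billing = billing # depend on the API, not on billing's database
  end

  def place_order(user_id, price)
    @billing.charge(user_id, price) # not: UPDATE billing.balances ...
  end
end
```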
What you are really asking about is horizontal scaling.
Application servers are scaled with a balancer (HAProxy, or nginx itself). They hold only executable files (roughly speaking, their contents are identical at any given moment and contain no user-generated content), so no data mirroring is needed there: logging goes to a database or some centralized service, and static files go to a CDN. For database servers it's replication/sharding, and the details depend on the specific DBMS: in Mongo it all works out of the box; in PostgreSQL there are PL/Proxy and PgPool.
The most important thing is to separate and isolate the parts of the application as much as possible. If it writes logs directly to its own host, stores uploaded userpics there, and does other such things, it will be very hard to scale later.
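The balancer-in-front-of-identical-app-servers setup can be sketched as an nginx configuration fragment (hostnames and ports are placeholders):

```nginx
# nginx as the balancer: requests are spread round-robin across
# interchangeable application servers.
upstream app_backend {
    server app1.internal:9000;
    server app2.internal:9000;
    server app3.internal:9000;
}

server {
    listen 80;
    location / {
        proxy_pass http://app_backend;
    }
}
```

Because the app servers are stateless and identical, any of them can be added to or removed from the `upstream` block without data migration.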
HL is really an optimization problem. First of all you need to reduce response generation time and load time on the client, which is achieved by optimizing the application itself (profiling, combining requests, spacing out content) and then by caching. Caching can be done in different ways: dumb (just configure nginx) or structured (break the page into blocks that are rendered and stored, for example, in memcached); managed (content is invalidated when it changes) or unmanaged (expiry by timeout). Caching may require changing the application (to propagate data updates) or adapting the caching itself (cache resets, or ignoring cookies, for example).
Then, when one server can no longer cope, or HA is needed, you move on to horizontal scaling. Start by making requests atomic: any state, such as sessions, complicates scaling (you will have to share sessions across the cluster, or pin a user to a server by IP, for example, which is easy with sharding but hurts HA). Which database (SQL or NoSQL, never mind the specific name) or cluster to use depends first of all on the application, not on fashion or comments on Habr. It is better to stay on MySQL, especially since Percona + Galera are very good, if you know it well, than to dive into the problems of an unfamiliar server in production. Again, a specific technology should solve specific problems, determined first of all by the architecture of the application. And try things, experiment.
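A minimal sketch of block caching with both invalidation styles named above, by timeout (TTL) and managed (explicit reset on change). A Hash stands in for memcached here; the class is invented for illustration:

```ruby
# Tiny block cache: fetch returns a cached block until its TTL expires;
# invalidate drops a block when the underlying data changes (managed).
class BlockCache
  Entry = Struct.new(:value, :expires_at)

  def initialize(ttl: 60)
    @ttl = ttl
    @store = {} # stand-in for memcached
  end

  # Return the cached block, or build it with the given block and cache it.
  def fetch(key)
    entry = @store[key]
    return entry.value if entry && entry.expires_at > Time.now
    value = yield
    @store[key] = Entry.new(value, Time.now + @ttl)
    value
  end

  # Managed invalidation: reset the block when its data changes.
  def invalidate(key)
    @store.delete(key)
  end
end
```

Usage: `cache.fetch('sidebar') { render_sidebar }`; the expensive render runs only on a miss.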
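Sharing sessions across the cluster, as suggested above, amounts to moving them out of each app server's memory into a common store. A sketch (the Hash stands in for memcached or Redis; the class is invented):

```ruby
require 'securerandom'

# Sessions live in a shared store, so any app server behind the
# balancer can handle any request; the client only carries the id.
class SharedSessions
  def initialize(store = {})
    @store = store # shared across all app servers in real life
  end

  def create(data)
    sid = SecureRandom.hex(16)
    @store[sid] = data
    sid # this id goes into the user's cookie
  end

  def load(sid)
    @store[sid] # nil for unknown/expired sessions
  end
end
```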
In addition to using memcached, it helps a lot to push "heavy" tasks onto a queue, e.g. RabbitMQ.
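The queue idea can be illustrated in-process with Ruby's thread-safe Queue standing in for a real broker such as RabbitMQ (the job names are made up):

```ruby
# Offload "heavy" work to a background worker via a queue, so the web
# request can enqueue and return immediately instead of blocking.
jobs    = Queue.new
results = Queue.new

worker = Thread.new do
  while (job = jobs.pop)            # nil acts as a stop sentinel
    # Pretend this is slow: image resize, report generation, email...
    results << "done: #{job}"
  end
end

# A request handler would only enqueue and return:
jobs << 'resize:avatar.png'
jobs << 'email:welcome'

out = 2.times.map { results.pop }   # wait for the demo jobs to finish
jobs << nil
worker.join
```

With a real broker the worker runs on a separate machine (or, per the advice above, a separate VPS), and the queue also absorbs load spikes.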
I once faced a similar challenge. See my article on the subject:
habrahabr.ru/post/106311/
In a nutshell, architectural decisions matter much more than software choices, i.e. a bad architecture won't be saved by nginx; that can only postpone the problems. As for the number of technologies, I'd advise using a minimum, but wisely. Three well-understood components are better than ten: every new technology only complicates the infrastructure.
Regarding database access, there are a number of solutions, such as sharding and replication. Some databases support batch replication.
I also advise you to read What is highload.