How do I organize scaling and fault tolerance for an online store?

chipstore, 2021-03-05 07:31:53

linux

There is an online store with about 350k visitors per month. It runs on a VDS (Intel Xeon 2.4 GHz, 6 cores, 8 GB RAM, SSD, Debian 9 on OpenVZ) with the standard nginx/PHP 7.4/MySQL stack. The engine is outdated and mostly rewritten, but overall everything works fine. Orders are exported to 1C, and stock levels are loaded from it.
Sometimes problems of different kinds occur, and at inconvenient times: the hoster's host server goes down, or nginx falls over, or PHP hits the limit on active processes (they hang while waiting for external resources).
I would like to improve overall stability and leave headroom for future load growth. What are the ways to achieve this (conceptually)? Higher costs are not a problem, but I don't want costs to multiply several times either. It is also important that the implementation effort stays reasonable (I don't want to change the engine, globally rewrite the exchanges, etc.).
Should I change hosts? Take a second VDS from another provider and run it in parallel? Which is better, and what other options are there?

4 answers

smilingcheater, 2021-03-05
@smilingcheater

Without significant code rewriting:
Move the database to a separate server. As the load grows, add database servers and set up replication between them.
Set up a separate load balancer: a server that receives incoming requests and distributes them among several servers that execute the code. As the load increases, add more code-executing nodes. If one of the nodes stops responding for some reason, the rest keep working.
If your sessions are stored in files, you will have to move session storage to a database / Redis / ...
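
To illustrate the balancer idea, here is a minimal Python sketch, not a production balancer (in practice this job is done by an nginx `upstream` block, HAProxy or a cloud load balancer). Requests are handed out round-robin, and a node that fails its health check is skipped, so the remaining nodes keep serving. The backend names and the health-check function are hypothetical.

```python
import itertools
from typing import Callable, Sequence


class RoundRobinBalancer:
    """Toy round-robin balancer that skips backends failing a health check."""

    def __init__(self, backends: Sequence[str], is_healthy: Callable[[str], bool]):
        self.backends = list(backends)
        self.is_healthy = is_healthy
        self._cycle = itertools.cycle(self.backends)

    def pick(self) -> str:
        # Try each backend at most once per call, in round-robin order.
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if self.is_healthy(backend):
                return backend
        raise RuntimeError("no healthy backends")


# Hypothetical app nodes; "app2" is simulated as being down.
down = {"app2"}
lb = RoundRobinBalancer(["app1", "app2", "app3"], lambda b: b not in down)
print([lb.pick() for _ in range(4)])  # ['app1', 'app3', 'app1', 'app3']
```

Note that the balancer itself becomes a single point of failure unless it is redundant or provided as a managed service.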

Sergey Pankov, 2021-03-05
@trapwalker

smilingcheater wrote everything correctly. I'll just add that you need to analyze the bottlenecks and fix them first.
Investigate the problems in detail. What does "nginx falls over" mean? It runs for years without anything happening to it. Find out what happened, and if it happens again, dig into it rather than hoping someone will give a universal, simple yet conceptual piece of advice that defeats all possible problems in advance.
The rule here is simple: find a bottleneck and widen it until overall performance during peak hours is satisfactory.
Set up log retention, log the load, and monitor the database. If the problem is external, choose a more reliable hosting provider, switch from OpenVZ (OVZ) to KVM, put the database and the backend on separate instances, and add a load balancer.
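
As a starting point for finding bottlenecks in the logs, here is a minimal Python sketch that ranks endpoints by 95th-percentile response time. It assumes you have already extracted (path, seconds) pairs from the access log, for example by adding nginx's `$request_time` variable to the log format; the parsing step and the sample data below are hypothetical.

```python
import math
from collections import defaultdict


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return ordered[rank - 1]


def slowest_endpoints(samples, top=3):
    """Group (path, seconds) samples by path and rank paths by p95 latency."""
    by_path = defaultdict(list)
    for path, seconds in samples:
        by_path[path].append(seconds)
    stats = [(path, p95(times), len(times)) for path, times in by_path.items()]
    return sorted(stats, key=lambda s: s[1], reverse=True)[:top]


# Hypothetical samples; real ones would be parsed from the access log.
samples = [
    ("/catalog", 0.12), ("/catalog", 0.15),
    ("/cart", 0.05),
    ("/checkout", 2.4), ("/checkout", 0.3),
]
for path, latency, count in slowest_endpoints(samples):
    print(f"{path}: p95={latency:.2f}s over {count} requests")
```

A percentile is used rather than an average so that occasional very slow requests (such as those waiting on an external service) are not hidden by many fast ones.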
Where speed is not important and some work can be postponed, add a queue.
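
A minimal Python sketch of the queue idea: the request handler only enqueues the slow part of the work and answers immediately, while a background worker processes jobs one by one. Here the queue and worker live in one process purely for illustration; for a PHP site the queue would typically be an external service and the worker a separate process, which is an assumption beyond the original answer. The 1C export step and order IDs are hypothetical.

```python
import queue
import threading

jobs = queue.Queue()
processed = []  # stands in for "order delivered to 1C"


def export_worker():
    """Background consumer: does the slow work outside the request path."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel value: shut the worker down
            jobs.task_done()
            break
        # Hypothetical slow step, e.g. sending the order to 1C.
        processed.append(job["order_id"])
        jobs.task_done()


def handle_checkout(order_id):
    """Request handler: enqueue the slow part and answer the customer immediately."""
    jobs.put({"order_id": order_id})
    return "accepted"


worker = threading.Thread(target=export_worker, daemon=True)
worker.start()
statuses = [handle_checkout(oid) for oid in (101, 102, 103)]
jobs.put(None)  # ask the worker to stop after the queued jobs
worker.join()
print(statuses, processed)
```

If the external system is slow or down, jobs simply accumulate in the queue instead of tying up request-handling processes.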

neol, 2021-03-05
@neol

sometimes the hoster's host server goes down

How often does this happen? In principle, everything goes down sometimes; what matters more is how quickly support reacts and service is restored.

sometimes nginx falls over

I agree with Sergey Pankov: nginx doesn't just fall over on its own.

sometimes PHP reaches the limit on active processes (they hang while waiting for external resources).

If calls to external resources cannot be moved to the background, create a separate php-fpm pool (possibly several, one per service) that handles only access to these external resources. Then even if an external service goes down, only calls to it will fail, and the rest of the site will keep working.
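
The separate-pool idea is often called the bulkhead pattern. php-fpm pools are set up in its configuration (each pool has its own `pm.max_children` limit), not in code, so the following is only a Python sketch of the same isolation principle: each external service gets a fixed number of concurrent slots, and when a hanging service has used all of its slots, further calls to it fail fast while other services are unaffected. The service names are hypothetical.

```python
import threading


class Bulkhead:
    """Caps concurrent calls to one external service; excess calls fail fast."""

    def __init__(self, name, max_concurrent):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, *args, **kwargs):
        # Non-blocking acquire: if every slot for this service is busy, reject now.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError(f"{self.name}: pool exhausted")
        try:
            return fn(*args, **kwargs)
        finally:
            self._slots.release()


delivery = Bulkhead("delivery-api", max_concurrent=2)  # hypothetical services
payments = Bulkhead("payment-api", max_concurrent=2)

hang = threading.Event()
started = threading.Semaphore(0)


def stuck_call():
    started.release()  # signal that this call now holds a slot
    hang.wait()        # simulates an external API that never answers
    return "late"


# Two hanging calls occupy every delivery-api slot.
threads = [threading.Thread(target=delivery.call, args=(stuck_call,)) for _ in range(2)]
for t in threads:
    t.start()
for _ in threads:
    started.acquire()

try:
    delivery_result = delivery.call(lambda: "ok")
except RuntimeError as exc:
    delivery_result = str(exc)
payment_result = payments.call(lambda: "paid")
print(delivery_result)  # delivery-api: pool exhausted
print(payment_result)   # paid

hang.set()  # let the stuck calls finish
for t in threads:
    t.join()
```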

Alexey Dmitriev, 2021-03-05
@SignFinder

To add a second VPS, you will need a third one to balance the load/connections between the two, and that one may also become unavailable.
I would advise you to consider cloud services: not VPS/VDS, but services that handle fault tolerance and scaling themselves, for example https://docs.microsoft.com/en-us/azure/app-service...
Either move the virtual machines entirely to a cloud such as Yandex and use their balancer, or at least use their balancer alone: https://cloud.yandex.ru/services/network-load-balancer or another similar service.