Spoofing, 2012-07-06 18:07:08
PHP

A site that can withstand a high load (?)

I don't have any experience in web development. Designing an HTML + CSS site and setting up and running nginx doesn't really count as experience...
Based on your article and the comments under it, the question arises: how, then, do you build a site that can withstand a high load?

Everything on the site will be basic: a self-written blog with the ability to comment on entries, and a simple forum with a tree structure of threads and no categories. But I like doing things at the highest level, so I want to build it properly from the start, designed for high performance.

It would seem that nothing could be simpler and more lightweight than serving static .html files...

Since users also need to be able to post messages on the site, PHP comes into play: simple, with a low barrier to entry, where everything is written in flat code for the specific task at hand, without fussing with classes, template engines, and so on.
So when a user adds a new post, a PHP script is called that simply regenerates the .html with that new post in it.
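For illustration, here is a minimal sketch of that regenerate-on-write idea; the file paths, form fields, and the renderPage() helper are all made up for the example:

```php
<?php
// add_post.php - a minimal sketch of regenerate-on-write.
// Paths, field names and renderPage() are hypothetical.

$postsFile = __DIR__ . '/data/posts.dat';    // raw data store
$htmlFile  = __DIR__ . '/public/index.html'; // what nginx serves

// 1. Append the new post to the raw data store.
$posts = file_exists($postsFile)
    ? unserialize(file_get_contents($postsFile))
    : array();
$posts[] = array(
    'author' => isset($_POST['author']) ? $_POST['author'] : 'anonymous',
    'text'   => isset($_POST['text'])   ? $_POST['text']   : '',
    'time'   => time(),
);
file_put_contents($postsFile, serialize($posts), LOCK_EX);

// 2. Re-render the entire static page from the data.
function renderPage(array $posts) {
    $html = "<html><body><h1>Blog</h1>\n";
    foreach ($posts as $p) {
        $html .= '<p><b>' . htmlspecialchars($p['author']) . '</b>: '
               . htmlspecialchars($p['text']) . "</p>\n";
    }
    return $html . '</body></html>';
}

// 3. Overwrite the .html file that nginx serves as plain static content.
file_put_contents($htmlFile, renderPage($posts), LOCK_EX);
```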

Can it be done even faster?

The PHP script is called only once to generate a page, so is it even worth bothering with choosing a database? Or can I just store everything in files as well?

Where is the best place to store the cache of .html documents? Assume that in N years there will be a lot of them.

I want to do everything right from the start, and since I have no experience of my own, I am hoping for your advice.


9 answer(s)
edogs, 2012-07-06
@edogs

Don't sweat it. That's the main advice.
You won't see serious traffic any time soon (a server for 50 euros handles WordPress, quirks and all, at 200k hits per day without problems), and you definitely shouldn't plan N years ahead (in that time even HTML itself may die, and you're still keeping files in it).
So take any common engine (phpBB / WordPress) and patch it up to the state you need. If you enjoy chasing speed, these engines have plenty of bottlenecks, and even a beginner can spend a long time happily eliminating them :)
Building your own bicycle is a fine thing, but first it's better to take apart a couple of other people's.

zuborg, 2012-07-06
@zuborg

> I want to do everything right at once
Everyone wants to; no one succeeds ;)
> is it worth bothering with choosing a database then?
Of course you should store the data in a separate database; a file-based one is fine too. Otherwise, the day you want to change the HTML template, it will not be funny.
> It would seem that it could be simpler and lighter than giving static .html files
Indeed, nothing is. So for non-logged-in users, who generate 90% of the traffic, it is worth serving static .html files. Requests from users who need individually generated pages should be routed to the engine, bypassing the cache (for example, based on the presence of a session cookie).
> Where is the best place to store the cache with .html documents?
In the corresponding document root, so that nginx can find them by the requested URL and serve them directly. It is highly desirable to maintain some nesting of folders, so that each folder holds at most a few thousand files or subfolders.
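As an illustration of such nesting, here is a sketch that buckets numeric thread ids into subfolders; the docroot path and the 3+3 digit scheme are just one possible convention:

```php
<?php
// A sketch of bucketing cache files so no directory grows too large.
// The docroot path and the bucketing scheme are illustrative.

function threadCachePath($docroot, $threadId) {
    // thread 1234567 -> $docroot/threads/001/234/1234567.html
    $padded = str_pad((string)$threadId, 9, '0', STR_PAD_LEFT);
    return sprintf('%s/threads/%s/%s/%d.html',
        $docroot, substr($padded, 0, 3), substr($padded, 3, 3), $threadId);
}

$path = threadCachePath('/var/www/html', 1234567);
// /var/www/html/threads/001/234/1234567.html - at most 1000 entries per level
if (!is_dir(dirname($path))) {
    mkdir(dirname($path), 0755, true); // create nested folders on demand
}
```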
> Or can store everything in the same files?
Not everything. Only what is rarely updated and stays valid for a long time. For short-lived data it is still better to use memcached, to avoid unnecessary disk load, or a filesystem in RAM if you really want to work with files. For short-lived data in PHP there is also a wonderful caching tool, the pecl APC module (its main purpose is opcode caching, but it can cache data too).
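A minimal sketch of using APC as a data cache; the key, the 30-second TTL, and the loadRecentCommentsFromDb() helper are made up:

```php
<?php
// A sketch of caching short-lived data via the pecl APC module.
// Key, TTL and loadRecentCommentsFromDb() are hypothetical.

function getRecentComments() {
    $comments = apc_fetch('recent_comments', $hit);
    if (!$hit) {
        $comments = loadRecentCommentsFromDb();      // expensive query
        apc_store('recent_comments', $comments, 30); // cache for 30 s
    }
    return $comments;
}
```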
Working with a file cache has its own subtleties. For example, data in it must be changed atomically, i.e. through a temporary file and a subsequent rename(). It is also desirable to use locks, to avoid several parallel requests generating the same cache item at once. Often there is no need to regenerate a cache element immediately when the data changes; it is enough to delete it, and it will be regenerated on the next request.
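A sketch of both subtleties, assuming the cache lives on a single filesystem (so rename() is atomic) and using a separate .lock file by convention:

```php
<?php
// A sketch of atomic cache updates plus a non-blocking lock so only one
// request regenerates a given item at a time.

function writeCacheAtomically($path, $content) {
    $tmp = tempnam(dirname($path), 'tmp_'); // temp file on the same FS
    file_put_contents($tmp, $content);
    rename($tmp, $path); // readers see the old or new file, never a partial one
}

function regenerateOnce($path, $generate) {
    $lock = fopen($path . '.lock', 'c');
    if (flock($lock, LOCK_EX | LOCK_NB)) {  // we won the race: rebuild
        writeCacheAtomically($path, call_user_func($generate));
        flock($lock, LOCK_UN);
    }                                        // losers simply serve stale data
    fclose($lock);
}
```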

TheHorse, 2012-07-06
@TheHorse

The answer is theoretical, outside the context of PHP:
1. In general, storing everything as .html files is not faster.
1.1. If there are few of them and everything fits in RAM, there is no need to store heaps of small files, though you do need serialization (in case of a reboot).
1.2. If there are far more files than fit in RAM, storing everything in files will be less efficient than other methods. The point is that such files tend to share a large percentage of common information. Each .html file can contain anywhere from 0% to 100% unique information; to simplify the calculation, assume it is 50%. Then your tools perform 50% more read/write operations against the file system, which in most cases is the weakest link in performance.
If you keep the 50% of shared information (HTML templates) in RAM (which is possible in most cases), you cut the load on the file system by 50%. To be precise it is not exactly 50% but a little more, but that is a separate rabbit hole.
If the 50% of unique information is hard to construct (which is unlikely), or is constructed (computed / read) in a suboptimal way, then the cost of constructing it may exceed twice the cost of the file system operations, and then your method is more efficient, but only for read operations.
The write operation in your case will, on average, be more than 95% redundant if the file is rewritten completely. This can be avoided by overwriting only what actually changed in the file, but in the general case, given how file systems are structured, that is very hard from a systems and algorithmic programming point of view.
Thus you greatly increase the load on the file system, which will make your site less efficient than the alternatives described above.
2. I recommend using a database. Every modern DBMS is extremely inefficient and does things you don't need (more than once, too), but writing something better, tailored specifically to your project, would take a very long time.
3. If the goal is a secure, fast, reliable site, then I think PHP, ASP.NET, Python, Ruby, and Node.js cannot possibly compare with systems programming in C / C++ / Delphi (yes, suddenly, even Delphi).
4. What you propose is a bright engineering idea, good luck to you.

Alexey Sundukov, 2012-07-06
@alekciy

> I want to do everything right at once, and without my own experience
Those are mutually exclusive. "Right" can only be done from experience, on a specific task. At least I haven't seen any other way yet.

mithraen, 2012-07-06
@mithraen

It all depends on the problem statement. If you need it for business, take WordPress; the Internet is full of advice on speeding it up and fine-tuning it. You won't have a thousand hits per second tomorrow, and by the time you do, you'll have tuned it to the performance you need.
If it's just for fun, and writing a very fast engine is the main goal in itself, only then does it make sense to sit down and write one.
1. Use nginx's SSI module to assemble a page from several pieces. This saves on writes: most of the content stays static, and it is mostly navigation elements that change frequently.
2. nginx does an excellent job of serving static files; the limits will be purely channel bandwidth and disk subsystem speed.
3. Therefore, to take the disk subsystem out of the equation, use a server with an SSD (they are readily available now); part or even all of the content can also be duplicated in tmpfs.
4. Frequently changing data (for example, the list of new threads on the forum) makes sense to keep in memcached. Conveniently, nginx can serve responses directly from memcached (see the sketch after this list).
5. The choice of development language is not really essential here, and the task is a common one. At the same time, since you are tied to nginx with specific settings, "regular hosting" is out anyway, so you can pick any language based on what such a solution would best be written in. You don't have to limit yourself to PHP.
6. For projects like this, the main thing is to roll out a concept quickly. Once you see the results, you will be forced to rewrite 90% of the code anyway. So if you know PHP, write in PHP: combining the study of a completely new language with thinking through an unusual architecture for a new service is a hard task.
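A sketch of point 4, assuming nginx's memcached module is configured to look pages up by their URI; the host, the key scheme, and renderNewThreadsList() are assumptions for the example:

```php
<?php
// A sketch of publishing a rendered fragment into memcached so nginx
// (via its memcached module) can serve it without touching PHP.

$mc = new Memcache();              // pecl/memcache extension
$mc->connect('127.0.0.1', 11211);

$key  = '/forum/new-threads.html'; // must match the key nginx looks up
$html = renderNewThreadsList();    // hypothetical render function

$mc->set($key, $html, 0, 60);      // republish at most once a minute
```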

Nikolai Vasilchuk, 2012-07-07
@Anonym

Premature optimization does more harm than good.

egorinsk, 2012-07-07
@egorinsk

To begin with, you hardly need it for practical purposes. There are 86,400 seconds in a day. An average, normally written PHP site using MySQL (not Drupal, not phpBB and other crooked code; not Zend and not Symfony) on an average spherical VPS in a vacuum (with 256 MB of memory) withstands 40-50 rps, and sometimes the bottleneck is not even the CPU but the channel bandwidth.
40-50 rps × 86,400 s would be about 3.5-4 million requests at a constant rate; since the load is uneven across the day, that works out to roughly 1-2 million hits per day in practice. That is about 100-200 thousand average active visitors (or 20-50 thousand if we are talking about a social network rather than a blog). You probably won't have that many.
Let's drop the idea of generating static HTML right away: you cannot pre-generate anything even slightly more complicated than a set of unchanging pages; the complexity of the code and the number of cache dependencies grow exponentially. The idea of inserting dynamic page fragments via SSI/AJAX is also unpromising: they will still cause PHP to start up, possibly even increasing the load. It is better to write a scalable application with good data caching.
It might make sense to move away from PHP in favor of Java/C++/.NET/D, but that complicates development: in those languages writing takes longer and is harder.
Even if all the readers of Habr, tema, and several other blogs rush to your blog at once: with a crudely written script in crooked PHP, you can still scale about 10x as the load grows. Install more powerful hardware, expand the memory from 256 MB to 64 GB, get a normal 8-core processor and normal disks, tune the MySQL cache sizes, add APC, and start caching pages in memcache bit by bit. If the load keeps growing after that, spread the code across several frontends and perhaps set up master-slave replication for MySQL. You can also compile PHP via HipHop.
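For instance, "caching pages in memcache bit by bit" can start with a simple get-or-generate wrapper; the key scheme and the buildPage() helper are invented for the example:

```php
<?php
// A sketch of a read-through page cache on memcache: serve from cache on
// a hit, build and store on a miss. buildPage() is a hypothetical helper.

function cachedPage(Memcache $mc, $key, $ttl, $build) {
    $page = $mc->get($key);
    if ($page === false) {             // miss (or a cached falsy value)
        $page = call_user_func($build);
        $mc->set($key, $page, 0, $ttl);
    }
    return $page;
}

$mc = new Memcache();
$mc->connect('127.0.0.1', 11211);
echo cachedPage($mc, 'page:' . $_SERVER['REQUEST_URI'], 60, 'buildPage');
```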
This is enough for many projects.
But that is the dumb approach. Setting up replication, installing a balancer, and adding memory doesn't take much brainpower; even a monkey can handle it. It is much better (and more interesting) to design the application from the start with the ability to scale (why be modest) without limit. And it is much more pleasant to realize that your application can grow no worse than VKontakte with its straight-A students and math olympiad winners.
Imagine the load is growing and we need to scale to 1000 nodes. The frontends (in your case, PHP) are easy: if one server is not enough, we can install 1,000 or even 10,000 of them. The only requirement is to stop saving sessions in local files, otherwise no one will be able to log in (perhaps you should switch to REST and abandon sessions altogether). In front of them we put N nginx balancers and set up round-robin in DNS so that requests hit them in turn.
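Moving sessions out of local files can be as small as two ini settings, e.g. with the pecl memcache extension; the hosts here are made up:

```php
<?php
// A sketch of shared sessions: store them in memcached instead of local
// files so any of the N frontends can serve any request.

ini_set('session.save_handler', 'memcache');
ini_set('session.save_path', 'tcp://10.0.0.1:11211, tcp://10.0.0.2:11211');

session_start();
$_SESSION['user_id'] = 42;   // now visible from every frontend
```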
You can see how DNS round-robin works by typing nslookup vk.com a few times.
Memcache likewise scales easily across N servers (you do have to use memcache in the application), as long as there is plenty of memory and at least a gigabit local network.
Serving statics (images, CSS, scripts) is also routine: put nginx on N servers and forget about the problem. The only real difficulty is video. Google it; Habr has articles about building a CDN for video and the problems involved.
But the database is where we hit a bottleneck. Even with a master and several slaves, the volume of writes coming to the master from 1000 PHP frontends will bring down any single server. While MySQL in a spherical configuration on an average server in a vacuum easily does 1-5 thousand PKEY selects per second, it performs much worse on writes. Therefore, since we are building a highload service, the database must scale too. First, different tables can be moved to different servers. That is not enough. Second, tables can be cut into pieces and spread across servers. That is what we need. For example, in a social network, users with id 1-10000 live on the first server in table users_1, users 10000-20000 on the second server in table users_2, and so on. The mapping of records and tables to servers should not be hard-coded but configurable, so that batches of records can be moved from one server to another to balance the load. For this you will have to write a small application that shows the load distribution and lets you reconfigure the shards.
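A sketch of such a configurable mapping; the hosts, table names, and id ranges are made up, and in reality the map would live in a config file that the balancing tool rewrites:

```php
<?php
// A sketch of configurable shard lookup: id ranges map to servers and
// tables, and the map itself can be edited to move users between shards.

$shardMap = array(
    // min_id, max_id, host,            table
    array(1,     10000, 'db1.internal', 'users_1'),
    array(10001, 20000, 'db2.internal', 'users_2'),
);

function shardForUser($userId, array $shardMap) {
    foreach ($shardMap as $s) {
        if ($userId >= $s[0] && $userId <= $s[1]) {
            return array('host' => $s[2], 'table' => $s[3]);
        }
    }
    throw new RuntimeException("no shard configured for user $userId");
}

$shard = shardForUser(12345, $shardMap);
// connect to $shard['host'] and query $shard['table']
```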
Obvious rules follow from this database layout: queries must not use JOINs (impossible across tables living on different servers, and JOINs scale poorly anyway), must select only by indexes, ideally by PKEY (anything without an index is too slow), and must be as simple as possible, with a small LIMIT. You should also denormalize data: for example, to get a user's photo list, fetch a serialized list of the photo ids and then select the photos by id; do NOT select by the user_id field in the photos table (such a selection cannot be sharded or scaled).
In general, queries should be dumb as a cork and boil down to SELECT ... WHERE id IN (1, 2, 3). As an added bonus, such queries are easy to cache, and that cache is easy to invalidate.
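A sketch of that denormalized read path; the table and column names are invented, and it assumes the user row exists and the id list is non-empty:

```php
<?php
// A sketch of the denormalized photo-list read: one PKEY lookup for the
// user row, then one id IN (...) query. Schema names are hypothetical.

$pdo = new PDO('mysql:host=db1.internal;dbname=app', 'user', 'pass');

// 1. The user row carries a serialized list of photo ids.
$stmt = $pdo->prepare('SELECT photo_ids FROM users_1 WHERE id = ?');
$stmt->execute(array(12345));
$ids = array_map('intval', explode(',', $stmt->fetchColumn()));

// 2. One dumb-as-a-cork primary-key query (assumes $ids is non-empty).
$in   = implode(',', array_fill(0, count($ids), '?'));
$stmt = $pdo->prepare("SELECT id, url FROM photos WHERE id IN ($in)");
$stmt->execute($ids);
$photos = $stmt->fetchAll(PDO::FETCH_ASSOC);
```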
For services such as user search (like VKontakte's), with this scheme you will have to write separate C++ applications that index the database and keep the data in memory. Alas, PHP will not cope there.
You can read about the architecture of high-load projects here: www.insight-it.ru/highload/
And yes: everything written above is only theoretical reasoning. Think ten times before putting it into practice.

CKOPOBAPKuH, 2012-07-07
@CKOPOBAPKuH

> so how then to make a site capable of withstanding a high load?
1. Build only the functionality you actually need
2. Optimize
3. Redo everything
I know of no cases where step 3 could be avoided. Sometimes it comes early (and is then relatively painless), sometimes later (and is then very hard), but it always comes, no matter how hard you try to foresee everything.

Deffe, 2012-07-07
@Deffe

Ah, this is my topic. Captain PHP is right there:
itmages.ru/image/view/585443/11cf38f0
