NoSQL
spry, 2010-12-08 12:05:58

Theory: how should a high-load service be structured?

I'd like to hear from fellow Habr users where my reasoning goes wrong. So let's get started.
The task: build a service that can scale horizontally and that, in theory, will become highly loaded in the future.
Here are my thoughts on the subject, with questions on each item inline:
- there is a domain, hls.com (the name is pulled out of thin air);
- the domain is delegated to the maximum number of DNS servers (6?), which are our own and scattered around the world (does this make sense?);
- the DNS zone contains the maximum number of A and AAAA records (32?) in order to get DNS round-robin;
- each address published in DNS points to a load balancer (hardware or software? how does a load balancer decide which server to hand a request to, and how does it find the least-loaded one?);
- each load balancer manages a number of nginx servers (or some other software? if so, which? how can nginx choose the least-loaded server?);
- each nginx server manages a number of web servers that actually serve the content;
- each web server runs Apache httpd with PHP or Ruby, plus a local memcached (or should it not be local?);
- behind the web servers there are two kinds of databases: one stores the objects themselves, the other the links between objects; by the terms of the task, both must scale horizontally;
- as distributed object storage, something like MemcacheDB or BigTable (or something else?); the idea is that each object has a unique key carrying not only the object's ID but also its type;
- as distributed link storage, some kind of graph database seems needed (correct? if so, which one?);
- there are also two pools of memcached servers caching queries to both kinds of database.
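As a sketch of the composite-key idea from the object-storage item above: the snippet below (Python; the node names under the made-up hls.com domain are placeholders, not anything from the post) builds a key carrying both the object's type and its ID, and uses a minimal consistent-hash ring to pick a storage node, so that adding or removing a node only remaps a fraction of the keys. This is just one common way to distribute keys, not a claim about how MemcacheDB or BigTable do it.

```python
import hashlib
from bisect import bisect


def object_key(obj_type, obj_id):
    # Composite key: carries the object's type as well as its numeric id
    return f"{obj_type}:{obj_id}"


class HashRing:
    """Minimal consistent-hash ring: maps a key to a storage node so that
    adding or removing a node only remaps roughly 1/N of the keys."""

    def __init__(self, nodes, replicas=100):
        self.ring = []
        for node in nodes:
            # Place several virtual points per node for a more even spread
            for i in range(replicas):
                h = int(hashlib.md5(f"{node}#{i}".encode()).hexdigest(), 16)
                self.ring.append((h, node))
        self.ring.sort()

    def node_for(self, key):
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        # First ring point clockwise from the key's hash (wrapping around)
        idx = bisect(self.ring, (h,)) % len(self.ring)
        return self.ring[idx][1]


# Hypothetical storage nodes under the made-up domain
ring = HashRing(["obj1.hls.com", "obj2.hls.com", "obj3.hls.com"])
node = ring.node_for(object_key("photo", 12345))
```

The same routing works for the memcached pools mentioned in the last item: the client hashes the key and talks straight to one node, with no coordinator in the middle.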
Habr folks, am I thinking in the right direction? What have I missed? What should I read? Who has done this already? Help me clear this up.


9 answers
Eugene, 2010-12-09
@immaculate

In my case the project was written "as it goes." More precisely, quite competently, but with no thought that there would be many users and that it would have to scale somehow. Fairly clean code, a bunch of interrelated tables, which means queries with nearly dozens of JOINs. Caching wasn't used at all.
Everything worked (and still works) on three servers: a PostgreSQL database, nginx for static files, and nginx with gunicorn for the application itself.
For the first two years that was enough, but the number of users and features grew, so now I periodically have to sit down and rewrite pieces of code: denormalize the database to avoid JOINs and lookups in extra reference tables, bolt on caching (the biggest headache: caching should be foreseen from the outset and thought through very, very carefully), and so on.
I'm just describing my experience. The moral, I think, is this: don't overcomplicate things from the start. Think about performance, but not to the point of fanaticism. Most likely, simple code and one or two servers will be enough at first. You're unlikely to get the world's second Facebook right away; on the contrary, those who think their project will immediately take over the world are most often the ones who are mistaken.

VBart, 2010-12-10
@VBart

Your question says "theory," but then lays out a set of practical details, and rather loosely at that. As already said above, your approach is fundamentally wrong.
Every specific architectural decision depends on specific tasks. That's what systems architects are for: their job is a painstaking analysis of the project's requirements and the choice of concrete technical solutions for the particular case. In large, highly loaded, constantly evolving projects such people have to work full-time and be paid a salary.
No one can help you in this case for two reasons:
1) You haven't described your project in full technical detail. "Pictures, a social network" and so on isn't enough; at a minimum you need a multi-page, detailed, sensible description of all the required functionality. Not to mention that it would be good to specify the available resources and estimate the expected load.
2) This isn't something you knock out on your knee. A proper detailed analysis can take several months, and of course nobody will do it for free. There are theoretical foundations, but they are so theoretical that you haven't even outlined them above. The number of DNS servers, AAAA records, nginx, PHP, database internals, and so on are all practical matters that depend heavily on the task. You can implement everything you wrote and end up with a cumbersome, clumsy, poorly scalable application at enormous cost. Based on what you've written, I can only advise you not to do it, because your approach and your assumptions are wrong from the start. And any practical advice that has been or will be written for you here is nothing more than personal, unsupported experience.
I can only share the approach I myself take when choosing a specific technical solution, step by step:
1) Gather requirements. It is important to collect and identify as many requirements as possible that the specific task imposes on the specific question, for example, all the requirements for data storage in such-and-such a service.
2) List as many options as possible that could in principle solve the problem, then exclude those that clearly don't fit the requirements, keeping the ones that satisfy them best (sometimes it is impossible to satisfy all the requirements at once).
3) A technical solution is always a compromise. From the remaining options pick the most suitable one; often this requires comparative testing (your own, on benchmarks that somehow model your task). If the result still doesn't satisfy you, you probably need to revisit the requirements or split the task into several, if possible. Either way, that sends you back to step 1.
Bonus track 1: KISS
Bonus track 2: One size never fits all

Pavel Chipak, 2010-12-08
@reket

You're overthinking it... A well-designed database schema and ordinary caching will carry you for a long time.

amarao, 2010-12-08
@amarao

You're starting from the wrong end. Start with the application architecture. And remember: of the three properties of high availability, data consistency, and performance, you can pick only two (the same kind of trade-off the CAP theorem states, with partition tolerance in place of performance).

VBart, 2010-12-11
@VBart

The rest of your arguments, beyond the database, also raise plenty of questions. First, if you already have a hardware load balancer, why put nginx behind it to do the same job? Why this pile of HTTP servers? Dragging traffic through this Christmas tree of web servers doesn't add speed, quite the opposite. Why doesn't nginx balance the application servers directly? What do you need Apache for? You're not in the hosting business, as I understand it, where Apache's main charm and its little extra brake, the .htaccess file, would come into play. Your phrases about caching and about memcached pools likewise make no sense without a clear understanding of what to cache, when, and how. Caching is sometimes even harmful, and it is certainly always laborious; you resort to it, first, when it is actually possible, and second, when it is actually necessary.
You also asked how the balancer will distribute the load; again, it's up to you to decide, based on your tasks, on what principle it should work, because there is no magic here either. And nothing was said about sessions: how will you handle them, do you even have any?
A significant role in a high-load project is played by how easy it is to maintain this large fleet of machines: ease of configuring new machines and integrating them into the pool, automatic shutdown and reconfiguration when something fails, and hence quick diagnostics and monitoring. You didn't touch these questions at all, yet you piled up a rather complex system. Vertical scaling, by the way, is not necessarily a dead end and a false path; for plenty of projects it is even preferable.
As for the graph database you mentioned, I haven't heard of them being at all widely used in web projects, high-load ones included. How are you going to use it and scale it? That's another pile of questions.
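A tiny illustration of the "automatic shutdown on failure" point: a health check that drops dead backends from a pool before the balancer uses it (Python; real setups rely on the balancer's built-in checks, and the pool contents here are whatever hosts you supply, not anything from the thread):

```python
import socket


def alive(host, port, timeout=1.0):
    """TCP health check: can we open a connection within the timeout?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures alike
        return False


def healthy_backends(pool):
    # Keep only backends that currently accept connections
    return [(host, port) for (host, port) in pool if alive(host, port)]
```

Running something like this periodically, and feeding the result back into the balancer's configuration, is the kind of plumbing that ends up mattering more than the exact count of A records.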

Pavel Chipak, 2010-12-08
@reket

If you don't organize the architecture properly at the start, you'll be shoveling a pile of crap later. Usually a project is created simply to work, and when the load appears, people scale it and eliminate the bottlenecks.

Georgy Khromchenko, 2010-12-11
@Mox

I advise you not to fool yourself and to build at least something that works. Even one server can handle a lot. Then, when the time comes, move SQL to a separate server, then set up a SQL cluster, put nginx in front for load balancing, and so on. Think ahead in general ;)

config, 2010-12-11
@config

It seems to me that one server is enough for 25k users. Our server was far from the most powerful, and it saw 150,000+ hits a day. It depends on the project, of course; in my case it was an online store. For larger loads you can move MySQL to a separate server.
If that's not enough, it is quite easy to build a master + N slaves scheme with minimal changes to the application: writes go to the master, reads go to the slaves. That is certainly enough for a couple of million visitors. If a heavy write load is expected, then you need to scale with sharding. But all of that can be thought about later, when there are visitors generating profit.
If avalanche-like growth is planned and it is unclear where the traffic will top out, then of course you need to take care of scaling in advance.
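The master + N slaves split described here amounts to a trivial query router (Python; the backend names are placeholder strings standing in for real database connections, and the verb list is a simplification, since e.g. `SELECT ... FOR UPDATE` belongs on the master):

```python
import itertools

# Statements safe to serve from a replica (simplified; a real router
# would also pin reads-after-writes to the master)
READ_VERBS = {"SELECT", "SHOW", "EXPLAIN"}


class Router:
    """Send writes to the master, spread reads round-robin over the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)

    def backend_for(self, sql):
        verb = sql.lstrip().split(None, 1)[0].upper()
        return next(self._slaves) if verb in READ_VERBS else self.master


router = Router("master-db", ["slave-db-1", "slave-db-2"])
```

The catch is replication lag: a read issued right after a write may not see the new row on a slave, which is why "minimal changes to the application" still includes deciding which reads must go to the master.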
