Database clustering


MySQL

darkslesh, 2011-10-04 22:43:19

I read online that VKontakte uses MySQL as its main storage, as do many other projects. What interests me is how this is implemented on the technical side.
Not necessarily the way they do it, but how such a system can be designed at all. The main requirements are:

Only free databases
Transparent access to the database (i.e. scripts should not know how things are arranged and should connect either always to the same server or to a random one in the cluster)
If one server fails, work continues and no data is lost
High performance (a very large number of requests must be handled)
Good extensibility (servers can be added or removed without shutting down the system)

The constraints are as follows:

The database is relatively small (8 GB at most for now, and it is not certain that it will grow)
Almost all tables are linked through foreign keys
The queries are relatively simple (mostly SELECTs, somewhat fewer INSERTs, and very few UPDATEs)
Queries are primitive and usually touch 1-2 tables

Here's the question: What is the best way to implement this?

I am leaning towards memcached + MySQL (InnoDB) + NDB, but NDB is unclear to me. Many people criticize it without explaining what exactly is wrong. I often see claims that if the database grows larger than the RAM on one of the servers, everything falls over (I also did not understand how storage is implemented there, because judging by the documentation everything is kept in memory). It also has no foreign key support, and living without foreign keys would be quite hard. Replication is not very clear to me either: the data is duplicated, but requests still go only to the master.
The main goal is reliable storage, fault tolerance, and acceptable speed under heavy load (about 10k requests per second). Can anyone give advice or point me to an article or documentation?
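
To make the "transparent access" requirement and the replication concern more concrete, here is a minimal sketch of one common arrangement: writes go to the master, while SELECTs are spread over whichever replicas are currently alive. The class name `QueryRouter`, the server names, and the `mark_down` / `mark_up` helpers are all hypothetical and only illustrate the idea; no real MySQL driver is involved.

```python
import random


class QueryRouter:
    """Hypothetical router: writes go to the master, reads to a live replica."""

    def __init__(self, master, replicas, rng=None):
        self.master = master
        self.replicas = list(replicas)
        self.down = set()                 # servers currently marked as failed
        self.rng = rng or random.Random()

    def mark_down(self, server):
        self.down.add(server)

    def mark_up(self, server):
        self.down.discard(server)

    def pick(self, sql):
        """Return the name of the server that should run this statement."""
        words = sql.split(None, 1)
        if not words:
            raise ValueError("empty statement")
        if words[0].upper() != "SELECT":
            return self.master            # anything that is not a SELECT hits the master
        alive = [r for r in self.replicas if r not in self.down]
        # Fall back to the master when no replica is available.
        return self.rng.choice(alive) if alive else self.master


router = QueryRouter("db-master", ["db-replica-1", "db-replica-2"])
print(router.pick("INSERT INTO users (name) VALUES ('x')"))  # db-master
print(router.pick("SELECT * FROM users WHERE id = 1"))       # one of the replicas
```

This is a simplification: replicas can lag behind the master, so a read issued right after a write may see stale data, and statements such as `SELECT ... FOR UPDATE` or anything inside a transaction would need to go to the master as well.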


1 answer(s)

pentarh, 2011-10-05

An admin comes to clustering with one of two problems:
1. Bottlenecks that are impossible or impractical to fix by scaling up a single server
2. Building a highly available service (high availability)
Solving the first is very expensive and cumbersome. For what it's worth, I have dealt with really heavy loads, and with a properly organized database structure the hardware handled everything. In that case it is easier to actually optimize the structure than to fight with NDB and master-master replication.
The second comes down to building a master/slave cluster that automatically switches roles on failure. I do not recommend replication. You can look at shared or replicated storage (DRBD || GFS || GPFS) plus a cluster manager (Heartbeat || Pacemaker)
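
The automatic role switch mentioned above can be illustrated with a toy model. This is only a sketch of the heartbeat idea, not how Heartbeat or Pacemaker are actually implemented: the class `FailoverMonitor`, the node names, and the 3-second timeout are assumptions made up for the example, and time is passed in explicitly so the logic is easy to follow.

```python
class FailoverMonitor:
    """Toy model of heartbeat-based failover between two nodes."""

    def __init__(self, active, standby, timeout=3.0):
        self.active = active
        self.standby = standby
        self.timeout = timeout    # seconds of silence before failover
        self.last_seen = 0.0      # time of the last heartbeat from the active node

    def heartbeat(self, node, now):
        """Record a heartbeat; only heartbeats from the active node count."""
        if node == self.active:
            self.last_seen = now

    def check(self, now):
        """Swap roles if the active node has been silent for too long."""
        if now - self.last_seen > self.timeout:
            self.active, self.standby = self.standby, self.active
            self.last_seen = now  # give the new active node a fresh window
            return True
        return False


monitor = FailoverMonitor("node-a", "node-b", timeout=3.0)
monitor.heartbeat("node-a", now=1.0)
print(monitor.check(now=3.5), monitor.active)  # False node-a
print(monitor.check(now=4.5), monitor.active)  # True node-b
```

A real cluster manager has to do much more than this: move the service address to the new master, promote the storage on that node, and make sure the failed node is really out of the picture (fencing) so that two nodes never act as master at the same time.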