MySQL
dmitriyecho, 2012-09-16 17:09:34

Is the chosen design correct?

Hello.
So, the givens:
A database of entities, each with roughly 15 text fields and 15-20 int fields.
The expected maximum volume is about 30 million rows.
The main table also has about a dozen small many-to-many link tables connecting it to reference dictionaries, since some entity attributes can hold multiple values and searches can be run on them.
Read load: no more than 2-3 thousand search queries per day.
Write load: 100-200 new records per day, or fewer.
The database is MySQL.
The main question: is sharding necessary at this volume, given the following architecture?
- Search on text fields and dictionaries is handled by Sphinx.
- Search on fields with numeric values is done via index tables containing two int columns: the entity id and the dictionary value id.
- Entity data needed to build lists (the main fields such as title, short description, etc.) is cached in memcached for all 30 million entities; the cache is refreshed whenever an entity changes.
- Direct selects from the main table are done only by primary key, a unique numeric identifier of type int or bigint.
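The two-int index-table idea above can be sketched as follows. This is a minimal illustration, not the poster's actual schema: all table and column names are assumptions, and SQLite stands in for MySQL so the example is self-contained.

```python
import sqlite3

# Hypothetical schema for the index-table pattern described above.
# SQLite stands in for MySQL; names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity (
    id INTEGER PRIMARY KEY,      -- unique numeric identifier (int/bigint)
    title TEXT,
    short_description TEXT
);
CREATE TABLE entity_attr (       -- index table: two ints per row
    entity_id INTEGER NOT NULL,
    dict_value_id INTEGER NOT NULL
);
-- composite index so attribute lookups are resolved from the index alone
CREATE INDEX idx_attr ON entity_attr (dict_value_id, entity_id);
""")
conn.execute("INSERT INTO entity VALUES (1, 'Widget', 'A sample entity')")
conn.execute("INSERT INTO entity_attr VALUES (1, 42)")
conn.commit()

# Step 1: search on the numeric attribute via the index table.
ids = [row[0] for row in conn.execute(
    "SELECT entity_id FROM entity_attr WHERE dict_value_id = ?", (42,))]

# Step 2: fetch the entities themselves strictly by primary key.
rows = conn.execute(
    "SELECT id, title FROM entity WHERE id IN (%s)" % ",".join("?" * len(ids)),
    ids).fetchall()
print(rows)  # [(1, 'Widget')]
```

The point of the composite index `(dict_value_id, entity_id)` is that the attribute search never has to read the wide main table at all; only the final by-key fetch does.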
So, the questions:
1. Should the entity table be sharded for these loads, or are 30 million rows of the data described not a problem?
2. Should the entity table be denormalized into two tables, one with the ints and one with the text data, so that numeric searches run directly on the int table without extra index tables?
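The vertical split in question 2 can be sketched like this. Again a hedged illustration: the column names (`price`, `year`, etc.) are invented for the example, and SQLite stands in for MySQL.

```python
import sqlite3

# Hypothetical vertical split: a narrow table with the int fields
# (searched directly) and a wide table with the text payload,
# joined on a shared primary key. Names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity_ints (
    id INTEGER PRIMARY KEY,
    price INTEGER,
    year INTEGER
);
CREATE INDEX idx_price_year ON entity_ints (price, year);
CREATE TABLE entity_text (
    id INTEGER PRIMARY KEY,   -- same id as entity_ints
    title TEXT,
    description TEXT
);
""")
conn.execute("INSERT INTO entity_ints VALUES (1, 100, 2012)")
conn.execute("INSERT INTO entity_text VALUES (1, 'Widget', 'long text...')")
conn.commit()

# Numeric search scans only the narrow table; text is fetched by key.
row = conn.execute("""
    SELECT t.id, t.title
    FROM entity_ints i
    JOIN entity_text t ON t.id = i.id
    WHERE i.price = 100 AND i.year = 2012
""").fetchone()
print(row)  # (1, 'Widget')
```

The trade-off is that the narrow int table keeps more rows per page in memory during range scans, at the cost of a join when text fields are needed.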


2 answers
Anton Dyachuk, 2012-09-17
@Renius

1. At 200 inserts per day, 30 million rows means 150,000 days, about 410 years, to fill the table. Even counting the "dozen small related tables" that is roughly 41 years, and the records would become obsolete long before then.
2. A sane structure with well-designed indexes will give you comparable performance whether the table holds 1 thousand rows or 10 million.
3. Denormalizing the table can be useful for building an efficient set of indexes.
4. There are 86,400 seconds in a day, so your system could handle 8,640 requests per day even if each request took 10 seconds.
5. Pay attention to the size of the result sets: could a single query return a million rows? Limit your results.
6. Sharding is for heavy loads; with thousands of queries per day, sharding, as it seems to me, is not needed.
7. At this volume the database will occupy roughly 15-150 GB; again, it seems to me sharding is not needed.
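The figures in the answer above are back-of-the-envelope arithmetic and can be checked directly. The 0.5-5 KB per-row range in the storage estimate is an assumption implied by the answer's 15-150 GB figure, not a measured value.

```python
# Capacity arithmetic behind points 1, 4 and 7 of the answer above.
ROWS = 30_000_000
INSERTS_PER_DAY = 200

# Point 1: time to fill the table at the stated write rate.
days_to_fill = ROWS / INSERTS_PER_DAY      # 150,000 days
years_to_fill = days_to_fill / 365         # roughly 410 years
print(days_to_fill, years_to_fill)

# Point 4: even 10-second queries leave plenty of daily headroom.
SECONDS_PER_DAY = 86_400
print(SECONDS_PER_DAY // 10)               # 8640 requests per day

# Point 7: 30M rows at an assumed 0.5-5 KB each is about 15-150 GB.
low_gb = ROWS * 0.5 / 1_000_000            # KB -> GB
high_gb = ROWS * 5 / 1_000_000
print(low_gb, high_gb)                     # 15.0 150.0
```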

bolnikh, 2012-09-16
@bolnikh

Hello.
Judging by the description, the load is small. So the question becomes: can the current system handle the load?
If yes, then don't touch it; the data growth is insignificant.
If not, then a solution must be found. Perhaps that will be sharding, perhaps something else: add memory, an SSD disk, and so on.
If a significant increase in load is planned, try assembling another identical machine and load-testing it. If it withstands the planned load, you're fine.
