MySQL
Crash, 2019-07-17 13:55:26

What are good practices for dealing with a large number of records in a table (1M+)?

I am working on a project in which data accumulates quite intensively in one of the tables. At the moment it has 60+ columns and more than 500k rows. Records are added at a rate of about 3-4k per day, and this rate will only increase. The interface for working with this table has started to slow down, so I am now working on optimization.
For myself, I singled out the following methods:
1. Indexes, added selectively for the fields that are searched on most often (a sketch follows this list).
2. Combining several columns of the same type of data into one column, stored in JSON format.
3. Partitioning. The project does not work with all the records at once; most of the old records are kept mainly for history, although they may be needed at any time. For now I plan to create a new, "hot" table with an identical structure. The "hot" table will store fresh data, say for the last month, while the "cold" source table will keep everything for all time, including the fresh records. The hot table will be used for day-to-day work, the cold one for search on demand. Later the cold table can be split into several tables, i.e. sharded.
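For point 1, this is roughly what I have in mind; the table and column names below are just placeholders, not my real schema:

  -- illustrative table, not the real 60-column schema
  CREATE TABLE orders (
      id         BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
      status     VARCHAR(20) NOT NULL,
      created_at DATETIME    NOT NULL
  );

  -- composite index for the most frequent filter: status plus creation date
  CREATE INDEX idx_status_created ON orders (status, created_at);

  -- a query of this shape can then use the index instead of a full table scan
  SELECT id, status, created_at
  FROM orders
  WHERE status = 'active'
    AND created_at >= '2019-06-01';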
Please tell me what else can be done?


2 answer(s)
Melkij, 2019-07-17
@Bandicoot

Combining several columns of the same type of data into one column, stored in JSON format.

JSON is larger than a flat table, if only because of the need to store keys.
And, as far as I know, MySQL still only has btree indexes, which are no good at all for arbitrary search; they only help with criteria known in advance.
Partitioning as such is mentioned, and it is a good idea. What remains unclear is why you need all the rest of that scheme: even such an amazing thing as MySQL has natively supported table partitioning for a very long time.
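Roughly like this, with purely illustrative table and column names; old partitions can then be dropped or archived without touching the fresh data:

  -- native RANGE partitioning by month; the partitioning column must be part of the primary key
  CREATE TABLE orders (
      id         BIGINT NOT NULL AUTO_INCREMENT,
      created_at DATE   NOT NULL,
      status     VARCHAR(20) NOT NULL,
      PRIMARY KEY (id, created_at)
  )
  PARTITION BY RANGE (TO_DAYS(created_at)) (
      PARTITION p2019_06 VALUES LESS THAN (TO_DAYS('2019-07-01')),
      PARTITION p2019_07 VALUES LESS THAN (TO_DAYS('2019-08-01')),
      PARTITION pmax     VALUES LESS THAN MAXVALUE
  );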
Profile where the time is actually being spent and look at what you are doing with the data.
The figures you quote are not impressive at all. If you want to play around with different indexing approaches and the performance of complex queries, I would advise starting with a million records.
A typical mistake is trying to build aggregates and other count(*) over millions of rows on the fly.
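If such totals are really needed, keep them pre-aggregated and update them on insert; a rough sketch with made-up names:

  -- keep per-day counters up to date instead of running COUNT(*) over the whole table
  CREATE TABLE orders_daily_stats (
      stat_date  DATE NOT NULL PRIMARY KEY,
      orders_cnt INT  NOT NULL DEFAULT 0
  );

  -- run this alongside each insert into the main table
  INSERT INTO orders_daily_stats (stat_date, orders_cnt)
  VALUES (CURRENT_DATE, 1)
  ON DUPLICATE KEY UPDATE orders_cnt = orders_cnt + 1;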

nrgian, 2019-07-17
@nrgian

What are good practices for working with a large number of records in a table (million+)?

Big?
For modern computers (even weak smartphones) this is a meager amount.
It should not slow down at your volumes.
Either your indexes are wrong, or your queries are.
Or the problem is not in the database at all, but somewhere after it.
Run EXPLAIN and see what is inefficient there.
And, an important point: filtering of the data should be done by the DBMS itself.
What the DBMS hands back to your program for further processing should be minuscule.
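For example (the query and names here are made up, substitute your own):

  -- shows whether the query uses an index or scans the whole table
  EXPLAIN
  SELECT id, status, created_at
  FROM orders
  WHERE status = 'active'
    AND created_at >= '2019-06-01';
  -- type = ALL with a large "rows" estimate means a full scan; the "key" column shows which index was chosen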
Are you serious? On a ridiculous million records?
It will have no noticeable effect, unless we are talking about hundreds of columns combined into one.
Do you already know "how often"?
With numbers?
Did you take measurements?
