MySQL
vasdor, 2015-04-24 09:04:47

How to eliminate table slowdowns after a big delete?

Hello!
There is a table with the following structure (I left out the irrelevant fields):

CREATE TABLE `logs` (
  `INSERT_DATE` datetime DEFAULT NULL,
  `DATA` text NOT NULL,
  KEY `INSERT_DATE` (`INSERT_DATE`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

Data is inserted into it at roughly 1 KB per row; INSERT_DATE is populated via now().
About 15 million new records arrive per day. The next morning, the previous day's data is pulled over the network by another application:
SELECT INSERT_DATE, DATA from logs WHERE INSERT_DATE >= '2015-04-16' AND INSERT_DATE < '2015-04-17'

then, once the data has been looked over, it is deleted:
DELETE FROM logs WHERE INSERT_DATE >= '2015-04-16' AND INSERT_DATE < '2015-04-17'

Without this DELETE everything is fine: if it is never run, the table keeps taking inserts quickly, around 200 per second, and quietly grows to 50 million rows or so.
But after the previous day's data is deleted, inserts start to slow down and connections pile up, so I have to run
optimize table logs;
which takes a long time, during which the table sits idle.
My guess is that the problem is index fragmentation and the index being rebuilt after so many inserts and deletes.
Where can I look to solve the problem?
Thanks in advance!


5 answer(s)
Andrew, 2015-04-24
@dredd_krd

Partitions, fast and easy. Every day a script can add a partition for the next day, and when you need to delete a day's worth of data you simply drop its partition; the rest of the data stays. No need to rename tables or delete old ones. The table stays live and available the whole time, with no moments when it might be unreachable (for example, during a rename).
Rows are distributed across partitions purely by the value of the key, so there will be no problems.
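A minimal sketch of how that could look (partition names, dates, and the MySQL 5.1+ RANGE scheme are illustrative, not taken from the question):

-- Re-create the table partitioned by day; this is possible here because
-- the table has no unique keys that would have to include the partition key.
CREATE TABLE `logs` (
  `INSERT_DATE` datetime DEFAULT NULL,
  `DATA` text NOT NULL,
  KEY `INSERT_DATE` (`INSERT_DATE`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
PARTITION BY RANGE (TO_DAYS(INSERT_DATE)) (
  PARTITION p20150416 VALUES LESS THAN (TO_DAYS('2015-04-17')),
  PARTITION p20150417 VALUES LESS THAN (TO_DAYS('2015-04-18'))
);

-- The nightly script adds tomorrow's partition at the end...
ALTER TABLE logs ADD PARTITION (PARTITION p20150418 VALUES LESS THAN (TO_DAYS('2015-04-19')));

-- ...and drops yesterday's partition instead of running a big DELETE.
ALTER TABLE logs DROP PARTITION p20150416;

Dropping a partition is nearly instant and leaves no fragmentation behind in the rest of the table.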

Leonid Sysoletin, 2015-04-24
@sysoletin

Index rebuilding plus a slow disk.
Look at renaming the old table, creating an empty new one for fresh records, and then dropping the old, no-longer-needed one. No reindexing, no long disk work - beauty!
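A rough sketch of that swap, with the table names (logs_new, logs_2015_04_16) chosen purely for illustration:

-- Create an empty copy and swap it in; RENAME TABLE changes both names in
-- one atomic operation, so writers never see a missing table.
CREATE TABLE logs_new LIKE logs;
RENAME TABLE logs TO logs_2015_04_16, logs_new TO logs;

-- Export yesterday's rows from logs_2015_04_16 over the network, then:
DROP TABLE logs_2015_04_16;

DROP TABLE is nearly instant and never touches the live table, so inserts keep flowing the whole time.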

Dmitry Entelis, 2015-04-24
@DmitriyEntelis

If these are logs, then:
a) why is there an index?
b) think about whether you need to store the raw data at all, or whether you can aggregate it right away;
Plus one to Leonid Sysoletin: the daily-table solution is workable.

Igor, 2015-04-24
@KorroLion

For logs you could shard by day: tables Logs_1, Logs_2, Logs_3, and so on.
Drop the old tables (it's instant!).
The table suffix can be the day of the week, the day of the month, or the day of the year.
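An illustrative sketch of that scheme (the suffix here is the day of the month; the names are assumptions):

-- The script prepares today's shard ahead of time and the application
-- inserts straight into it.
CREATE TABLE IF NOT EXISTS Logs_17 LIKE logs;

-- Once yesterday's shard has been exported, drop it; the drop is instant
-- and leaves the other shards untouched.
DROP TABLE IF EXISTS Logs_16;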

lega, 2015-04-24
@lega

It's reinventing the wheel, but since the task is simple you can dump the data straight into files, one day per file. It will work much faster (reading/transfer over the network can be sped up by up to 100x), plus you save disk and memory. If you want, add compression - it saves disk and makes the network transfer faster.
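That is not quite a MySQL-side solution (the application would write the files directly), but if the rows still land in MySQL first, a similar one-day-one-file export can be sketched with SELECT ... INTO OUTFILE; the path below is illustrative, the file is written on the MySQL server host, and the FILE privilege is required:

SELECT INSERT_DATE, DATA
FROM logs
WHERE INSERT_DATE >= '2015-04-16' AND INSERT_DATE < '2015-04-17'
INTO OUTFILE '/var/tmp/logs-2015-04-16.tsv';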
