MySQL
DamskiyUgodnik, 2018-06-02 12:15:10

How to speed up data insertion into a table with 500 million records?

Hello!
I've run into the following problem: I need to maintain a table with 500 million records. It's a MyISAM table. Table structure:

CREATE TABLE `users` (
  `id` int(11) NOT NULL,
  `group_id` int(11) NOT NULL DEFAULT '0',
  `email` varchar(255) DEFAULT NULL,
  `reg_time` timestamp NULL DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

--
ALTER TABLE `users`
  ADD PRIMARY KEY (`id`),
  ADD KEY `email` (`email`),
  ADD KEY `group_id` (`group_id`),
  ADD KEY `reg_time` (`reg_time`);

ALTER TABLE `users`
  MODIFY `id` int(11) NOT NULL AUTO_INCREMENT;

To start, I decided to run a test and simply generate synthetic data. After about 150 million records, the INSERT queries began to take an unrealistically long time, or never completed at all. I add data with queries of the form:
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);

As for hardware: an ordinary home computer (4-core AMD, 22 GB of RAM), Ubuntu, MySQL out of the box without any additional tuning.
Which direction should I dig in?


5 answer(s)
AlexHell, 2018-06-03
@AlexHell

People here write "not the best solution" — for what tasks? Author, tell us how you plan to use the table! Mostly reads? What is the read/write ratio — 80% reads, 20% writes? Then definitely do not drop the indexes. And the generic advice "insert the batch without indexes" is dubious anyway: afterwards you have to wait while the indexes are rebuilt over those millions of rows. If inserts are continuous rather than a one-off load, why would you drop the indexes, insert data, and recreate the indexes every single time? Dropping them once, for one huge load with no further inserts for a long time, makes sense; constantly dropping and recreating them does not.
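For a one-off bulk load into MyISAM, the "insert without indexes" idea is usually done with DISABLE KEYS rather than by dropping the indexes outright. A rough sketch, using the table and columns from the question:

```sql
-- Defer non-unique index maintenance for the duration of the load (MyISAM only)
ALTER TABLE `users` DISABLE KEYS;

-- ... run the multi-row INSERTs here ...
INSERT INTO `users` (`group_id`, `email`, `reg_time`)
VALUES (1, 'a@example.com', NOW()),
       (2, 'b@example.com', NOW());

-- Rebuild the non-unique indexes in one pass at the end
ALTER TABLE `users` ENABLE KEYS;
```

Note that DISABLE KEYS only affects non-unique indexes; the PRIMARY KEY is still maintained on every insert.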
Further, others advised switching to InnoDB — I fully support that. MyISAM is fragile and easily gets corrupted (I can't find a link right now), and it is recommended to keep large databases (and especially important ones, not just large ones) in InnoDB or XtraDB (Percona's enhanced InnoDB, also used by MariaDB). Their crash recovery is adequate. As for speed, you need to run measurements for your specific load (reads, writes, your hardware) to find the point where MyISAM would actually be faster — which is not a given. And disaster recovery with MyISAM is expensive.
As for the general approach: keep the indexes in RAM. MySQL does this itself when you allocate enough memory — for the MyISAM options, google around (if you stay on it). For InnoDB, you need to set the following parameters:
innodb_buffer_pool_size=1024M
innodb_log_buffer_size=4M
innodb_log_file_size=48M
innodb_log_files_in_group = 2
There are entire articles and books on how to tune these (look up Peter Zaitsev's material if you need details). From my side: innodb_buffer_pool_size is the main option for keeping everything in RAM. If the indexes and data don't fit — i.e., measurements show you are swapping to disk — check disk reads/writes; under Linux, `iostat -dx 5`, `vmstat 5` and `iotop` will help.
innodb_log_buffer_size and innodb_log_file_size are sized from your insert volume, so that neither too much nor too little accumulates in RAM before the log is flushed to disk. Read the details and tune them for your write load; nobody can give you exact numbers (the sizing rules are in the books and articles).
innodb_flush_log_at_trx_commit is an additional option — read what it does; it can be useful, but the default is better for reliability.
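Putting the options above together, a my.cnf fragment might look like this. The values are illustrative, not recommendations — size innodb_buffer_pool_size to your own RAM and data set:

```ini
[mysqld]
# Main knob: keep data and indexes in RAM (e.g. most of the RAM on a dedicated box)
innodb_buffer_pool_size = 16G
# Redo log sizing: affects how often and how much is flushed to disk
innodb_log_buffer_size  = 16M
innodb_log_file_size    = 512M
innodb_log_files_in_group = 2
# 1 (default) is safest; 2 or 0 trade durability for insert speed
innodb_flush_log_at_trx_commit = 1
```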
If the hardware is sufficient — enough RAM and disks in RAID 10 — then InnoDB (XtraDB) will give you adequate insert performance into a 500-million-row table with your simple structure, and adequate reads from it as well.
P.S. Another person suggested "you need to shard and keep the data in different databases" — I partially support that. It can be a solution if you can keep different shards (parts, not copies, of your table) on several physical disks, or even on several servers — though different disks of one server also work. But that is for when you write a lot: sharding is designed to speed up writes through parallelization across disks or servers.
Although, looking at your schema, I don't see where anything needs to change that often. Are new users added frequently? Which fields change, and how often? Maybe you haven't shown the whole table and there is something else?
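If sharding across servers is overkill, MySQL's built-in table partitioning can spread a single table's rows across partitions on one server — a hedged sketch, not something proposed in the answers above:

```sql
-- Same structure as the question, split into 8 hash partitions by id
CREATE TABLE `users_partitioned` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `group_id` int(11) NOT NULL DEFAULT '0',
  `email` varchar(255) DEFAULT NULL,
  `reg_time` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
PARTITION BY HASH(`id`)
PARTITIONS 8;
```

Partitioning is not sharding proper (everything stays on one server), but each partition has its own smaller indexes, which can help with very large tables.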

Denis, 2018-06-02
@sidni

Remove the keys (indexes) from the fields.

d-stream, 2018-06-02
@d-stream

1. Use bulk inserts.
2. If data correctness is guaranteed, it can help at the start to pour the data into a table without indexes, and only then ALTER the table to add the indexes.
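A minimal sketch of point 2, assuming the schema from the question: create the table with the primary key only, bulk-load, then build all secondary indexes in one ALTER:

```sql
-- 1. Bare table: primary key only, no secondary indexes
CREATE TABLE `users` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `group_id` int(11) NOT NULL DEFAULT '0',
  `email` varchar(255) DEFAULT NULL,
  `reg_time` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- 2. Bulk-load the data here (multi-row INSERTs or LOAD DATA INFILE)

-- 3. Build all secondary indexes in a single pass
ALTER TABLE `users`
  ADD KEY `email` (`email`),
  ADD KEY `group_id` (`group_id`),
  ADD KEY `reg_time` (`reg_time`);
```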

vanyamba-electronics, 2018-06-02
@vanyamba-electronics

I would try to pre-fill the entire table with empty records, and on registration modify them with an UPDATE query. That way the indexes are not rebuilt on every insert, and the data file does not have to grow as new records appear.
The table itself should look like:

CREATE TABLE `users` (
  `id` int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `group_id` int(11) DEFAULT NULL,
  `email_id` int(11) DEFAULT NULL,
  `reg_time` timestamp NULL DEFAULT NULL
);

The e-mail table follows roughly the same scheme, except that varchar(255) should be replaced with char(63) or char(127), so that a fixed-size slot is allocated for the text in the table file.
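A sketch of how registration might work against such a pre-filled table — claiming a free row instead of inserting. The WHERE condition and values are assumptions, since the answer above doesn't spell them out:

```sql
-- Claim the first unused pre-allocated row for a new registration
UPDATE `users`
SET `group_id` = 1,
    `email_id` = 12345,
    `reg_time` = NOW()
WHERE `reg_time` IS NULL
LIMIT 1;
```

In practice you would also need a way to find out which id was claimed (e.g. selecting a free id first under a lock), which this sketch glosses over.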

Dimonchik, 2018-06-02
@dimonchik2013

Have fun.
Too lazy to re-read the thread, but there is nothing faster than LOAD DATA INFILE into MyISAM without indexes (building the indexes afterwards also takes forever — usually at that point the table gets converted to InnoDB before indexing).
If that take is wrong, I'll be grateful for corrections.
P.S. For 500 million rows MySQL is not the best solution, but it depends on the task and on your skills.
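For reference, the LOAD DATA INFILE variant mentioned above might look like this — the file path and CSV format are illustrative:

```sql
-- Fastest bulk path: load a CSV straight into the table
LOAD DATA INFILE '/tmp/users.csv'
INTO TABLE `users`
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(`group_id`, `email`, `reg_time`);
```

Combined with a MyISAM table whose keys are disabled for the duration of the load, this is typically the fastest way to ingest hundreds of millions of rows.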
