V
V
vladimirchelyabinskiy2014-07-28 21:26:14
MySQL
vladimirchelyabinskiy, 2014-07-28 21:26:14

MySQL can't handle large amount of data, how to solve the problem?

Hello everyone, I have been looking for an answer to my question on MySQL for a very long time, maybe I can find it here .... Let's
start
I have large data files on my hard drive, I need to insert this data into a MySQL database with the MyISAM table
type data 16, 2 columns in each table, for each record, an MD5 hash is generated; the original row and its MD5 are written to the table.
I don't know how to do it so I wrote a program in C#
Ask why and why INSERT if you can do it with LOAD DATA INFILE and MD5() ?
Because I don’t know how to make a query using SQL to first convert the string to md5 and then insert it into the desired table by its first character (0123456789ABCDEF): (MD50, MD51, MD5A, MD5F).
Approximate content of the file: 1.txt


6789876154321987654245667563457
3468679765675423222786658115645
0364345444442355765896679674563
_

There can be from 100 to 500,000,000 such lines in a file, or even more.
my program inserts into the database very quickly using the INSERT method
BUT! with each new insert, the speed drops a little and by 50,000,000 it drops to almost 0.
I turn off the server and start it again and the records are not inserted at all, then I do a REPAIR of all tables then OPTIMIZE after that the insert resumes but is terribly slow.
I even implemented these functions in my program, but it didn't help.
I am using MySQL database version 5.1.73-winx64
Modest data of my server:

Processor: Core i5.
Memory: 8 GB
Hard disk: SATA3 6 TB Hitachi He6 7200rpm (5 pieces - 30 terabytes).

Structure of 16 tables:

CREATE TABLE `MD50` (
`ORIGINAL` CHAR(32),
`MD5Hash` CHAR(32),
PRIMARY KEY (MD5Hash)
) TYPE=MyISAM DELAY_KEY_WRITE=1;

My settings from my.cnf file

[mysqld]
port = 3306
socket = MySQL
skip-locking
skip-innodb
key_buffer_size = 4000M
table_open_cache = 1024
sort_buffer_size = 256K
read_buffer_size = 256K
read_rnd_buffer_size = 256K
max_allowed_packet = 1M
myisam_sort_buffer_size = 512M
myisam_recover = force, backup
query_cache_size = 128M
query_cache_type = 1
thread_concurrency = 8
tmp_table_size=64M
max_heap_table_size=64M
thread_cache_size = 256
server-id = 1
[mysqldump]
quick
max_allowed_packet = 16M
[mysql]
no-auto-rehash
[myisamchk]
key_buffer_size = 256M
sort_buffer_size = 256M
read_buffer = 2M
write_buffer = 2M
[mysqlhotcopy]
interactive-timeout

Request code from my C# program

MySqlCommand insert_command = new MySqlCommand("insert ignore into md5" + md5_file_line[0] + " (original,
md5hash) values ​​('" + MySqlHelper.EscapeString(file_line) + "', '" + md5_file_line + "')", Settings .mysql_connect);
sql_query_execute = insert_command.ExecuteNonQuery();

Everything is fine in the program code; there can be no reason for it.

Answer the question

In order to leave comments, you need to log in

4 answer(s)
P
Puma Thailand, 2014-07-28
@vladimirchelyabinskiy

put mysql 5.6 there everything is still more cheerful and works much better with a large amount of data
discard the index, it is quite difficult to recalculate for md5

M
mrstrictly, 2014-07-28
@mrstrictly

Read about LOAD DATA: dev.mysql.com/doc/refman/5.1/en/load-data.html

V
vdem, 2014-07-28
@vdem

I just glanced at the task, try this: 1) LOAD_DATA_INFILE into a table without any PRIMARY KEY there (but declare the field and fill it with an MD5 hash), 2) ALTER TABLE, set PRIMARY KEY. Since this is a SINGLE operation (as far as I understand), there is no point in steaming My(!!)SQL with reindexing after each insertion.

V
vladimirchelyabinskiy, 2014-07-28
@vladimirchelyabinskiy

Well, on the image you can see that there are already differences in time, that is, the speed drops a little.... mysql 5.6.19 x64 the same config.
dbc5f09e415f4ea38741b0f328f3fe35.jpg

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question