MySQL
Alexander, 2017-01-18 17:13:20

How to compare two huge tables (3 and 2 million rows) and update one of them?

Good day! I have a main product table with 4 million rows (6 GB) and a second table with the updated assortment (~3 million rows, 4 GB). Both tables are InnoDB and have the same set of fields.
Main product table:

CREATE TABLE `items` (
 `aid` int(11) AUTO_INCREMENT,
 `id` varchar(100),
 `id_shop` tinyint(4),
 `name` text,
 `description` text,
 `enabled` tinyint(1),
... 20 more fields ...
 PRIMARY KEY (`aid`),
 KEY `se` (`id_shop`,`enabled`),
... many more indexes ...
) ENGINE=InnoDB AUTO_INCREMENT=43657573 DEFAULT CHARSET=utf8

I need to compare them by matching the values of 2 fields (id and id_shop):
1. If there is no such product, add it to the main table.
2. If there is a product, update some of its fields in the main table.
At first I decided to try a "bad" solution: loop through the second table in PHP, look for a match in the main table on each iteration, and then do an INSERT or an UPDATE. This takes a huge amount of time, about 10 hours.
After that I came across "INSERT ... ON DUPLICATE KEY UPDATE":
INSERT INTO items (id, id_shop, name) VALUES ('1', '2', '3') 
ON DUPLICATE KEY 
UPDATE name = 'new_name', description = 'new description'

But I got confused about which key such a query should rely on.
1. Do I need to remove the PRIMARY KEY from 'aid' and create PRIMARY KEY (`id`, `id_shop`) instead?
2. Is it possible to solve my task with a single query? Or do I still have to loop through each row of the updated table and run "INSERT ... ON DUPLICATE KEY UPDATE" for it?
Can you please tell me how to use this construct in my case?
Thank you!

2 answer(s)
Rsa97, 2017-01-18
@librown

You need a UNIQUE KEY (`id`, `id_shop`). The PRIMARY KEY on `aid` can stay as it is.
The query will look something like this:

INSERT INTO `items` (`id`, `id_shop`, `name`, `description`, ...)
  SELECT `id`, `id_shop`, `name`, `description`, ...
    FROM `new_items`
  -- On a key collision, overwrite the listed columns with the values from new_items.
  ON DUPLICATE KEY UPDATE `name` = `new_items`.`name`, `description` = `new_items`.`description`, ...

Before running such a large query, it makes sense to drop all the other indexes, leaving only the PRIMARY KEY and the UNIQUE KEY, and to recreate them once the query has finished.
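
The key changes described above could be sketched roughly as follows, assuming the column names from the `items` table in the question. The key name `uniq_id_shop` is a hypothetical name chosen for illustration, and only the `se` index is shown because the question does not list the others. Adding the unique key fails if `items` already contains duplicate (`id`, `id_shop`) pairs, so it is probably worth checking for them first.

```sql
-- Look for existing duplicates on the pair that will become unique.
-- If this returns any rows, the ADD UNIQUE KEY below will fail.
SELECT `id`, `id_shop`, COUNT(*) AS `cnt`
  FROM `items`
 GROUP BY `id`, `id_shop`
HAVING COUNT(*) > 1;

-- Add the unique key that ON DUPLICATE KEY UPDATE will match on.
-- `uniq_id_shop` is a hypothetical key name.
ALTER TABLE `items` ADD UNIQUE KEY `uniq_id_shop` (`id`, `id_shop`);

-- Drop a secondary index before the bulk query (repeat for the others).
ALTER TABLE `items` DROP INDEX `se`;

-- ... run the INSERT ... SELECT ... ON DUPLICATE KEY UPDATE here ...

-- Recreate the index afterwards, with the definition from the question.
ALTER TABLE `items` ADD KEY `se` (`id_shop`, `enabled`);
```

Note that a UNIQUE KEY in MySQL does not treat NULLs as equal, so if `id` or `id_shop` can be NULL, such rows will always be inserted rather than updated.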

Roman Fov, 2017-01-18
@Roman-Fov

The update can be done like this:

UPDATE `items`, `items_new` SET `items`.`name` = `items_new`.`name` WHERE `items`.`id` = `items_new`.`id` AND `items`.`id_shop` = `items_new`.`id_shop`
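
The UPDATE covers only products that already exist in the main table. The other half of the task, adding the missing products, might look like the sketch below. It assumes the `items_new` table name used in this answer and the column names from the question; only a few columns are listed, and the real statement would name all of them.

```sql
-- Insert products from items_new that have no (id, id_shop) match in items.
-- The LEFT JOIN keeps every row of items_new; rows without a match
-- have NULL in all items columns, including the primary key `aid`.
INSERT INTO `items` (`id`, `id_shop`, `name`, `description`)
SELECT n.`id`, n.`id_shop`, n.`name`, n.`description`
  FROM `items_new` AS n
  LEFT JOIN `items` AS i
    ON i.`id` = n.`id` AND i.`id_shop` = n.`id_shop`
 WHERE i.`aid` IS NULL;
```

Without an index on (`id`, `id_shop`) in `items`, both this join and the UPDATE are likely to be very slow on tables of this size, so the composite key is worth adding for this approach as well.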
