Who tested the IN operator in MySQL? How fast is it and are there alternatives?

MySQL

Gambik, 2012-08-07 00:24:51

When selecting from a table, I need to fetch the records that match every value in a fairly large array.
As a result, about 1000 comma-separated values get stuffed into the IN operator. I feel something is wrong here :)
What would you recommend for better performance?

11 answer(s)

edogs, 2012-08-07
@Gambik

Approach the problem from one step earlier.
Where does the array come from? From the database? Then maybe it makes more sense to think about a single, more complex query.
Or you can load the array into a temporary table and query against that.
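
A minimal sketch of the temporary-table idea. It is written in Python against the standard-library sqlite3 module only so that it runs anywhere; SQLite is a stand-in for MySQL here, and the `items` table, the `wanted_ids` table and the `fetch_by_ids` helper are hypothetical names. The ID list is bulk-inserted into a temporary table, and the main table is joined against it instead of receiving a long IN list.

```python
import sqlite3


def fetch_by_ids(conn, ids):
    """Fetch rows from the hypothetical `items` table whose id is in `ids`."""
    cur = conn.cursor()
    # Temporary table holding the wanted IDs; the primary key indexes them.
    cur.execute("CREATE TEMPORARY TABLE wanted_ids (id INTEGER PRIMARY KEY)")
    try:
        # INSERT OR IGNORE is SQLite syntax for skipping duplicate IDs
        # (MySQL spells it INSERT IGNORE).
        cur.executemany(
            "INSERT OR IGNORE INTO wanted_ids (id) VALUES (?)",
            ((i,) for i in ids),
        )
        # Join against the temporary table instead of building IN (...).
        cur.execute(
            "SELECT items.id, items.name FROM items "
            "JOIN wanted_ids ON wanted_ids.id = items.id "
            "ORDER BY items.id"
        )
        return cur.fetchall()
    finally:
        cur.execute("DROP TABLE wanted_ids")


# Demo data: an in-memory database standing in for the real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 5001)],
)
rows = fetch_by_ids(conn, range(10, 3000, 3))
```

Whether this beats a plain IN list on a given MySQL setup depends on the query and the indexes, so it is worth measuring both.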

Iliapan, 2012-08-07
@Iliapan

I strongly advise against using IN; it starts to slow down shamelessly... It is a crutch for isolated cases only.

Nail, 2012-08-07
@Nail

"Avoid using IN(...) when selecting on indexed fields"
I think this applies only to composite indexes: "fields" is plural.
It is described in more detail here:
www.mysqldiary.com/optimizing-the-mysql-in-comparison-operations-which-include-the-indexed-field/
If you do ID IN (...), there is nothing wrong with that.

Vampiro, 2012-08-07
@Vampiro

In general, queries are extremely hard to optimize without anything to look at.
SHOW CREATE TABLE, EXPLAIN EXTENDED... Well, okay, enough grumbling.
If you use simple numeric values in IN and the query itself is dead simple, for example

select * from test1 where ID in (1,2,5,3);

Then it can be left as is. As far as I know, MySQL sorts this list anyway so that it can search it afterwards, and it will even use indexes for this.
If the queries are more complex (JOIN, ORDER BY, UNION), then sadness awaits you, and we await a more detailed description of the task. It is often faster to load all the IDs into a temporary table and work with that than to wait for a query whose ordinary condition has been replaced by IN to finish.

strib, 2012-08-07
@strib

Where does the array come from?
Might it be faster to load it into a temporary table and use EXISTS?
If the array is already in the database, then definitely EXISTS.
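
A small sketch of the EXISTS form for the case where the set of IDs already lives in the database. It uses Python's standard-library sqlite3 as a runnable stand-in for MySQL; the `orders` and `vip_customers` tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.execute("CREATE TABLE vip_customers (customer_id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO orders (id, customer_id) VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 3), (4, 1), (5, 2), (6, 4)],
)
conn.executemany(
    "INSERT INTO vip_customers (customer_id) VALUES (?)",
    [(1,), (4,)],
)

# The filter set stays in the database: no ID list is ever sent by the client.
query = """
SELECT o.id, o.customer_id
FROM orders AS o
WHERE EXISTS (
    SELECT 1 FROM vip_customers AS v WHERE v.customer_id = o.customer_id
)
ORDER BY o.id
"""
rows = conn.execute(query).fetchall()
```

Here `rows` holds only the orders whose customer appears in `vip_customers`.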

Zerstoren, 2012-08-07
@Zerstoren

When I do need IN, I add LIMIT together with DISTINCT. Since the IN values arrive as a list, the length of the list gives you the LIMIT right away.
But selecting by strings will be slow; numbers are better, and indexed ones at that.
Here is a comparison of the two variants, with a cold start.
The database contains 4 million records.
SELECT DISTINCT * FROM `map` WHERE id IN (4, 5291, 12356, 256783, 1234, 1654, 57572) LIMIT 7
~0.0008 sec
SELECT * FROM `map` WHERE id IN (4, 5291, 12356, 256783, 1234, 1654, 57572)
~0.0012 sec

Sergey, 2012-08-07
@Ualde

The wiki itself has been wiped, but a copy survives:

"Avoid using IN(...) when selecting on indexed fields, It will kill the performance of SELECT query."

Actually, might you be able to use ranges for the search instead? Or LIKE?

keltanas, 2012-08-07
@keltanas

If you really need to select 1000 records by their ID, could you put them in Redis and fetch them from there? Most likely it will be faster, and MySQL will not choke. If there is a lot of data, it can be sharded across several Redis instances.
Try analyzing the keys for runs of consecutive values, as Ualde advised. For example, records with keys 1, 2, 3, 5, 6, 7 can be fetched with a condition made of 2 ranges.
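
The range idea can be sketched roughly as follows. Both helper names (`collapse_to_ranges`, `build_where`) are made up for illustration, and the `%s` placeholders assume a driver that uses that parameter style, as the common MySQL drivers for Python do. The column name is interpolated directly, so it must come from trusted code, never from user input.

```python
def collapse_to_ranges(ids):
    """Collapse integers into sorted, inclusive (start, end) runs of consecutive values."""
    ranges = []
    for value in sorted(set(ids)):
        if ranges and value == ranges[-1][1] + 1:
            # Extend the current run by one.
            ranges[-1] = (ranges[-1][0], value)
        else:
            # Start a new run.
            ranges.append((value, value))
    return ranges


def build_where(column, ids):
    """Build a parameterized WHERE fragment: BETWEEN for runs, IN for leftovers."""
    clauses, params, singles = [], [], []
    for start, end in collapse_to_ranges(ids):
        if start == end:
            singles.append(start)
        else:
            clauses.append(f"{column} BETWEEN %s AND %s")
            params.extend([start, end])
    if singles:
        placeholders = ", ".join(["%s"] * len(singles))
        clauses.append(f"{column} IN ({placeholders})")
        params.extend(singles)
    # An empty ID list yields an empty fragment; the caller has to handle that.
    return " OR ".join(clauses), params


ranges = collapse_to_ranges([1, 2, 3, 5, 6, 7])
where, params = build_where("id", [7, 1, 2, 3, 5, 6, 10])
```

For the keys from the example, `ranges` comes out as two runs, 1 to 3 and 5 to 7. This only pays off when the IDs really do cluster; for scattered IDs the result degenerates into the same IN list.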

EugeneOZ, 2012-08-07
@EugeneOZ

I had a similar task, except that it needed an UPDATE. I did not record any timings, but IN was slow (and the length of the query is unpleasant), so I prepared the statement once and executed it in a loop. It worked instantly, unlike IN.
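
A rough sketch of the prepare-once, execute-many pattern. It uses Python's standard-library sqlite3 as a runnable stand-in, not the setup from the answer above; the `items` table and its `processed` column are hypothetical. `executemany` runs one parameterized UPDATE for each ID, and wrapping the loop in a single transaction avoids a commit per row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, processed INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO items (id) VALUES (?)", [(i,) for i in range(1, 2001)])

ids_to_update = list(range(1, 2001, 2))  # the 1000 odd IDs

# `with conn:` commits once when the block exits (or rolls back on error).
with conn:
    conn.executemany(
        "UPDATE items SET processed = 1 WHERE id = ?",
        ((i,) for i in ids_to_update),
    )

updated = conn.execute(
    "SELECT COUNT(*) FROM items WHERE processed = 1"
).fetchone()[0]
```

Each execution is a single primary-key lookup, so the statement text stays short no matter how many IDs there are.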

Gambik, 2012-08-07
@Gambik

Thank you all for the food for thought!
I will think it over and experiment. Most likely I will either rework the architecture or use a temporary table.
If more ideas come up, I'll gladly take them in! Thank you!

Nail, 2012-08-07
@Nail

Some surprising examples have been given here, so I decided to check.
Version: 5.5.25a-27.1-log Percona Server
The table has 26 million rows and takes 4.5G on disk.

FLUSH STATUS; select * from table where id in (1000,100000,1000000,3000000,5000000,7000000,10000000); SHOW SESSION STATUS LIKE 'Handler_read%'; 

+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 7     |
| Handler_read_last     | 0     |
| Handler_read_next     | 0     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 0     |
+-----------------------+-------+
7 rows in set (0.00 sec)

FLUSH STATUS; select * from table where id in (1000,100000,1000000,3000000,5000000,7000000,10000000) limit 4; SHOW SESSION STATUS LIKE 'Handler_read%'; 

+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 4     |
| Handler_read_last     | 0     |
| Handler_read_next     | 0     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 0     |
+-----------------------+-------+
7 rows in set (0.00 sec)

The queries themselves execute in 0.00 sec.
Conclusion:
Check your indexes and statistics, and upgrade MySQL.