How to optimize an SQL query for random selection?

NRO, 2014-10-22 08:18:21

PHP

Here is what we have: SELECT * FROM dump WHERE text LIKE 'some text' ORDER BY RAND() LIMIT 10
The problem: the table currently holds 600k records and the query takes 2-4 seconds on average. The table keeps growing and will eventually be about 40 times larger, which will presumably increase the query time and the resources it needs proportionally. At that point the server will fall over. To rephrase the task: I need to select 10 random records that match the pattern as efficiently as possible. Please help.
I read the following on Habr (the example had no LIKE): when executing this query, MySQL writes all (!!!) rows of the source table into a temporary table with one extra field holding the result of RAND(), i.e. a set of arbitrary values. The temporary table is then filesorted by that extra field, and the first 10 records are taken.
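
To make the cost described above concrete, here is a rough conceptual model in Python. It is only an illustration of the quoted description, not MySQL's actual implementation, and the function name `order_by_rand_limit` is made up for this sketch. Every candidate row gets a random key, the whole set is sorted, and only then are the first few rows kept, so the work grows with the number of rows rather than with the LIMIT.

```python
import random


def order_by_rand_limit(rows, limit, rng=random):
    """Conceptual model of ORDER BY RAND() LIMIT n (hypothetical helper)."""
    # "Temporary table": every row is copied together with one extra random value.
    keyed = [(rng.random(), row) for row in rows]
    # "Filesort": the whole temporary table is sorted by the random column.
    keyed.sort(key=lambda pair: pair[0])
    # Only now is the LIMIT applied.
    return [row for _, row in keyed[:limit]]


if __name__ == "__main__":
    # 600k stand-in rows, as in the question; all of them are keyed and sorted.
    print(order_by_rand_limit(range(600_000), 10))
```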

4 answer(s)

Dmitry Entelis, 2014-10-22
@DmitriyEntelis

You have 2 tasks: finding occurrences of a string, and selecting 10 random records from the results.
These are 2 different tasks :)
It is logical to solve the first one with a search tool such as Sphinx (sphinxsearch.com).
It will quickly return the ids of the matching entries.
From these, pick as many random ids as you need in PHP, then run an SQL query: select ... where id in (1,2,3,4...)
In your case this will be the most performant and scalable solution.
PS The very idea of displaying 10 random records matching the pattern is not very clear to me. Wouldn't it be better to display the 10 most relevant ones? :) Or is your pattern too general for relevance to matter? Please clarify this point.
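
The two-step idea in this answer can be sketched roughly as follows. This is a minimal sketch under several assumptions: it is written in Python with an in-memory SQLite table so that it runs standalone, whereas the answer talks about PHP, MySQL and Sphinx; a plain LIKE query stands in for the Sphinx lookup; the `dump` table is assumed to have an `id` primary key, which the question does not show; and `build_demo_table` and `pick_random_matches` are hypothetical names.

```python
import random
import sqlite3


def build_demo_table():
    """Create an in-memory stand-in for the `dump` table (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dump (id INTEGER PRIMARY KEY, text TEXT)")
    rows = [(i, "some text" if i % 3 == 0 else "other") for i in range(1, 301)]
    conn.executemany("INSERT INTO dump (id, text) VALUES (?, ?)", rows)
    return conn


def pick_random_matches(conn, pattern, count, rng=random):
    # Step 1: fetch only the ids of matching rows.
    # (In the answer this is Sphinx's job; LIKE is just a stand-in here.)
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM dump WHERE text LIKE ?", (pattern,))]
    # Step 2: choose the random ids in application code, not in SQL.
    chosen = rng.sample(ids, min(count, len(ids)))
    if not chosen:
        return []
    # Step 3: fetch the full rows by primary key.
    placeholders = ",".join("?" * len(chosen))
    return conn.execute(
        f"SELECT * FROM dump WHERE id IN ({placeholders})", chosen).fetchall()


if __name__ == "__main__":
    conn = build_demo_table()
    for row in pick_random_matches(conn, "some text", 10):
        print(row)
```

The point of the split is that the database only does a primary-key lookup for the final rows, while the random choice happens over a plain list of ids.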

My joy, 2014-10-22
@t-alexashka

www.warpconduit.net/2011/03/23/selecting-a-random-...
Here are 3 examples of fast RAND() selections.

whats, 2014-10-22
@whats

Regarding kazmiruk's comment:
in this case a subquery that selects all the LIKE matches will work for you.
But since MySQL is lacking in so many respects, you then need to emulate ROW_NUMBER().
That way you select all the records that match your condition and give them new ids with no holes. The query is very fast.
After that, the main query can do the random selection from the RAND() example. But keep in mind that LIKE is a slow search operator, and with %text% it slows down a lot. Again, MySQL is a poor database for anything other than ordinary indexed reads and writes. It's better not to do this. Postgres, for example, with its full-text search will handle all of this at Sphinx's level, in some cases even faster.
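
The renumbering idea can be illustrated like this. It is a sketch under stated assumptions, not the MySQL query the answer has in mind: it uses Python with SQLite, where an INTEGER PRIMARY KEY column on a fresh temporary table plays the role of the emulated ROW_NUMBER(); the `id` column and the names `build_demo_table`, `matches`, `rn` and `random_rows_by_row_number` are invented for the example. Once the matching rows are numbered 1..N with no holes, random numbers from that range always hit an existing row.

```python
import random
import sqlite3


def build_demo_table():
    """Create an in-memory stand-in for the `dump` table (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dump (id INTEGER PRIMARY KEY, text TEXT)")
    # Matching rows get ids 3, 6, 9, ... so their ids have holes between them.
    rows = [(i, "some text" if i % 3 == 0 else "other") for i in range(1, 301)]
    conn.executemany("INSERT INTO dump (id, text) VALUES (?, ?)", rows)
    return conn


def random_rows_by_row_number(conn, pattern, count, rng=random):
    # Number the matching rows 1..N without holes; `rn` stands in for ROW_NUMBER().
    conn.execute("DROP TABLE IF EXISTS temp.matches")
    conn.execute("CREATE TEMP TABLE matches (rn INTEGER PRIMARY KEY, id INTEGER)")
    conn.execute(
        "INSERT INTO matches (id) SELECT id FROM dump WHERE text LIKE ?",
        (pattern,))
    total = conn.execute("SELECT COUNT(*) FROM matches").fetchone()[0]
    # Any number in 1..total is a valid row, so no retries are needed.
    picks = rng.sample(range(1, total + 1), min(count, total))
    if not picks:
        return []
    placeholders = ",".join("?" * len(picks))
    return conn.execute(
        "SELECT d.* FROM matches AS m JOIN dump AS d ON d.id = m.id "
        f"WHERE m.rn IN ({placeholders})", picks).fetchall()


if __name__ == "__main__":
    conn = build_demo_table()
    for row in random_rows_by_row_number(conn, "some text", 10):
        print(row)
```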

ugodrus, 2014-10-22
@ugodrus

If the number of rows matching the condition is large, a correspondingly large number of random values gets calculated for them. Personally I never use LIKE: it's a very crude method, and on large volumes it gets really bad. Try rewriting it with REGEX, and if the selected data set is always large, it's better to redo it in a different way altogether. I advise you to look here.