A
MySQL
Alexander, 2017-07-20 20:11:54

Selects from a table with 5 million rows: if I split it into 100 tables, will there be a performance gain?

Hello colleagues!
Please advise: I have a products table (InnoDB, about 5 million rows). Indexes are in place, but the server only has 4 GB of memory.
It receives many queries with joins to other large tables (an online-store filter: by manufacturer, by color; sorting by price, by date). Such selects sometimes take up to 30 seconds.
Does it make sense to split the products into 50-100 tables (one table per category)? That would give tables of 30-100 thousand rows at most. How can performance be drastically improved?
Or is the correct option, after all, simply to add memory up to 16 GB and enjoy the result? :)
Thanks!


14 answers
T
ThunderCat, 2017-07-21
@ThunderCat

I second longclaps: are you an AliExpress admin? How does a products table get 5 million rows, are you counting individual units? That's point one. Point two: if the filters are that heavy, do 2-3 queries from the back end (whatever you run there: PHP, Node, Django). Sensibly limit the number of products in the first selection using the fastest, most restrictive condition on an index, for example all products of the selected category, and then filter only the result of that selection with something like id IN (1,2,3,4). And do tell me, don't torture me, WHERE DO SO MANY PRODUCTS COME FROM, I won't get to sleep now...
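A rough sketch of that two-step approach (the table and column names are my own illustration, not from the thread):

-- Step 1: the fastest, most selective condition first (uses the category index)
SELECT id FROM products WHERE category_id = 42 ORDER BY price LIMIT 500;

-- Step 2: apply the remaining filters only to the ids collected in step 1
SELECT * FROM products
WHERE id IN (101, 102, 103, 104) AND color = 'red' AND manufacturer_id = 7
ORDER BY price;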

L
laxikodeje, 2017-07-20
@laxikodeje

5 million records is nothing for a modern DBMS.
If it were 5 billion, then you could worry as much as you like.
The real issue is that you have not organized the storage of your filters properly.
Filtering is a genuinely simple task, but SQL in normalized form is poorly suited to filtering products by category in an online store.
You need either a DENORMALIZED form with duplication,
or a database of a different type altogether. Personally I prefer Tarantool for this kind of filtering task, but I suppose Solr would also do.
As for splitting into 100 tables: no.
There will be no benefit.
Also, don't forget that modern expectations are for the filter to respond essentially instantly: by the time the user ticks the next checkbox, the products matching the previous filter should already have been loaded from the server into the browser.
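As a rough illustration of what a denormalized filter table could look like (the schema below is my own sketch, not from the answer):

-- One wide row per product; every filterable attribute is a plain indexed
-- column, so a filter query never needs a JOIN.
CREATE TABLE product_filter (
    product_id      INT UNSIGNED NOT NULL PRIMARY KEY,
    category_id     INT UNSIGNED NOT NULL,
    manufacturer_id INT UNSIGNED NOT NULL,
    color           VARCHAR(32) NOT NULL,
    price           DECIMAL(10,2) NOT NULL,
    created_at      DATETIME NOT NULL,
    KEY idx_filter (category_id, manufacturer_id, color, price)
) ENGINE=InnoDB;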

L
longclaps, 2017-07-20
@longclaps

What is a table with 5 million rows even doing there? Shoving everything that has accumulated over 10 years like moss off into an archive table is a meaningful move; I'm sure 90% of the junk would fly away even under the mildest criteria.
100 live tables is far more of a hell than what you have now.
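A minimal sketch of that archiving idea (the archive table, the updated_at column and the two-year cut-off are assumptions; it presumes products_archive has the same structure as products):

-- Copy stale rows to an archive table, then remove them from the hot table.
-- On a live shop, run this in batches or inside a transaction.
INSERT INTO products_archive
SELECT * FROM products WHERE updated_at < NOW() - INTERVAL 2 YEAR;

DELETE FROM products WHERE updated_at < NOW() - INTERVAL 2 YEAR;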

D
Dmitry Dart, 2017-07-20
@gobananas

No, you will not get a win: MySQL still spends time opening each table, and you would have to write extra logic with conditions for which table to select from. You need to denormalize the data at least a little, run EXPLAIN, and check whether MySQL is losing its way and not using the right indexes.
5 million is not very much; I'm sure everything can be brought back to normal.
P.S. I myself worked with a table of 3.5 million records on a server with 2 GB of memory; everything was fine, no query took longer than 0.1 seconds, although even that is too much IMHO.
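For example (the query shape below is only illustrative):

-- See which index MySQL actually picks for a typical filter query
EXPLAIN
SELECT p.id, p.name, p.price
FROM products p
JOIN manufacturers m ON m.id = p.manufacturer_id
WHERE p.category_id = 42 AND p.color = 'red'
ORDER BY p.price
LIMIT 50;
-- In the output, watch the key, rows and Extra columns: a NULL key or
-- "Using filesort" over millions of rows usually explains a 30-second select.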

L
landergate, 2017-07-21
@landergate

There will be no gain.
But let's look at the problem from a different angle:
5 million rows is an insignificant amount. If your selects take 30 seconds, then either you don't have proper indexes on the columns you filter by, or you are doing LIKEs that start with %. Indexes are not used for LIKE '%...%', only for LIKE '...%'.
Another possible cause is storage. Check whether everything is bottlenecked on the disks at that moment: if iowait is high during heavy queries, try migrating to a host with an SSD.
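To illustrate the LIKE point (the column name is assumed):

-- Cannot use an index on name: the leading % forces a full scan
SELECT id FROM products WHERE name LIKE '%phone%';

-- Can use an index on name: the prefix is fixed
SELECT id FROM products WHERE name LIKE 'Samsung%';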

V
Vasily Nazarov, 2017-07-21
@vnaz

1. Perhaps it is more correct to use NoSQL here.
2. Even with SQL (i.e. an RDBMS), you need to get rid of JOINs.
But! A JOIN depth of 1 can be quite acceptable if you:
- when filtering on the front end, immediately pass the ids of the related entities (brand, color, etc.)
- set up the indexes correctly and write the right queries
For example, with at least one active filter you do a single SELECT, and then filter by the other parameters already in PHP (or whatever you use).
And yes, I hope the descriptions, and especially the images (they aren't stored in the database, are they?), are not fetched by the same query?
If they are (as in SELECT * ...), replace "*" with "field1, field2" (only the columns you need); it may turn out to be a pleasant surprise.
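A sketch of that last point (the column names are assumed):

-- Instead of SELECT *, which drags descriptions and other wide columns
-- through every filter query, fetch only what the listing page needs:
SELECT id, name, price, image_url
FROM products
WHERE category_id = 42 AND manufacturer_id IN (7, 12)
ORDER BY price
LIMIT 50;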

A
Astrohas, 2017-07-20
@Astrohas

But the server only has 4GB of memory.

I also worked with a SIMILAR server, and a 90 GB database!
How will you access the per-category tables? Builders of such Frankensteins usually add an auxiliary mapping table along the lines of "category - category_table_name". But keep in mind that to delete any product from the database you would then have to make at least two queries.
You also have to handle filters spanning several categories: for 2 categories, for example, that's two queries to look up the tables, two more to pull the data from those tables, plus one more for the combined result.
It is much easier to buy more memory, tune caching, add the missing indexes, and so on.
----
And if you still want to split things across tables, I advise using MySQL's standard "partitioning" feature, which is useful for exactly this; there is a guide on Habr: https://habrahabr.ru/post/66151/
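A minimal partitioning sketch (the partition key and the number of partitions are just an example; see the Habr article for details):

-- Split the table into partitions by category while the application keeps
-- querying a single logical table.
-- NOTE: MySQL requires the partitioning column to be part of every unique
-- key, so the primary key may have to become (id, category_id) first.
ALTER TABLE products
    PARTITION BY HASH(category_id)
    PARTITIONS 16;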

C
chupasaurus, 2017-07-21
@chupasaurus

ClickHouse. G-like indexes, column-based, that's it.

M
Mountaineer, 2017-07-27
@Mountaineer

In short, you have three options:
1. optimize the query
2. scale up the database (better server)
3. scale out the database (sharding)
What you want to do is an anti-pattern: https://stackoverflow.com/questions/16721772/mysql...

D
Dimonchik, 2017-07-20
@dimonchik2013

Memory, yes, but for JOINs the CPU matters more; better to profile your queries first.
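One way to start profiling (standard MySQL slow-log settings; the one-second threshold and the file path are just examples):

-- Log every query slower than 1 second, then inspect the offenders with EXPLAIN
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';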

A
al_gon, 2017-07-20
@al_gon

It receives many queries with joins to other large tables (an online-store filter: by manufacturer, by color; sorting by price, by date). Such selects sometimes take up to 30 seconds.

SQL doesn't fit here.
How do you implement a set of filters over product characteristics that differ between categories?
Faceted Search with Solr

F
Fortop, 2017-07-21
@Fortop

With the question framed like this, the most correct option is to estimate the cost of each solution over some period (a year, two, ten), taking into account how quickly the project and its environment change,
and to choose the one that is more cost-effective.

A
akzhan, 2017-07-27
@akzhan

This rule of thumb will help you: increasing the size of a table 10 times slows an index search down by no more than a factor of two, and increasing it 100 times slows it down by no more than a factor of four.
In fact this is quite imprecise; it ignores things like intermediate join results being spilled to disk, and so on. But the main point is that it makes it obvious that splitting the table would do you more harm than good.
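A back-of-the-envelope illustration of that rule (my own numbers, not the author's): index lookup cost grows roughly as log N, so shrinking a table from 5 million rows to 50 thousand changes log2(5,000,000) ≈ 22 into log2(50,000) ≈ 16, at best a search about 30% shallower, nowhere near the 100x speedup the split might suggest.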
What I would recommend is simply to study the EXPLAIN output carefully.
Then just switch from MySQL to Postgres.
And only after that go for denormalization, NoSQL, etc.

D
Dmitry, 2017-07-28
@DimonSmart

The most important thing in any database is structural optimization. Perhaps it is worth redesigning the table and looking toward classic normalization. I completely agree with those who strongly doubt a catalogue of 5 million items.
The classic layout is one name in the main table, with colors, sizes and the like in separate tables, and the actual combinations in the price list and stock tables. By the way, with the classic approach your filters naturally decompose onto different tables.
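A rough sketch of that classic layout (table and column names are mine, purely to illustrate):

CREATE TABLE products (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE colors (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(64) NOT NULL
) ENGINE=InnoDB;

-- One row per combination actually sold; price and stock live here,
-- and the filters fall naturally onto these small tables.
CREATE TABLE product_variants (
    product_id INT UNSIGNED NOT NULL,
    color_id   INT UNSIGNED NOT NULL,
    size       VARCHAR(16) NOT NULL,
    price      DECIMAL(10,2) NOT NULL,
    stock      INT UNSIGNED NOT NULL DEFAULT 0,
    PRIMARY KEY (product_id, color_id, size),
    KEY idx_color (color_id),
    FOREIGN KEY (product_id) REFERENCES products(id),
    FOREIGN KEY (color_id) REFERENCES colors(id)
) ENGINE=InnoDB;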
