How to quickly sort in a large table by a frequently changed field?
It is well known that it is convenient to sort by an indexed field.
However, it is also known that the use of indexes on frequently changed data leads to a large performance degradation.
The situation is this.
There are shards with game-account tables holding about 50 million records. Among other fields there is a balance field for the in-game currency, and this balance changes frequently.
There is a background task that rebuilds the rating/top from the balance data every couple of hours. The query naturally includes an "ORDER BY balance DESC" clause. The interval is 2 hours because a single rebuild takes more than an hour; I would like it to run as often as possible.
Problem: a very heavy load on the disk subsystem of the database servers; iowait is around 90% and at times reaches almost 100%.
Possible and undesirable solutions:
1) Make a database replica on another server and hammer it with queries. But that requires provisioning a dedicated server, and the refresh rate might have to drop because the hardware would be less powerful.
2) Buy an even faster SSD array. Expensive.
3) Use some NoSQL solution specifically for this task. That adds redundancy and probably inconsistency, and again requires dedicated hardware.
I would like to solve the problem more elegantly, at the software level with the existing infrastructure.
Could a MATERIALIZED VIEW help? Make a separate view with just the data needed to build the rating, put an index on the balance field, and refresh it every half hour?
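The materialized-view idea can be sketched as a periodic snapshot: copy the rating columns into a side table, index only that table, and query it for the top. This is a minimal sketch using sqlite3 as a stand-in; the table and column names (`accounts`, `balance`, `rating_snapshot`) are assumptions, and in PostgreSQL the refresh step would be `REFRESH MATERIALIZED VIEW` instead of rebuilding by hand.

```python
import sqlite3

# Hypothetical schema: real table/column names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [(i, i * 7 % 1000) for i in range(1, 101)])

def refresh_rating(conn):
    # Rebuild the snapshot; PostgreSQL's REFRESH MATERIALIZED VIEW
    # is emulated here by dropping and recreating a plain table.
    conn.execute("DROP TABLE IF EXISTS rating_snapshot")
    conn.execute("CREATE TABLE rating_snapshot AS "
                 "SELECT id, balance FROM accounts")
    # Index only the snapshot, so writes to the live accounts table stay cheap.
    conn.execute("CREATE INDEX idx_snapshot_balance "
                 "ON rating_snapshot (balance DESC)")
    conn.commit()

refresh_rating(conn)
top10 = conn.execute("SELECT id, balance FROM rating_snapshot "
                     "ORDER BY balance DESC LIMIT 10").fetchall()
```

The point of the design is that the frequently updated table carries no index on balance, so updates stay cheap, while the read path hits a small indexed copy that is only as stale as the refresh interval.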
Is there an option to store user IDs and their balance values in Redis, for example using Sorted Sets? Whenever a user's balance changes, update the score. Pulling the top out of Redis should then be no problem at all: you can query it as often as you like, and the data is always up to date.
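The Sorted Set approach maps to two Redis commands: ZADD on every balance change and ZREVRANGE to read the top. Since the thread names no code, here is a minimal in-memory model of those semantics (the class and method names are illustrative; a real deployment would call `zadd`/`zrevrange` via a Redis client instead):

```python
import heapq

class Leaderboard:
    """In-memory model of a Redis Sorted Set used as a leaderboard.
    In production, update() corresponds to ZADD and top() to
    ZREVRANGE ... WITHSCORES; names here are assumptions."""

    def __init__(self):
        self.scores = {}  # user_id -> balance

    def update(self, user_id, balance):
        # Equivalent to: ZADD leaderboard <balance> <user_id>
        # (ZADD overwrites the score of an existing member).
        self.scores[user_id] = balance

    def top(self, n):
        # Equivalent to: ZREVRANGE leaderboard 0 n-1 WITHSCORES
        return heapq.nlargest(n, self.scores.items(), key=lambda kv: kv[1])

board = Leaderboard()
board.update("alice", 120)
board.update("bob", 300)
board.update("carol", 250)
board.update("alice", 400)  # balance changed: the score is overwritten
```

Redis keeps the set ordered on every write, so reading the top N is O(log(N) + M) rather than a full sort of 50 million rows; the trade-off is that every balance change now also writes to Redis.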
Look at how often players actually view the rating and try to solve this organizationally. For example, decouple the rating from in-game points and introduce a "magic" progress parameter (the rate of scoring); then the user effectively cannot verify whether you computed their place in the ranking exactly. You can also assume that players in 1st to 1,000th place care more about their rating than those in 100,000th to 500,000th... That opens a lot of room for reorganization. For example, divide users into groups such as Gurus, Pros, Rookies, and Microorganisms (or split by how often they check the rating), give each group its own table (possibly even in a database on a separate node), and store a pointer on each user indicating which group table they belong to...
There can be many options for such an organizational rework to reduce the load. Maybe I'm wrong, but solving a technical problem head-on doesn't always make sense.
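The grouping idea above amounts to a routing function from a user's rank tier to the table (or node) that holds their group. A minimal sketch, where the tier names and rank cut-offs are purely illustrative assumptions:

```python
# Hypothetical tier thresholds; the group names and cut-offs are
# illustrative, not taken from the thread.
TIERS = [
    (1_000, "gurus"),       # ranks 1..1,000: check the rating most often
    (100_000, "pros"),
    (500_000, "rookies"),
]

def tier_for_rank(rank):
    """Route a user to the table/node holding their rating group."""
    for upper, name in TIERS:
        if rank <= upper:
            return name
    return "microorganisms"  # everyone below rank 500,000
```

Each tier's table stays small enough to sort cheaply, and only the top tier needs frequent refreshes; moving users between tiers becomes a slow background job rather than part of the hot path.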
Why sort all 50 million? The point of a top list is to take, say, the top 10.
Create a small side table and, via a trigger, copy into it only the records above a threshold, per the Pareto principle (roughly the top few percent),
i.e. where the balance is greater than X. Then it holds not 50 million rows but around 100 thousand.
That small table is cheap to sort and publish.
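The trigger-maintained side table can be sketched as follows, again with sqlite3 standing in for the real database. The threshold X = 500 and all table names are illustration-only assumptions; a production version would also need an AFTER INSERT trigger for newly created accounts, handled analogously.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER);
-- Small side table holding only accounts whose balance exceeds threshold X.
CREATE TABLE top_accounts (id INTEGER PRIMARY KEY, balance INTEGER);
CREATE INDEX idx_top_balance ON top_accounts (balance DESC);

-- Threshold X = 500 is an arbitrary illustration value.
CREATE TRIGGER sync_top AFTER UPDATE OF balance ON accounts
BEGIN
    -- Remove any stale row for this account, then re-add it only
    -- if the new balance is still above the threshold.
    DELETE FROM top_accounts WHERE id = NEW.id;
    INSERT INTO top_accounts (id, balance)
        SELECT NEW.id, NEW.balance WHERE NEW.balance > 500;
END;
""")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [(i, 0) for i in range(1, 6)])
conn.execute("UPDATE accounts SET balance = 900 WHERE id = 1")
conn.execute("UPDATE accounts SET balance = 600 WHERE id = 2")
conn.execute("UPDATE accounts SET balance = 100 WHERE id = 3")
conn.execute("UPDATE accounts SET balance = 300 WHERE id = 1")  # drops below X
top = conn.execute("SELECT id, balance FROM top_accounts "
                   "ORDER BY balance DESC").fetchall()
```

The trigger keeps the side table consistent on every balance change, so the periodic rating job sorts ~100 thousand rows instead of 50 million; the cost is a small extra write on updates that cross the threshold.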