Dannerty, 2018-06-28 08:28:48
linux

How to speed up a PostgreSQL query?

Can you tell me how to speed up this query?
The original query has already been optimized roughly 2x, but it still takes too long:
Indexes:
"i_table_1_primarykey" btree (primarykey)
"i_table_1_starttime" btree (starttime)
"i_table_2_session" btree (session)
The session and primarykey fields are of type uuid; starttime is a timestamp.
SELECT "t1".* FROM table_1 as t1, (select session from table_2 group by session) as t2 WHERE t1.primarykey=t2.session ORDER BY "starttime" DESC LIMIT 20;
EXPLAIN ANALYZE

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=516423.42..516423.47 rows=20 width=64) (actual time=6916.209..6916.216 rows=20 loops=1)
   ->  Sort  (cost=516423.42..516602.29 rows=71549 width=64) (actual time=6916.208..6916.213 rows=20 loops=1)
         Sort Key: t1.starttime DESC
         Sort Method: top-N heapsort  Memory: 27kB
         ->  Hash Join  (cost=217922.29..514519.53 rows=71549 width=64) (actual time=2212.324..6773.452 rows=823467 loops=1)
               Hash Cond: (table_2.session = t1.primarykey)
               ->  Group  (cost=0.43..243099.63 rows=71549 width=16) (actual time=1.119..3483.992 rows=823468 loops=1)
                     Group Key: table_2.session
                     ->  Index Only Scan using i_table_2_session on table_2(cost=0.43..227046.60 rows=6421212 width=16) (actual time=1.117..2939.527 rows=6228279 loops=1)
                           Heap Fetches: 517761
               ->  Hash  (cost=108137.05..108137.05 rows=4723505 width=64) (actual time=2205.657..2205.657 rows=4523121 loops=1)
                     Buckets: 131072  Batches: 64  Memory Usage: 7475kB
                     ->  Seq Scan on table_1 t1  (cost=0.00..108137.05 rows=4723505 width=64) (actual time=0.791..1125.886 rows=4523121 loops=1)
 Planning time: 1.232 ms
 Execution time: 6916.313 ms
(15 rows)

shared_buffers = 2GB
I tried changing work_mem to various values, but it didn't make much of a difference. I suspect the bottleneck is the HDD and the server ought to be moved to an SSD, but perhaps there is some other option.
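For reference, work_mem can be changed per session, so different values can be tested without a restart. A minimal sketch (the 256MB value here is only illustrative):

-- Check the current values
SHOW shared_buffers;
SHOW work_mem;

-- Override work_mem for the current session only
SET work_mem = '256MB';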


2 answers
Melkij, 2018-06-28
@Dannerty

You need EXPLAIN (ANALYZE, BUFFERS), ideally with track_io_timing = on in the config.
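For instance, a minimal sketch (track_io_timing is shown here as a session-level setting, which requires superuser; otherwise set it in postgresql.conf):

-- Requires superuser when set per session
SET track_io_timing = on;

-- Same query as in the question, now with buffer and I/O statistics
EXPLAIN (ANALYZE, BUFFERS)
SELECT "t1".* FROM table_1 AS t1,
       (SELECT session FROM table_2 GROUP BY session) AS t2
WHERE t1.primarykey = t2.session
ORDER BY "starttime" DESC LIMIT 20;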
As for work_mem: if the planner decides that work_mem is not enough, there will be no hash join; for a 27 kB sort it obviously doesn't matter.
But why join against a subquery at all, when the grouping in that subquery alone takes half the response time? You need a simple EXISTS:

SELECT "t1".* FROM table_1 as t1
where exists (select 1 from table_2 as t2 WHERE t1.primarykey=t2.session)
ORDER BY "starttime" DESC LIMIT 20;

Beyond that, know your data. If, reading in starttime DESC order, rows satisfying the EXISTS turn up quickly, this will work well. If only few rows match the EXISTS, it is worth considering denormalizing this flag into table_1 (kept consistent by a trigger on table_2) plus a partial index on starttime WHERE t2_exists; see the sketch below.
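A minimal sketch of that denormalization, assuming a boolean t2_exists column (the trigger and index names are made up, and handling of DELETEs on table_2 is omitted for brevity):

-- Flag on table_1: does at least one table_2 row reference this key?
ALTER TABLE table_1 ADD COLUMN t2_exists boolean NOT NULL DEFAULT false;

-- Backfill existing rows
UPDATE table_1 t1 SET t2_exists = true
WHERE EXISTS (SELECT 1 FROM table_2 t2 WHERE t2.session = t1.primarykey);

-- Keep the flag consistent as new table_2 rows arrive
CREATE OR REPLACE FUNCTION set_t2_exists() RETURNS trigger AS $$
BEGIN
    UPDATE table_1 SET t2_exists = true WHERE primarykey = NEW.session;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_table_2_t2_exists
AFTER INSERT ON table_2
FOR EACH ROW EXECUTE PROCEDURE set_t2_exists();

-- Partial index: the top 20 by starttime can be read straight off the index
CREATE INDEX i_table_1_starttime_t2_exists
    ON table_1 (starttime DESC) WHERE t2_exists;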
Not a very good idea. uuid is also much slower to compare than bigint (especially if it is stored as varchar rather than the uuid data type), and its random distribution somewhat confuses the planner's statistics.

Max, 2018-06-28
@MaxDukov

try replacing it with the same effect, but...
