W
W
whats2014-04-21 19:34:48
PostgreSQL
whats, 2014-04-21 19:34:48

How to speed up ORDER BY in Postgres?

Greetings. Tell me, please, there is such a situation.
Directory tree table.
e20b3a1fb3fc20782168cfdaa3f7d2b536603606
Correspondence table
f2e687765aa089f2719ce10fbb043721aeff1fc4
There is a query that builds this tree in tabular form

SELECT DISTINCT t0.groupname, t1.groupname,  t2.groupname
FROM grouptree as t0 
left join models as m on m.id = t0.idmodel 
left join grouptree as t1 on t1.parent = t0.groupno AND t1.idmodel = t0.idmodel
left join grouptree as t2 on t2.parent = t1.groupno AND t2.idmodel = t1.idmodel
where t0.parent = 0 AND m.typeauto = 0

In the conditions, we select the first level of the tree and the relation to the type of cars.
Here is the query plan
"HashAggregate  (cost=144714.35..145034.98 rows=32063 width=117) (actual time=1209.785..1211.691 rows=10716 loops=1)"
"  ->  Nested Loop Left Join  (cost=2379.07..144473.88 rows=32063 width=117) (actual time=28.075..840.266 rows=1391881 loops=1)"
"        ->  Nested Loop Left Join  (cost=2378.64..128016.41 rows=32063 width=86) (actual time=28.066..171.952 rows=264099 loops=1)"
"              ->  Hash Join  (cost=2378.21..25071.47 rows=32063 width=47) (actual time=28.057..69.569 rows=30423 loops=1)"
"                    Hash Cond: (t0.idmodel = m.id)"
"                    ->  Bitmap Heap Scan on grouptree t0  (cost=660.19..22638.32 rows=35066 width=47) (actual time=20.854..35.009 rows=34960 loops=1)"
"                          Recheck Cond: (parent = 0)"
"                          ->  Bitmap Index Scan on "2-parent-to-idmodel"  (cost=0.00..651.42 rows=35066 width=0) (actual time=15.255..15.255 rows=34960 loops=1)"
"                                Index Cond: (parent = 0)"
"                    ->  Hash  (cost=1346.41..1346.41 rows=29729 width=4) (actual time=7.171..7.171 rows=29715 loops=1)"
"                          Buckets: 4096  Batches: 1  Memory Usage: 1045kB"
"                          ->  Seq Scan on models m  (cost=0.00..1346.41 rows=29729 width=4) (actual time=0.004..4.488 rows=29715 loops=1)"
"                                Filter: (typeauto = 0)"
"                                Rows Removed by Filter: 2798"
"              ->  Index Scan using "2-parent-to-idmodel" on grouptree t1  (cost=0.43..3.20 rows=1 width=51) (actual time=0.001..0.002 rows=9 loops=30423)"
"                    Index Cond: ((parent = t0.groupno) AND (idmodel = t0.idmodel))"
"        ->  Index Scan using "2-parent-to-idmodel" on grouptree t2  (cost=0.43..0.50 rows=1 width=47) (actual time=0.001..0.002 rows=5 loops=264099)"
"              Index Cond: ((parent = t1.groupno) AND (idmodel = t1.idmodel))"
"Total runtime: 1212.145 ms"

Everything is good and works as it should. But I would like to sort the data.
I add only 1 line to the end
...
order by t0.groupname ASC

The picture is fundamentally changing. And the query execution time is increased by 20 times.
"Unique  (cost=146873.58..147194.21 rows=32063 width=117) (actual time=21456.087..21864.988 rows=10716 loops=1)"
"  Output: t0.groupname, t1.groupname, t2.groupname"
"  Buffers: shared hit=1240970"
"  ->  Sort  (cost=146873.58..146953.73 rows=32063 width=117) (actual time=21456.085..21594.183 rows=1391881 loops=1)"
"        Output: t0.groupname, t1.groupname, t2.groupname"
"        Sort Key: t0.groupname, t1.groupname, t2.groupname"
"        Sort Method: quicksort  Memory: 312756kB"
"        Buffers: shared hit=1240970"
"        ->  Nested Loop Left Join  (cost=2379.07..144473.88 rows=32063 width=117) (actual time=25.376..867.222 rows=1391881 loops=1)"
"              Output: t0.groupname, t1.groupname, t2.groupname"
"              Buffers: shared hit=1240970"
"              ->  Nested Loop Left Join  (cost=2378.64..128016.41 rows=32063 width=86) (actual time=25.366..172.208 rows=264099 loops=1)"
"                    Output: t0.groupname, t1.groupname, t1.idmodel, t1.groupno"
"                    Buffers: shared hit=162206"
"                    ->  Hash Join  (cost=2378.21..25071.47 rows=32063 width=47) (actual time=25.357..66.067 rows=30423 loops=1)"
"                          Output: t0.groupname, t0.idmodel, t0.groupno"
"                          Hash Cond: (t0.idmodel = m.id)"
"                          Buffers: shared hit=21731"
"                          ->  Bitmap Heap Scan on public.grouptree t0  (cost=660.19..22638.32 rows=35066 width=47) (actual time=17.457..31.343 rows=34960 loops=1)"
"                                Output: t0.idmodel, t0.groupno, t0.parent, t0.groupname, t0.groupnameen, t0.pictureindex, t0.mark, t0.sortorder"
"                                Recheck Cond: (t0.parent = 0)"
"                                Buffers: shared hit=20791"
"                                ->  Bitmap Index Scan on "2-parent-to-idmodel"  (cost=0.00..651.42 rows=35066 width=0) (actual time=14.128..14.128 rows=34960 loops=1)"
"                                      Index Cond: (t0.parent = 0)"
"                                      Buffers: shared hit=98"
"                          ->  Hash  (cost=1346.41..1346.41 rows=29729 width=4) (actual time=7.868..7.868 rows=29715 loops=1)"
"                                Output: m.id"
"                                Buckets: 4096  Batches: 1  Memory Usage: 1045kB"
"                                Buffers: shared hit=940"
"                                ->  Seq Scan on public.models m  (cost=0.00..1346.41 rows=29729 width=4) (actual time=0.003..5.048 rows=29715 loops=1)"
"                                      Output: m.id"
"                                      Filter: (m.typeauto = 0)"
"                                      Rows Removed by Filter: 2798"
"                                      Buffers: shared hit=940"
"                    ->  Index Scan using "2-parent-to-idmodel" on public.grouptree t1  (cost=0.43..3.20 rows=1 width=51) (actual time=0.001..0.002 rows=9 loops=30423)"
"                          Output: t1.idmodel, t1.groupno, t1.parent, t1.groupname, t1.groupnameen, t1.pictureindex, t1.mark, t1.sortorder"
"                          Index Cond: ((t1.parent = t0.groupno) AND (t1.idmodel = t0.idmodel))"
"                          Buffers: shared hit=140475"
"              ->  Index Scan using "2-parent-to-idmodel" on public.grouptree t2  (cost=0.43..0.50 rows=1 width=47) (actual time=0.001..0.002 rows=5 loops=264099)"
"                    Output: t2.idmodel, t2.groupno, t2.parent, t2.groupname, t2.groupnameen, t2.pictureindex, t2.mark, t2.sortorder"
"                    Index Cond: ((t2.parent = t1.groupno) AND (t2.idmodel = t1.idmodel))"
"                    Buffers: shared hit=1078764"
"Total runtime: 21879.380 ms"

It can be seen from the profiler that it gets stuck on sorting. Moreover, it sorts by 3 fields at once, although only 1 is indicated. Tell me, what am I doing wrong?

Answer the question

In order to leave comments, you need to log in

2 answer(s)
W
whats, 2014-04-21
@whats

work_mem - 1GB Sorted in memory. What other parameters need to be named for informational content?
By the way, if you do this:

select * from (SELECT DISTINCT t0.groupname as s1, t1.groupname,  t2.groupname
FROM grouptree as t0 
left join models as m on m.id = t0.idmodel 
left join grouptree as t1 on t1.parent = t0.groupno AND t1.idmodel = t0.idmodel
left join grouptree as t2 on t2.parent = t1.groupno AND t2.idmodel = t1.idmodel
where t0.parent = 0 AND m.typeauto = 0) as t
order by t.s1 ASC

Which, I think, is fundamentally wrong, the request will complete in seconds.
"Sort  (cost=147755.31..147835.47 rows=32063 width=117) (actual time=1241.491..1241.930 rows=10716 loops=1)"
"  Output: t0.groupname, t1.groupname, t2.groupname"
"  Sort Key: t0.groupname"
"  Sort Method: quicksort  Memory: 2612kB"
"  Buffers: shared hit=1240970"
"  ->  HashAggregate  (cost=144714.35..145034.98 rows=32063 width=117) (actual time=1228.137..1229.846 rows=10716 loops=1)"
"        Output: t0.groupname, t1.groupname, t2.groupname"
"        Buffers: shared hit=1240970"
"        ->  Nested Loop Left Join  (cost=2379.07..144473.88 rows=32063 width=117) (actual time=29.937..852.961 rows=1391881 loops=1)"
"              Output: t0.groupname, t1.groupname, t2.groupname"
"              Buffers: shared hit=1240970"
"              ->  Nested Loop Left Join  (cost=2378.64..128016.41 rows=32063 width=86) (actual time=29.927..172.365 rows=264099 loops=1)"
"                    Output: t0.groupname, t1.groupname, t1.idmodel, t1.groupno"
"                    Buffers: shared hit=162206"
"                    ->  Hash Join  (cost=2378.21..25071.47 rows=32063 width=47) (actual time=29.915..67.004 rows=30423 loops=1)"
"                          Output: t0.groupname, t0.idmodel, t0.groupno"
"                          Hash Cond: (t0.idmodel = m.id)"
"                          Buffers: shared hit=21731"
"                          ->  Bitmap Heap Scan on public.grouptree t0  (cost=660.19..22638.32 rows=35066 width=47) (actual time=22.059..35.273 rows=34960 loops=1)"
"                                Output: t0.idmodel, t0.groupno, t0.parent, t0.groupname, t0.groupnameen, t0.pictureindex, t0.mark, t0.sortorder"
"                                Recheck Cond: (t0.parent = 0)"
"                                Buffers: shared hit=20791"
"                                ->  Bitmap Index Scan on "2-parent-to-idmodel"  (cost=0.00..651.42 rows=35066 width=0) (actual time=14.175..14.175 rows=34960 loops=1)"
"                                      Index Cond: (t0.parent = 0)"
"                                      Buffers: shared hit=98"
"                          ->  Hash  (cost=1346.41..1346.41 rows=29729 width=4) (actual time=7.824..7.824 rows=29715 loops=1)"
"                                Output: m.id"
"                                Buckets: 4096  Batches: 1  Memory Usage: 1045kB"
"                                Buffers: shared hit=940"
"                                ->  Seq Scan on public.models m  (cost=0.00..1346.41 rows=29729 width=4) (actual time=0.003..5.017 rows=29715 loops=1)"
"                                      Output: m.id"
"                                      Filter: (m.typeauto = 0)"
"                                      Rows Removed by Filter: 2798"
"                                      Buffers: shared hit=940"
"                    ->  Index Scan using "2-parent-to-idmodel" on public.grouptree t1  (cost=0.43..3.20 rows=1 width=51) (actual time=0.001..0.002 rows=9 loops=30423)"
"                          Output: t1.idmodel, t1.groupno, t1.parent, t1.groupname, t1.groupnameen, t1.pictureindex, t1.mark, t1.sortorder"
"                          Index Cond: ((t1.parent = t0.groupno) AND (t1.idmodel = t0.idmodel))"
"                          Buffers: shared hit=140475"
"              ->  Index Scan using "2-parent-to-idmodel" on public.grouptree t2  (cost=0.43..0.50 rows=1 width=47) (actual time=0.001..0.002 rows=5 loops=264099)"
"                    Output: t2.idmodel, t2.groupno, t2.parent, t2.groupname, t2.groupnameen, t2.pictureindex, t2.mark, t2.sortorder"
"                    Index Cond: ((t2.parent = t1.groupno) AND (t2.idmodel = t1.idmodel))"
"                    Buffers: shared hit=1078764"
"Total runtime: 1242.392 ms"

How to sort correctly?

A
Alexey Lesovsky, 2014-04-25
@lesovsky

here is the answer from more experienced colleagues (I'm still learning myself))):
specifically, that query is not accelerated in any way ... the problem with order by can be removed through WITH or offset 0 in the subquery and only after imposing sorting.
those. there the very statement of the problem with 1.4M lines in sorting and 10k lines in the output is not working
, something like this, completely bleak ((

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question