Why is DISTINCT ON so slow?

PostgreSQL

un1t, 2017-04-12 14:41:52

The table has 931,263 rows.
Query:

select distinct on (o.group_id) o.id group_id from offers o limit 10;

takes more than half a second, even though there is an index on group_id.
Table structure:

# \d+ offers;
                                                        Table "public.offers"
    Column    |          Type           |                      Modifiers                      | Storage  | Stats target | Description 
--------------+-------------------------+-----------------------------------------------------+----------+--------------+-------------
 id           | integer                 | not null default nextval('offers_id_seq'::regclass) | plain    |              | 
 name         | character varying(400)  | not null                                            | extended |              | 
 group_id     | integer                 |                                                     | plain    |              | 
Indexes:
    "offers_pkey" PRIMARY KEY, btree (id)
    "offers_group_id_e0c51f8a" btree (group_id)

EXPLAIN ANALYZE

# explain analyze select distinct on (o.group_id) o.id group_id from offers o limit 10;
 Limit  (cost=0.42..59572.55 rows=10 width=8) (actual time=0.089..566.879 rows=1 loops=1)
   ->  Unique  (cost=0.42..1191442.91 rows=200 width=8) (actual time=0.087..566.876 rows=1 loops=1)
         ->  Index Scan using offers_group_id_e0c51f8a on offers o  (cost=0.42..1189221.41 rows=888599 width=8) (actual time=0.085..529.775 rows=931263 loops=1)
 Planning time: 0.137 ms
 Execution time: 566.925 ms

How can I speed this up?

2 answer(s)

Melkij, 2017-04-12
@melkij

You can speed it up with a nice recursive CTE that emulates a loose index scan: https://wiki.postgresql.org/wiki/Loose_indexscan
PostgreSQL does not yet support loose index scans natively, so DISTINCT reads every entry in the index instead of jumping straight to the next larger value.
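
For illustration, here is a minimal sketch of that technique adapted to the offers table from the question. It follows the pattern on the wiki page; the CTE name g is made up for this example. It returns only the distinct group_id values and skips NULLs, so it is not an exact drop-in for the original DISTINCT ON query.

```sql
WITH RECURSIVE g AS (
    -- Seed: the smallest non-null group_id, found with a single index descent.
    (SELECT group_id
     FROM offers
     WHERE group_id IS NOT NULL
     ORDER BY group_id
     LIMIT 1)
    UNION ALL
    -- Step: look up the next larger group_id through the index
    -- instead of walking over every row of the current group.
    SELECT (SELECT o.group_id
            FROM offers o
            WHERE o.group_id > g.group_id
            ORDER BY o.group_id
            LIMIT 1)
    FROM g
    WHERE g.group_id IS NOT NULL   -- stop once no larger value exists
)
SELECT group_id
FROM g
WHERE group_id IS NOT NULL         -- drop the terminating NULL row
LIMIT 10;
```

Each recursion step costs one index lookup, so the total work grows with the number of distinct group_id values rather than with the number of rows. If a row id per group is also needed, it can be fetched with one more indexed lookup per group_id.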

Sergey Gornostaev, 2017-04-12
@sergey-gornostaev

Use GROUP BY on the column that has to be unique.
In my case, that is about 6.5 times faster.
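
The answer does not show its query. One plausible form, assuming the goal is one row per group_id, might look like the sketch below. Using min(id) to pick the row is an assumption made here; DISTINCT ON without ORDER BY returns an unspecified row from each group.

```sql
-- One row per group_id; min(id) is an assumed way to choose
-- which offer represents the group.
SELECT group_id, min(id) AS id
FROM offers
GROUP BY group_id
LIMIT 10;
```

Whether this is faster depends on the plan the planner chooses, so it is worth comparing it against the original query with EXPLAIN ANALYZE.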