W
W
WellWisher2015-11-19 12:44:32
PostgreSQL
WellWisher, 2015-11-19 12:44:32

How to optimize query in postgresql with filtering and sorting?

There is a base table (conditionally)
entities (id serial, value dp, date timestamp);
I would like to optimize queries of the form as much as possible:

SELECT * FROM entities WHERE value = __VALUE__ ORDER BY date ASC NULLS LAST

Where __VALUE__ is some dp number that is represented in the table.
Tried various options:
  1. adding an index on value
  2. adding index on date
  3. adding multi-column indexes (value, date) and (date, value)

In my opinion, the only working index of the four:
- index on the value field
Actually, the question is, is it possible to achieve better performance and why the index (value, date) does not work?
Those. now only data sampling is optimized, not sorting.
Data details:
  • Records in the table - one billion
  • All keys are regular btree
  • The filter to a specific __VALUE__ value returns about 250 thousand records (i.e. the index is efficient).
  • Indexes were created with appropriate sortings

System Information:
  • postgresql 9.3 or (9.4)
  • default settings
  • I played with the memory parameters for sorting, but I could not achieve the use of the index

Yes, you can modify the query, and add something like LIMIT at the end, then the index (date, value) helps, but it becomes ineffective, then in addition to LIMIT, OFFSET appears in the same query

Answer the question

In order to leave comments, you need to log in

3 answer(s)
I
igruschkafox, 2015-11-19
@igruschkafox

I would try the following option
, if possible, then sort the table by the clustered index by - date ASC
But you need to be careful not to slow down other queries (which may have DESC sort by date)
and create an index by the value field
the search then occurs by the "value" field and not by the combination of fields
the "date" field - is used only for sorting the result, not for selection --- Therefore, the composite index does not work

S
Swartalf, 2015-11-19
@Swartalf

> Records in the table - one billion
And we use partitioning? if not, then I advise www.postgresql.org/docs/9.1/static/ddl-partitionin...

K
Kirill, 2015-11-19
@kshvakov

In general, this whole thing is highly dependent on selectivity, if there is a lot of data by value, postgres can decide that it is more efficient to "sort like this", in the general case, the index will be used
because "default settings" do analyze for entities yourself, because most likely, postgres is not aware of how you have the data on the table distributed there and "pouring fullscan on everything"
ps:

create table entities (id serial, value int, date timestamp);

create index idx_order on entities (value, date asc nulls last);


insert into entities (value, date)
select 
generate_series(1, 20),
time
from generate_series(date_trunc('day', current_timestamp - '10 day'::interval), date_trunc('day', current_timestamp ), '1 second') as time;


analyze entities;


explain SELECT * FROM entities WHERE value = 10 ORDER BY date ASC NULLS LAST


Sort  (cost=239331.74..241545.03 rows=885318 width=16)
  Sort Key: date
  ->  Bitmap Heap Scan on entities  (cost=32277.78..136750.25 rows=885318 width=16)
        Recheck Cond: (value = 10)
        ->  Bitmap Index Scan on idx_order  (cost=0.00..32056.45 rows=885318 width=0)
              Index Cond: (value = 10)


truncate entities;

insert into entities (value, date)
select 
generate_series(1, 200),
time
from generate_series(date_trunc('day', current_timestamp - '1 day'::interval), date_trunc('day', current_timestamp ), '1 second') as time;

analyze entities;


explain SELECT * FROM entities WHERE value = 10 ORDER BY date ASC NULLS LAST;

Sort  (cost=106726.86..106942.86 rows=86401 width=16)
  Sort Key: date
  ->  Bitmap Heap Scan on entities  (cost=3202.17..99642.51 rows=86401 width=16)
        Recheck Cond: (value = 10)
        ->  Bitmap Index Scan on idx_order  (cost=0.00..3180.57 rows=86401 width=0)
              Index Cond: (value = 10)



explain analyze SELECT * FROM entities WHERE value = 10 ORDER BY date ASC NULLS LAST;

Sort  (cost=106726.86..106942.86 rows=86401 width=16) (actual time=18774.148..18783.923 rows=86401 loops=1)
  Sort Key: date
  Sort Method: external sort  Disk: 2200kB
  ->  Bitmap Heap Scan on entities  (cost=3202.17..99642.51 rows=86401 width=16) (actual time=2618.908..18681.170 rows=86401 loops=1)
        Recheck Cond: (value = 10)
        Rows Removed by Index Recheck: 9708392
        Heap Blocks: exact=33638 lossy=52763
        ->  Bitmap Index Scan on idx_order  (cost=0.00..3180.57 rows=86401 width=0) (actual time=2611.946..2611.946 rows=86401 loops=1)
              Index Cond: (value = 10)
Planning time: 0.093 ms
Execution time: 18788.452 ms


set work_mem='1GB';

explain analyze SELECT * FROM entities WHERE value = 10 ORDER BY date ASC NULLS LAST;

Sort  (cost=106726.86..106942.86 rows=86401 width=16) (actual time=614.335..619.024 rows=86401 loops=1)
  Sort Key: date
  Sort Method: quicksort  Memory: 7123kB
  ->  Bitmap Heap Scan on entities  (cost=3202.17..99642.51 rows=86401 width=16) (actual time=198.700..595.232 rows=86401 loops=1)
        Recheck Cond: (value = 10)
        Heap Blocks: exact=86401
        ->  Bitmap Index Scan on idx_order  (cost=0.00..3180.57 rows=86401 width=0) (actual time=177.392..177.392 rows=86401 loops=1)
              Index Cond: (value = 10)
Planning time: 0.145 ms
Execution time: 623.327 ms

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question