How to optimize sql query (selection from 1.5M rows)?

C

cinic2016-07-29 17:11:53

PostgreSQL

cinic, 2016-07-29 17:11:53

I select data between two dates with more than a million rows with this query:

SELECT
  time AT TIME ZONE 'UTC' AT TIME ZONE 'MSK' AS time,
  type,
  CASE
    WHEN (duration > (extract(epoch from (time - '2015-08-29 12:36:50'))) )
    THEN extract(epoch from (time - '2015-08-29 12:36:50'))
    ELSE duration
  END AS trim_duration
FROM clamps
WHERE device_id = 27 AND time BETWEEN '2015-08-29 12:36:50' AND '2016-03-15 12:36:50'
ORDER BY time ASC

Table structure:

CREATE TABLE clamps
(
  id serial NOT NULL,
  "time" timestamp without time zone,
  duration numeric,
  type character varying(255),
  device_id integer,
  packet_id integer,
  dur_float double precision DEFAULT 0,
  created_at timestamp without time zone DEFAULT now(),
  updated_at timestamp without time zone DEFAULT now(),
  CONSTRAINT clamps_pkey PRIMARY KEY (id)
)

There is an index:

CREATE UNIQUE INDEX index_clamps_on_device_id_time
ON clamps
USING btree (device_id, "time");

Explain Analyze:

Sort  (cost=149213.28..150770.01 rows=622695 width=21) (actual time=4263.385..4713.566 rows=1469514 loops=1)
  Sort Key: "time"
  Sort Method: external merge  Disk: 71816kB
  ->  Bitmap Heap Scan on clamps  (cost=17535.79..89284.59 rows=622695 width=21) (actual time=190.593..2692.161 rows=1469514 loops=1)
        Recheck Cond: ((device_id = 27) AND ("time" >= '2015-08-29 12:36:50'::timestamp without time zone) AND ("time" <= '2016-03-15 12:36:50'::timestamp without time zone))
        Heap Blocks: exact=17528
        ->  Bitmap Index Scan on index_clamps_on_device_id_time  (cost=0.00..17380.12 rows=622695 width=0) (actual time=186.667..186.667 rows=1469514 loops=1)
              Index Cond: ((device_id = 27) AND ("time" >= '2015-08-29 12:36:50'::timestamp without time zone) AND ("time" <= '2016-03-15 12:36:50'::timestamp without time zone))
Planning time: 0.291 ms
Execution time: 5414.691 ms

The request is executed in 9-13 seconds (although explain shows 5.5). Any thoughts on how to optimize it?
Thank you.

Reply

Answer the question

In order to leave comments, you need to log in

4 answer(s)

P

Pavel, 2016-07-29
@Ermako

When analyzing the result, the result is not transmitted to the client, the time of IO and transmission over the network is not taken into account (and you have 1.5 lines there, as you yourself said, besides, there can be relatively long lines).
And what do you want to achieve by sorting a huge array? For further analysis, sorting is usually not needed, and for visual analysis, a smaller sample is needed. Remove sorting - save a lot of time. This is a very expensive operation. (Although if you set ORDER BY device_id, time then sorting may go away due to the fact that the index itself stores values in a sorted form)
And this is

CASE
    WHEN (duration > (extract(epoch from (time - '2015-08-29 12:36:50'))) )
    THEN extract(epoch from (time - '2015-08-29 12:36:50'))
    ELSE duration
  END AS trim_duration

Replaced by
So a little clearer and shorter, although the essence and cost does not change much.

L

lega, 2016-07-29
@lega

The index is correct, but it looks like it's not being used, try to hardcode the index in the query, you can also try reindexing.
Is there enough memory for the index?

M

mamkaololosha, 2016-07-29
@mamkaololosha

WHEN (duration > (extract(epoch from (time - '2015-08-29 12:36:50'))) )
    THEN extract(epoch from (time - '2015-08-29 12:36:50'))

Дублирование кода? Не? Плюс там строки конвертятся каждый раз. Еще ORDER_BY попробуйте убрать.
Если даты константные. То может их лучше представить в более цифровом виде. Точно не помню. SQL это компилируемый язык сейчас, или еще интерпретируемый.

S

shagguboy, 2016-07-29
@shagguboy

сделать индекс (device_id , time )