Answer the question
In order to leave comments, you need to log in
How to force Postgres to calculate values in columns after sorting?
I have a large query with many calculated columns in SELECT
. This query also has a sort by one of the calculated columns and a limit on getting the first 100 rows. So, for some reason, Postgres calculates column values for all rows, and not just for the first 100. I understand that it must calculate for all rows the column by which sorting is done - this is logical. But why calculate all the rest?
I'll explain with an example.
Create a test table
And fill it with random dataCREATE TABLE test_main(col1 INTEGER);
DO
$do$
BEGIN
FOR r IN 1..100000 LOOP
INSERT INTO test_main(col1) VALUES (trunc(random()*1000));
END LOOP;
END
$do$;
CREATE TABLE test_main_agg1(
col1 INTEGER,
val INTEGER
);
CREATE TABLE test_main_agg2(
col1 INTEGER,
val INTEGER
);
DO
$do$
DECLARE
r test_main%rowtype;
BEGIN
FOR r IN SELECT * FROM test_main LOOP
FOR i IN 1..5 LOOP
INSERT INTO test_main_agg1(col1, val) VALUES (r.col1, trunc(random()*1000));
INSERT INTO test_main_agg2(col1, val) VALUES (r.col1, trunc(random()*1000));
END LOOP;
END LOOP;
END
$do$;
CREATE INDEX test_main_indx ON test_main(col1);
CREATE INDEX test_main_agg1_val_indx ON test_main_agg1(col1,val);
CREATE INDEX test_main_agg2_val_indx ON test_main_agg2(col1,val);
SELECT col1,
(SELECT MAX(val) FROM test_main_agg1 g WHERE g.col1=m.col1) max_val1,
(SELECT MAX(val) FROM test_main_agg2 g WHERE g.col1=m.col1) max_val2
FROM test_main m
LIMIT 100;
ORDER BY col1
, then it will still be executed quickly. However, if we add ORDER BY max_val1
, then the execution will take about two seconds. I looked in and saw these lines:EXPLAIN ANALYZE
SubPlan 4
-> Result (cost=4.06..4.07 rows=1 width=0) (actual time=0.011..0.011 rows=1 loops=100000)
InitPlan 3 (returns $3)
-> Limit (cost=0.42..4.06 rows=1 width=4) (actual time=0.010..0.010 rows=1 loops=100000)
-> Index Only Scan Backward using test_main_agg2_val_indx on test_main_agg2 g_1 (cost=0.42..1818.25 rows=500 width=4) (actual time=0.010..0.010 rows=1 loops=100000)
Index Cond: ((col1 = m.col1) AND (val IS NOT NULL))
Heap Fetches: 100000
max_val2
for all 100,000 rows, not for the first 100. Why? After all, sorting is by field max_val1
, and for 100,000 rows it is enough to calculate only it? ORDER BY
and LIMIT
?
Answer the question
In order to leave comments, you need to log in
Didn't find what you were looking for?
Ask your questionAsk a Question
731 491 924 answers to any question