Why SQL query is executed at different speed?

B

beduin012021-07-14 12:33:23

PostgreSQL

beduin01, 2021-07-14 12:33:23

There is a request:

SELECT COUNT(*) FROM xml_files t1

    WHERE NOT EXISTS (SELECT 1
              FROM xml_files t2
              WHERE t1."id" = t2."id" AND
                t1."mydate" < t2."mydate"
            )
    AND t1.insert_status IS NULL
   AND t1.section_name='users' AND t1.region IN ('Moscow') AND t1.xml_date >= to_date('2016', 'YYYY') AND t1.parsing_status IS NULL

When I execute this query in SQL, then 10 times it can be executed for 90 seconds, and for 11 times it can be completed in 5 and then continue to be executed in 5.

I don’t understand what is the reason for the long execution at the beginning. I would understand if after the first time, something was cached, but for some reason the 1st time is not enough.

Are there any optimizers in the database or other tricky logic? Just after an hour, history repeats itself and the request starts to run slowly again.

RAM 16GB.
There is no other load. the search for the reason why the microsevris receiving COUNT freezes led precisely to the fact that periodically COUNT starts to be processed for too long.

Database size 100GB
shared_buffers = 2048MB
maintenance_work_mem = 256MB

PostgreSQL 13

Reply

Answer the question

In order to leave comments, you need to log in

1 answer(s)

G

galaxy, 2021-07-14
@galaxy

When I execute this query in SQL, then 10 times it can be executed for 90 seconds, and for 11 times it can be completed in 5 and then continue to be executed in 5.
...
Are there any optimizers in the database or other tricky logic? Just after an hour, history repeats itself and the request starts to run slowly again.

Well, this picture is amazing. Almost any cache works on the principle of LRU (least recently used) data displacement. You repeat the request - your data is crowding out what was in the cache. You terminate - other requests crowd out your data.
Further, judging by the plan, you should do ANALYZE on the tables (and maybe increase the statistics - ALTER TABLE SET STATISTICS), because the actual number of rows and the estimate diverge quite strongly.
And, finally, self-join'a can be omitted here. We need essentially the latest document by date, so, adjusted for making requests in the mind:

SELECT * FROM (
     SELECT row_number() OVER (PARTITION BY id ORDER BY mydate DESC) as rn,
          t1.* FROM xml_files t1
         AND t1.insert_status IS NULL
        AND t1.section_name='users' AND t1.region IN ('Moscow') AND t1.xml_date >= to_date('2016', 'YYYY') AND t1.parsing_status IS NULL
     ) t
 WHERE rn = 1